Designing a Comment System
23 May 2019
In my previous post I mentioned one possible solution for adding comments to my blog using the built-in support for data files in Jekyll. This approach was pioneered by Damien Guard.
In this post I hope to have a crack at designing such a system myself and implementing it.
What do I want?
My first step in designing this comment system will be to decide what my goals are.
- Foremost, I want to allow people to leave comments on my blog (obviously)
  - Adding comments should be relatively easy
  - The format they are in should be common; I am leaning toward Markdown, as it is the format used by Jekyll and is also used on sites such as Reddit
- I want to curate all incoming comments and approve them if they seem legitimate
  - In the past, when running WordPress blogs, I had a lot of spam and irrelevant comments that would be nice to filter out
  - Manual approval would likely be fine for me since my blog is low traffic
  - I could still augment approval so that obvious spam comments are filtered out automatically
  - Being able to block certain known offenders would be a nice feature as well; this is non-trivial, but a simple IP blacklist could help
  - Using a captcha, or a honeypot-style captcha, would be advantageous
  - Browser fingerprinting could be another technique to detect when many requests come from the same source
- I want to preserve comments in a format that is easy to store, process and potentially migrate
  - In addition, the comments should be stored in a static way that is in keeping with Jekyll’s general approach
- I want to allow users to have an avatar if they desire it
  - Gravatar is quite popular and would be nice to support
  - Twitter profiles may be useful to support
  - GitHub profiles would also be useful to support
- The comment system should be relatively lightweight
  - By this I mean there shouldn’t be too many moving parts and it should not require any heavy systems
  - I am thinking of running this on an EC2 instance or as AWS Lambda functions, so ideally nothing should be process intensive
That’s quite a few things I want but it is fairly doable.
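To make the spam-filtering goals a little more concrete, here is a minimal sketch in Python of how a honeypot field and a simple IP blocklist might screen a submission before it reaches manual approval. The field name `website_confirm`, the `BLOCKED_IPS` set and the `Verdict` type are all hypothetical names I made up for illustration; nothing here is a finished design.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical blocklist; in practice this might be loaded from a file.
BLOCKED_IPS = {"203.0.113.7"}

# Hypothetical name of a hidden form field that real users never fill in.
HONEYPOT_FIELD = "website_confirm"


@dataclass
class Verdict:
    accepted: bool
    reason: Optional[str] = None


def screen_submission(form: dict, source_ip: str) -> Verdict:
    """Reject obvious spam before a comment is queued for manual approval."""
    if source_ip in BLOCKED_IPS:
        return Verdict(False, "blocked source")
    # Bots tend to fill in every field, including the hidden honeypot.
    if form.get(HONEYPOT_FIELD, "").strip():
        return Verdict(False, "honeypot filled")
    if not form.get("comment", "").strip():
        return Verdict(False, "empty comment")
    return Verdict(True)
```

Anything that passes this check would still go through manual approval; the point is only to drop the most obvious junk early.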
Some drawbacks of such a design are:
- No guarantee that commenters are who they say they are
  - This extends further in that multiple comments are not guaranteed to be from the same person. In a way this is much like a traditional guestbook on older websites
- Comments will take time to appear on the website
  - Since they will be merged into the blog via GitHub they will take a non-trivial amount of time to be approved
  - While waiting for a comment to be approved a user may not realise it is pending and attempt to leave another
Overall I am willing to live with these drawbacks, at least for the moment.
The comment system boils down to the following kind of top level flow:
- The user reads a post from the blog
- The user decides to leave a comment
- The comment system determines if the comment should be allowed and updates the blog
This diagram does simplify parts of the design, such as approving the comments; however, approval could be seen as outside the current scope since it would be an external process.
Some Prototyping
I often find it helpful to work through and prototype some ideas roughly before implementing them properly.
Input Data
One such prototype that is (in my opinion) always helpful is thinking about
what kind of inputs and outputs a system will use and produce.
So in this example a comment might look something like the following (as JSON
for ease of reading):
{
  "uuid": "UUID",
  "post": "post-id or permalink",
  "displayName": "User display name",
  "avatar": "URL to an avatar",
  "webLink": "a URL provided",
  "comment": "markdown comment"
}
Some of these could be optional. I might also want to include a client generated date with the data if I want to be able to show when this comment was posted and in what timezone/offset.
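As a rough sketch of how this payload might be checked on the server, the Python below parses the JSON and separates required fields from optional ones. Which fields are required is purely my assumption at this point, and `parse_comment` and `Comment` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    uuid: str
    post: str
    displayName: str
    comment: str
    avatar: Optional[str] = None
    webLink: Optional[str] = None
    dateTime: Optional[str] = None  # client-generated date, if supplied


# Assumption: these four are mandatory, everything else is optional.
REQUIRED = ("uuid", "post", "displayName", "comment")
OPTIONAL = ("avatar", "webLink", "dateTime")


def parse_comment(raw: str) -> Comment:
    """Parse a JSON comment payload, raising ValueError if it is unusable."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("comment payload must be a JSON object")
    missing = [k for k in REQUIRED if not str(data.get(k) or "").strip()]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    fields = {k: data[k] for k in REQUIRED}
    # Copy only the optional fields that were actually provided.
    fields.update({k: data[k] for k in OPTIONAL if data.get(k)})
    return Comment(**fields)
```

Unknown keys are silently ignored here, which keeps the stored data limited to the fields listed above.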
So this is the data provided for the comment but there will also be some additional data associated with the comment request:
- Time and date of the request
- Source of the request (IP address etc.)
- Browser metadata and headers
This data could all be used in conjunction with the user generated data, especially the date and time.
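One way this request data might be put to use is to record a server-side timestamp and reduce the source details to a short hash, so that many requests from the same source are easy to spot. This is only a sketch: the choice of headers, the 16-character truncation and the names `request_fingerprint` and `request_metadata` are my own assumptions.

```python
import hashlib
from datetime import datetime, timezone


def request_fingerprint(ip: str, headers: dict) -> str:
    """Hash the request source into a short, stable identifier."""
    # Header names are matched case-insensitively; the headers used are an assumption.
    lowered = {k.lower(): v for k, v in headers.items()}
    parts = [ip, lowered.get("user-agent", ""), lowered.get("accept-language", "")]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


def request_metadata(ip: str, headers: dict) -> dict:
    """Collect the server-side data to keep alongside the user-supplied comment."""
    return {
        "receivedAt": datetime.now(timezone.utc).isoformat(),
        "fingerprint": request_fingerprint(ip, headers),
    }
```

A fingerprint like this is easy to fake, so it would only ever be a hint for the approval step rather than proof of anything.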
Form Prototype
I am not the most visual person, as you can probably tell from the simple design of this blog. Nevertheless, it is important to decide and visualize how the comment form might look. Below is my attempt:
It’s a relatively simple form that relies on the HTML5 form elements and the default theme styling.
For an actual implementation I may add some additional checks and use an AJAX request instead of a form submit action.
The advantages of using an AJAX call are:
- I can do some checks on the client side
  - I could prevent sending bad form data, perhaps even do some client side checks for the existence of user provided links
  - I could reduce the chances of duplicate requests being sent
- I can decide upon the encoding of the data and add any additional data to the request
- I can react to the response on the blog post page without navigating away
- Basic spam-bots and web-crawlers that do not render JavaScript won’t be able to post comments
Of course there are disadvantages too, mainly that browsers with limited or no JavaScript support won’t work. This may include some screen-reading software used by the visually impaired. However, since display of comments will be handled by Jekyll and its Liquid templating language, existing comments should still be readable in any browser.
Experimenting with Data Files
I am not super familiar with Jekyll’s data files and the Liquid templating language so I thought it prudent to research and experiment with them more.
Jekyll supports YAML, JSON, CSV and TSV files. Out of these file types YAML and JSON are probably the best suited for storing comments, since free-form comment text containing commas or tabs would need careful escaping in CSV or TSV. My personal preference between YAML and JSON is JSON, mostly because it is a simpler format with wider support, including native JavaScript support in web browsers.
Data files are stored in the _data folder and can be placed in subfolders, which is good because that makes it easier to organise the comments I’ll get into folders based on the posts they are for.
Jekyll makes data accessible by namespace. The example given in the documentation uses the files _data/orgs/jekyll.yml and _data/orgs/deorg.yml, which are associated with the namespace site.data.orgs and are accessible when iterating over that namespace’s members.
Applying this to comments I can see several possible ways of implementing such a system:
Folders for each blog post
Since blog posts in Jekyll are stored as markdown files they can be identified with names that are safe to use in the file system. For example a blog post might be named 2019-05-09-comments-on-static-blog.md on the file system, which translates to the link https://lyndon.codes/2019/05/09/comments-on-static-blog/.
Now I can use that filename as a folder in the data directory, something like _data/comments/2019-05-09-comments-on-static-blog/, and store all comments for that post in that folder.
The benefits to this are:
- It is easy to keep track of all comments for each post
- It’s easy to migrate comments with posts if you change post names or even move to a new blogging system.
- When merging in new comments they can all be kept in separate files, reducing problems with merges in Git.
One potential issue with this is ordering comments based on their posted time. Thankfully Liquid seems to support this with filters. Even if it didn’t, comments could be given filenames that order correctly using an incrementing count or just the current time.
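As a sketch of the time-based naming idea, the Python below builds a path for a new comment file whose name sorts in the order the comments were received. The `comment_path` function, the timestamp layout and the use of the comment's UUID as a suffix are my own assumptions for illustration.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def comment_path(post_filename: str, received: datetime, uuid: str) -> PurePosixPath:
    """Build a data-file path whose filename sorts by the time it was received."""
    # Use the post's filename, minus the .md extension, as the folder name.
    folder = post_filename[:-3] if post_filename.endswith(".md") else post_filename
    # A fixed-width UTC timestamp sorts correctly as plain text.
    stamp = received.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    # The UUID suffix keeps two comments received at the same instant apart.
    return PurePosixPath("_data/comments") / folder / f"{stamp}-{uuid}.json"
```

Because every comment gets its own uniquely named file, two pending comments on the same post should never touch the same file when merged.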
A single file for each blog post
A similar approach to using a folder with a file per comment would be using a single file per blog post. This has similar benefits but other drawbacks, like merges being harder when a blog post has multiple comments awaiting approval.
Testing
So a test of the folder approach, for the data in _data/test/comments/ that contains 3 files named _0.json, _1.json and _3.json, would look something like this:
{% for comment in site.data.test.comments %}
* {{ comment }}
{% endfor %}
With the rendered output of:
- _0{“uuid”=>”example UUID”, “displayName”=>”John Smith”, “comment”=>”Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.”, “dateTime”=>”2019-05-23T09:34:43.581Z”}
- _1{“uuid”=>”example UUID”, “displayName”=>”Jane Test”, “comment”=>”Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.”, “dateTime”=>”2019-05-23T09:37:21.520Z”}
- 2{“uuid”=>”example UUID”, “displayName”=>”Mr Test”, “comment”=>”Foo bar \n abc *123* _xyz
foo.bar()
”, “dateTime”=>”2019-05-23T09:21:01.120Z”}
Notice that the prefix before each item is the filename: each item is a pair made up of the filename and the data in that file. So to render the data within we need to select the second element (index 1) of each pair. Something like the following could render comments:
<table>
  <thead>
    <tr>
      <th>DateTime</th>
      <th>Author</th>
      <th>Comment</th>
    </tr>
  </thead>
  <tbody>
    {% for comment_hash in site.data.test.comments %}
    {% assign comment = comment_hash[1] %}
    <tr>
      <td>{{ comment.dateTime }}</td>
      <td>{{ comment.displayName }}</td>
      <td>{{ comment.comment }}</td>
    </tr>
    {% endfor %}
  </tbody>
</table>
Which would render to:
DateTime | Author | Comment |
---|---|---|
2019-05-23T09:34:43.581Z | John Smith | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. |
2019-05-23T09:37:21.520Z | Jane Test | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. |
2019-05-23T09:21:01.120Z | Mr Test | Foo bar abc *123* _xyz_ `foo.bar()` |
One important thing you might notice is that the first example was written in markdown, so the comment text itself was automatically rendered as markdown.
In the second, HTML example the comment text is not parsed as markdown and is rendered raw. This means that the special markdown characters are displayed as-is and the newline character is added to the document itself.
I want to support markdown in my comments so I’ll have to use a Liquid filter like {{ comment.comment | markdownify }}.
With that in place the comment with markdown would be rendered like so:
Foo bar abc 123 xyz
foo.bar()
Additionally, Liquid in Jekyll has filters for displaying dates in a more friendly manner. These include:
- date_to_xmlschema
- date_to_rfc822
- date_to_string
- date_to_long_string
So the date 2019-05-23T09:21:01.120Z could be rendered as:
- 2019-05-23T09:21:01+00:00
- Thu, 23 May 2019 09:21:01 +0000
- 23 May 2019
- 23rd May 2019
Or some other variations based on possible settings for the filters.
Closing
This post has got quite long already so I will end it here.
You can see a lot of the ideas and thought process here; however, I still haven’t looked into the pull request side of this system. I am aware of JGit for Java, which could be used to create branches in Git, and I have had a quick look at the GitHub API for creating pull requests, but I will write a separate post on that.