Flexible Markdown Parser

Andrei Fangli andrei_fangli at hotmail.com
Sat Aug 16 10:34:29 EDT 2014


Hello!
 
I'm new around here; this is my first post (if I posted in the wrong manner, please point it out).
 
Since you guys are working on Markdown syntax and parsers, I thought I would butt in and show off a little piece of work I have been working on.
 
I followed the interview with John Gruber in which he stated that he would not want to standardize Markdown and would rather keep the syntax description ambiguous, so that flavors of the language can emerge to suit the needs of whoever uses Markdown. I realized that a parser with a fixed syntax would not encourage these "flavors" at all, nor would it encourage people to use it much (since my interpretation of Markdown is not universal). With that in mind, I shifted from defining a rather "static" parser to a more flexible one.
 
John Gruber's implementation (in Perl) relies heavily on regular expressions, which gave me an idea. I'm a C# developer, so what I am working on targets the .NET Framework, and the idea was to make a parser that relies mostly on the Regex object model offered by .NET. The current implementation is more of a prototype for a larger, more powerful parser.
 
Now, to make more sense of what I am talking about: I wrote the parser in an abstract class that delegates to subclasses the choice of which node-creation factories to use (a Factory Method that returns a list of Strategies, where each Strategy is object creation). There are two types of nodes, leaf and composite, so there are two Strategy interfaces.
 
The ILeafNodeFactory has a Pattern (a Regex), a Name (used when referencing factories from a composite one) and a Create method which, obviously, creates the node given a regex Match.
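A minimal sketch of the leaf factory contract follows. It is written in Python rather than C#, with Python's `re` module standing in for .NET's Regex. The class names, the `EmphasisNode` type and the `*emphasis*` pattern are hypothetical illustrations and are not taken from the project's source.

```python
import re
from abc import ABC, abstractmethod


class MarkdownNode:
    """Stand-in for the parser's base node class."""


class EmphasisNode(MarkdownNode):
    """Hypothetical leaf node holding emphasized text."""

    def __init__(self, text: str) -> None:
        self.text = text


class LeafNodeFactory(ABC):
    """Python analogue of ILeafNodeFactory: a Name, a Pattern and a Create method."""

    name: str            # unique name, used to reference this factory from composites
    pattern: re.Pattern  # regex identifying the text sequence of the node

    @abstractmethod
    def create(self, match: re.Match) -> MarkdownNode:
        """Build a node from a successful regex match."""


class EmphasisFactory(LeafNodeFactory):
    """Hypothetical leaf factory for *emphasis* spans."""

    name = "emphasis"
    pattern = re.compile(r"\*(?P<text>[^*\n]+)\*")

    def create(self, match: re.Match) -> EmphasisNode:
        # The named group carries the content; the asterisks are discarded.
        return EmphasisNode(match.group("text"))


factory = EmphasisFactory()
match = factory.pattern.search("some *important* words")
node = factory.create(match)  # EmphasisNode with text "important"
```

The named group in the pattern lets `create` pull out just the content from the match. In .NET, `Match.Groups` provides the same capability.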
 
The ICompositeNodeFactory has the same members as ILeafNodeFactory, except that its Create method takes an extra parameter (a list of child nodes). For a composite node factory, you also need to specify which factories are applicable for obtaining child nodes (both leaf and composite factories work).
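The composite variant can be sketched the same way in Python. This illustration makes two assumptions: the composite's pattern exposes a named group `body` holding the text that the child factories are applied to, and children are found by scanning left to right, with the first matching factory winning at each position. For brevity, this sketch only handles leaf children. All names here are hypothetical.

```python
import re
from abc import ABC, abstractmethod
from typing import List


class MarkdownNode:
    """Stand-in for the parser's base node class."""


class TextNode(MarkdownNode):
    def __init__(self, text: str) -> None:
        self.text = text


class ParagraphNode(MarkdownNode):
    def __init__(self, children: List[MarkdownNode]) -> None:
        self.children = children


class TextFactory:
    """Hypothetical leaf factory: one run of non-newline characters."""

    name = "text"
    pattern = re.compile(r"[^\n]+")

    def create(self, match: re.Match) -> MarkdownNode:
        return TextNode(match.group(0))


class CompositeNodeFactory(ABC):
    """Python analogue of ICompositeNodeFactory."""

    name: str
    pattern: re.Pattern             # assumed to expose a named group "body"
    child_factory_names: List[str]  # factories applicable for child nodes

    @abstractmethod
    def create(self, match: re.Match, children: List[MarkdownNode]) -> MarkdownNode:
        """Build the node from its match and its already-parsed children."""


class ParagraphFactory(CompositeNodeFactory):
    """Hypothetical composite factory: consecutive non-blank lines."""

    name = "paragraph"
    pattern = re.compile(r"(?P<body>[^\n]+(?:\n[^\n]+)*)")
    child_factory_names = ["text"]

    def create(self, match: re.Match, children: List[MarkdownNode]) -> MarkdownNode:
        return ParagraphNode(children)


def parse_children(text: str, factories: list) -> List[MarkdownNode]:
    """Scan left to right; at each position the first matching factory wins.
    Characters no factory matches (here, newlines) are skipped."""
    children: List[MarkdownNode] = []
    pos = 0
    while pos < len(text):
        for factory in factories:
            m = factory.pattern.match(text, pos)
            if m and m.end() > pos:
                children.append(factory.create(m))  # leaf children only in this sketch
                pos = m.end()
                break
        else:
            pos += 1  # nothing matched here; skip one character
    return children


registry = {"text": TextFactory()}
paragraph = ParagraphFactory()
match = paragraph.pattern.match("first line\nsecond line")
applicable = [registry[n] for n in paragraph.child_factory_names]
node = paragraph.create(match, parse_children(match.group("body"), applicable))
```

Here `node` is a paragraph with two text children, "first line" and "second line". Resolving child factories by name through a registry is what makes the "reference a factory by its Name" design work.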
 
When you want your own specific "flavor" of Markdown, you simply subclass MarkdownParser and implement its abstract methods. This means providing a list of ILeafNodeFactory, a list of ICompositeNodeFactory and the name of the factory to apply for the root node. For each node factory, you need to provide a unique name and a pattern that identifies the sequence of text representing the node. For composite node factories, you also need to provide a list of factory names, so the parser knows which node factories to use when trying to identify the text sequence of a child node.
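The following is a rough Python sketch of how a flavor might plug into the abstract parser. Several elements are hypothetical and are not the project's actual API: the method names (`leaf_factories`, `composite_factories`, `root_factory_name`), the `body` group convention, the left-to-right scanning rule and the `SimpleFlavor` grammar. To keep it short, nodes are plain tuples. Composite children may themselves be composites.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class LeafFactory:
    name: str
    pattern: re.Pattern
    create: Callable[[re.Match], tuple]


@dataclass
class CompositeFactory:
    name: str
    pattern: re.Pattern  # child text is taken from the named group "body" (assumption)
    child_names: List[str]
    create: Callable[[re.Match, list], tuple]


Factory = Union[LeafFactory, CompositeFactory]


class MarkdownParser(ABC):
    """Sketch of the abstract parser: subclasses supply the factories (the "flavor")."""

    @abstractmethod
    def leaf_factories(self) -> List[LeafFactory]: ...

    @abstractmethod
    def composite_factories(self) -> List[CompositeFactory]: ...

    @abstractmethod
    def root_factory_name(self) -> str: ...

    def __init__(self) -> None:
        self._factories: Dict[str, Factory] = {
            f.name: f for f in [*self.leaf_factories(), *self.composite_factories()]
        }

    def parse(self, text: str) -> tuple:
        root = self._factories[self.root_factory_name()]
        match = root.pattern.fullmatch(text)
        if match is None:
            raise ValueError("root pattern does not match the whole input")
        return self._build(root, match)

    def _build(self, factory: Factory, match: re.Match) -> tuple:
        if isinstance(factory, LeafFactory):
            return factory.create(match)
        applicable = [self._factories[n] for n in factory.child_names]
        return factory.create(match, self._scan(match.group("body"), applicable))

    def _scan(self, text: str, factories: List[Factory]) -> list:
        # At each position the first matching factory wins; unmatched characters are skipped.
        nodes, pos = [], 0
        while pos < len(text):
            for factory in factories:
                m = factory.pattern.match(text, pos)
                if m and m.end() > pos:
                    nodes.append(self._build(factory, m))  # recurses into composites
                    pos = m.end()
                    break
            else:
                pos += 1
        return nodes


class SimpleFlavor(MarkdownParser):
    """Hypothetical flavor: paragraphs separated by blank lines, with *emphasis*."""

    def leaf_factories(self):
        return [
            LeafFactory("emphasis", re.compile(r"\*(?P<text>[^*\n]+)\*"),
                        lambda m: ("em", m.group("text"))),
            LeafFactory("text", re.compile(r"[^*\n]+"), lambda m: ("text", m.group(0))),
        ]

    def composite_factories(self):
        return [
            CompositeFactory("document", re.compile(r"(?P<body>.*)", re.DOTALL),
                             ["paragraph"], lambda m, kids: ("document", kids)),
            CompositeFactory("paragraph", re.compile(r"(?P<body>[^\n]+(?:\n[^\n]+)*)"),
                             ["emphasis", "text"], lambda m, kids: ("paragraph", kids)),
        ]

    def root_factory_name(self):
        return "document"


tree = SimpleFlavor().parse("Hello *world*\n\nSecond para")
```

With this sketch, `tree` is a document containing two paragraphs. The first paragraph holds a text node ("Hello ") followed by an emphasis node ("world"). Changing the flavor only means overriding the three methods. The parsing loop itself stays untouched.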
 
The factories are validated in the base class. For example, if you reference a factory that does not exist, an exception is thrown. Likewise, suppose your composite factory expects MarkdownTextNode children, but its applicable child node factories include one that returns a MarkdownNode (the base class of MarkdownTextNode). In that case an exception is thrown, because there is no implicit conversion from MarkdownNode to MarkdownTextNode.
 
All these validations are done in the base class constructor (MarkdownParser). This means that if you want to test your specification, you only need to create an instance of it and see whether it throws any exceptions.
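The constructor-time checks described above could look roughly like the following Python sketch. It assumes that each factory declares the type of node it creates, and that each composite declares the type its children must have. Python's `issubclass` stands in for C#'s implicit reference conversion. `FactoryValidationError` and `spec_is_valid` are hypothetical names. For brevity, the factories are passed to the constructor directly rather than coming from abstract methods.

```python
import re
from dataclasses import dataclass, field
from typing import List


class MarkdownNode:
    """Stand-in base node class."""


class MarkdownTextNode(MarkdownNode):
    """Stand-in for a more specific node type."""


class FactoryValidationError(Exception):
    """Hypothetical exception type for an invalid flavor specification."""


@dataclass
class LeafFactory:
    name: str
    pattern: re.Pattern
    node_type: type  # type of node this factory creates


@dataclass
class CompositeFactory:
    name: str
    pattern: re.Pattern
    node_type: type
    child_type: type  # type every child node must be assignable to
    child_names: List[str] = field(default_factory=list)


class MarkdownParser:
    """Only the constructor checks are sketched; factories are passed in directly."""

    def __init__(self, leaves, composites, root_name):
        factories = {}
        for f in [*leaves, *composites]:
            if f.name in factories:
                raise FactoryValidationError(f"duplicate factory name {f.name!r}")
            factories[f.name] = f
        if root_name not in factories:
            raise FactoryValidationError(f"root factory {root_name!r} does not exist")
        for comp in composites:
            for child_name in comp.child_names:
                child = factories.get(child_name)
                if child is None:
                    raise FactoryValidationError(
                        f"{comp.name!r} references missing factory {child_name!r}")
                # Mirrors C#: a MarkdownNode is not implicitly a MarkdownTextNode.
                if not issubclass(child.node_type, comp.child_type):
                    raise FactoryValidationError(
                        f"{child_name!r} creates {child.node_type.__name__}, but "
                        f"{comp.name!r} requires {comp.child_type.__name__} children")
        self.factories = factories


def spec_is_valid(leaves, composites, root_name) -> bool:
    """Test a specification by constructing a parser and checking whether it throws."""
    try:
        MarkdownParser(leaves, composites, root_name)
        return True
    except FactoryValidationError:
        return False


text = LeafFactory("text", re.compile(r"[^\n]+"), MarkdownTextNode)
generic = LeafFactory("generic", re.compile(r"[^\n]+"), MarkdownNode)
para_text = CompositeFactory("paragraph", re.compile(r"(?P<body>.+)"),
                             MarkdownNode, MarkdownTextNode, ["text"])
para_generic = CompositeFactory("paragraph", re.compile(r"(?P<body>.+)"),
                                MarkdownNode, MarkdownTextNode, ["generic"])

# spec_is_valid([text], [para_text], "paragraph")        -> True
# spec_is_valid([generic], [para_generic], "paragraph")  -> False (MarkdownNode is not a MarkdownTextNode)
```

Because every check runs in the constructor, a broken flavor fails immediately on construction rather than partway through parsing a document.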
 
Sadly, I haven't yet got around to testing the parser properly (there is just a dummy test with two factories). I also haven't written a default implementation of the rules specified on John Gruber's site, but I am looking forward to it. The project source code is hosted on Bitbucket: <https://bitbucket.org/Andrei15193/markdownparser>.
 
I hope I wasn't too technical or too short on details. Constructive comments and criticism are welcome! Thanks in advance. :)
 
Yours,
Andrei


More information about the Markdown-Discuss mailing list