In a previous article, I covered how to batch convert a handful of Markdown files to HTML using pandoc. That article produced several HTML files, but pandoc can do much more. It has been called the “Swiss Army knife” of document conversion, and for good reason: there is not much it can’t do.
Pandoc can convert .docx, .odt, .html, .epub, LaTeX, DocBook, and more to these formats and to others, such as JATS, TEI Simple, and AsciiDoc.
Yes, that means pandoc can convert .docx files to .pdf and .html, but you might be thinking, “Word can also export files to .pdf and .html. Why would I need pandoc?”
You would be right, but since pandoc can convert so many formats, it might just become your go-to tool for all your converting tasks. For example, many of us know that Markdown editors can export their Markdown files in .html format. With pandoc, Markdown files can also be converted to many other formats.
I rarely export Markdown to HTML from an editor; I normally let pandoc do it.
Converting file formats with Pandoc
Here I will convert a Markdown file to a few different formats. I write almost everything using Markdown syntax, but I often have to convert to another format: .docx files are usually needed for school work, .html for web pages I create (and for .epub work), .pdf for flyers and handouts, and even an occasional TEI Simple file for a digital humanities college project. Pandoc can handle all of this, and more, easily.
First of all, you need to install pandoc. Additionally, to create .pdf files, LaTeX will also be required. My favorite package is TeX Live.
Note: If you would like to try pandoc before installing it, there is an online trial page at: http://pandoc.org/try/
Install pandoc and texlive
Users of Ubuntu and other Debian distributions can type the following commands into the terminal:
sudo apt-get update
sudo apt-get install pandoc texlive
Notice that the second line installs pandoc and texlive in one go. The apt-get command will have no problem with this, but go have a coffee; it may take a few minutes.
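Once the installation finishes, it can be worth confirming that both tools are actually on the PATH before converting anything. The following is a minimal sketch, not part of pandoc: check_tools is a hypothetical helper name, and it assumes the texlive package provides the pdflatex command, which is pandoc's default PDF engine.

```shell
# Report whether each named command is on the PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (assumption: texlive supplies pdflatex):
#   check_tools pandoc pdflatex
```

If either tool is reported missing, rerun the install command before moving on.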
Start converting
Once pandoc and texlive are installed, you can get to work!
The sample document for this project will be an article that was first published in the North American Review in December 1894, and is entitled “How to Repel Train Robbers”. The Markdown file I’m going to use was created some time ago as part of a restoration project.
how_to_repel_train_robbers.md is located in my Documents directory, in a subdirectory named samples. Here’s what it looks like in Ghostwriter.
I want to create .docx, .pdf, and .html versions of this file.
The first conversion
I will start by making a .pdf copy, since I went to the trouble of installing a LaTeX package.
In the ~/Documents/samples/ directory, I type the following to create a .pdf file:
pandoc -o htrtr.pdf how_to_repel_train_robbers.md
The above command creates a file called htrtr.pdf from the how_to_repel_train_robbers.md file. I used htrtr as the name because it is shorter than how_to_repel_train_robbers: it is made up of the first letter of each word in the long title.
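If you want to derive short names like this automatically rather than typing them, a small helper can do it. This is only a sketch: abbrev is a hypothetical function name, and it assumes the words in the file name are separated by underscores, as they are here.

```shell
# Build a short name from the first letter of each underscore-separated word.
abbrev() {
    printf '%s\n' "$1" |   # emit the name on one line
        tr '_' '\n' |      # put each word on its own line
        cut -c1 |          # keep only the first character of each word
        tr -d '\n'         # join the letters back together
}

# Usage:
#   pandoc -o "$(abbrev how_to_repel_train_robbers).pdf" how_to_repel_train_robbers.md
```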
Here is a preview of the .pdf file once it’s done:
The second conversion
Then I want to create a .docx file. The command is almost identical to the one I used to create the .pdf and it is:
pandoc -o htrtr.docx how_to_repel_train_robbers.md
In no time at all, a .docx file is created. Here’s what it looks like in LibreOffice Writer:
The third conversion
I might want to post this on the web, so a web page would be nice. I’ll create an .html file with this command:
pandoc -o htrtr.html how_to_repel_train_robbers.md
Again, the command to create it looks a lot like the last two conversions. This is what the .html file looks like in a browser:
Have you noticed anything yet?
Let’s look at the commands again. They were:
pandoc -o htrtr.pdf how_to_repel_train_robbers.md
pandoc -o htrtr.docx how_to_repel_train_robbers.md
pandoc -o htrtr.html how_to_repel_train_robbers.md
The only difference between these three commands is the extension next to htrtr. This gives you a clue: pandoc relies on the extension of the output file name you provide to decide which format to write.
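Because only the extension changes, the three conversions can be folded into a loop. The sketch below is one possible way to do that; convert_all, run, and the DRY_RUN variable are hypothetical names introduced for illustration and are not pandoc features.

```shell
# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Convert one input file to several formats, letting pandoc pick each
# output format from the file extension.
# Usage: convert_all INPUT OUTPUT_BASENAME EXT...
convert_all() {
    input=$1
    base=$2
    shift 2
    for ext in "$@"; do
        # Stop at the first conversion that fails.
        run pandoc -o "$base.$ext" "$input" || return 1
    done
}

# Usage:
#   convert_all how_to_repel_train_robbers.md htrtr pdf docx html
```

With DRY_RUN=1 set, the usage line prints the three commands listed above instead of running them. When an extension is missing or ambiguous, pandoc also accepts explicit formats, for example pandoc -f markdown -t html -o htrtr.html how_to_repel_train_robbers.md.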
Pandoc can do a lot more than just the three little conversions done here. If you are writing in a preferred format, but need to convert the file to another format, chances are pandoc can do it for you.
What would you do with it? Would you like to automate this? What if you had a website with articles for your readers to download? You can turn these little commands into a script and let your readers decide what format they want. You can offer .docx, .pdf, .odt, .epub, or more. Your readers choose, the appropriate conversion script runs, and your readers download their file. It can be done.
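As a rough illustration of that idea, here is a minimal sketch of a conversion function a website's backend could call with the format a reader picked. The name convert_for_reader, the allow-list of formats, and the exit codes are all assumptions made for this example; a real site would also need to handle storage, caching, and cleanup.

```shell
# Convert a Markdown file to a reader-requested format.
# Usage: convert_for_reader INPUT.md FORMAT
# Prints the name of the generated file on success.
convert_for_reader() {
    input=$1
    format=$2

    # Accept only known extensions, so a request cannot smuggle in
    # arbitrary output names or options.
    case "$format" in
        docx|pdf|odt|epub|html) ;;
        *) echo "unsupported format: $format" >&2; return 2 ;;
    esac

    if [ ! -f "$input" ]; then
        echo "no such file: $input" >&2
        return 1
    fi

    # Replace the .md suffix with the requested extension;
    # pandoc infers the output format from it.
    output="${input%.md}.$format"
    pandoc -o "$output" "$input" && echo "$output"
}

# Usage:
#   convert_for_reader how_to_repel_train_robbers.md epub
```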