How to convert documents between various markup languages using pandoc
Like it or not, sooner or later a system administrator is asked to convert documents between markup languages or file formats such as doc, pdf or jpg. This article gets you started with a few examples of converting documents with the pandoc markup conversion tool.
2. Pandoc markup converter examples
Pandoc is a must-have Swiss Army knife for converting between markup languages. To get started, first install pandoc (the command below applies to Debian-based systems such as Ubuntu):
$ sudo apt-get install pandoc
The general and most frequently used pandoc syntax is:
$ pandoc -f <from format> -t <to format> <source file>
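If you are unsure which names pandoc accepts for the -f and -t options, it can print them itself. The sketch below assumes pandoc 1.18 or later, which provides the --list-input-formats and --list-output-formats options. The function name supports_format is a hypothetical helper for illustration, not part of pandoc.

```shell
# supports_format is a hypothetical helper, not a pandoc command.
# Usage: supports_format input|output FORMAT
# Assumes pandoc 1.18 or later (--list-input-formats / --list-output-formats).
supports_format() {
    # pandoc prints one format name per line; grep -x matches a whole
    # line only and -F treats the name as a fixed string, not a regex.
    pandoc "--list-$1-formats" | grep -qxF -- "$2"
}
```

For example, "supports_format output textile && echo supported" prints "supported" only when the installed pandoc can write Textile.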
What follows are a few examples of pandoc usage:
2.1. HTML to TEXTILE
Convert the HTML file example.html to Textile markup:
$ pandoc -f html -t textile example.html
The above command prints the result to standard output (the screen). To save the Textile output to a file, redirect it:
$ pandoc -f html -t textile example.html > textileout.txt
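As an alternative to shell redirection, pandoc can write the output file itself when given the -o option. The following is a minimal sketch of that form. The wrapper name convert_to and its argument order are assumptions made for this example, not something pandoc provides.

```shell
# convert_to is a hypothetical wrapper around the general pandoc syntax.
# Usage: convert_to FROM_FORMAT TO_FORMAT INPUT_FILE OUTPUT_FILE
convert_to() {
    # -o makes pandoc write to the named file instead of standard output.
    pandoc -f "$1" -t "$2" "$3" -o "$4"
}
```

With this in place, "convert_to html textile example.html textileout.txt" has the same effect as the redirection shown above.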
2.2. TEXTILE to HTML
The above syntax works both ways. Here we will convert from textile to html.
$ pandoc -f textile -t html textileout.txt
2.3. TEX to DOCX
$ pandoc -s example.tex -o example.docx
2.4. URL to DOCX
Here pandoc fetches the page at the given URL and saves its content as docx:
$ pandoc http://how-to.linuxcareer.com -o example.docx
2.5. URL to PDF
$ pandoc http://how-to.linuxcareer.com -o linux-career.pdf
Please note that the texlive-latex-base package must be installed before you can convert to PDF. Otherwise you will get the following error:
pandoc: pdflatex not found. pdflatex is needed for pdf output.
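To catch this problem before pandoc runs, a script can check for pdflatex first. The sketch below is one way to do that, assuming pdflatex is the PDF engine in use, as in the error above. The function name to_pdf and its message text are hypothetical.

```shell
# to_pdf is a hypothetical wrapper: it refuses to call pandoc when
# pdflatex cannot be found on the PATH.
# Usage: to_pdf INPUT OUTPUT_PDF
to_pdf() {
    if ! command -v pdflatex >/dev/null 2>&1; then
        echo "pdflatex not found; install texlive-latex-base first" >&2
        return 1
    fi
    pandoc "$1" -o "$2"
}
```

Calling "to_pdf markdown-sample.md example.pdf" then either converts the file or stops with a hint about the missing package.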
2.6. HTML to MARKDOWN
$ pandoc -f html -t markdown example.html
2.7. MARKDOWN to PDF
$ pandoc -s markdown-sample.md -o example.pdf
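When there are many Markdown files to convert, the same command can be run in a loop. This is only a sketch: the function name md_to_pdf_all is hypothetical, and it assumes every file ending in .md in the given directory should become a PDF with the same base name.

```shell
# md_to_pdf_all is a hypothetical helper: it converts every .md file in
# the given directory to a PDF with the same base name.
# Usage: md_to_pdf_all DIRECTORY
md_to_pdf_all() {
    for src in "$1"/*.md; do
        # Skip the unexpanded pattern when the directory has no .md files.
        [ -e "$src" ] || continue
        # ${src%.md} strips the .md suffix so the PDF keeps the base name.
        pandoc -s "$src" -o "${src%.md}.pdf"
    done
}
```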
Please note that the above only scratches the surface of pandoc's abilities. Pandoc can also convert between other markup formats such as reStructuredText, XHTML, HTML 5, LaTeX (including beamer slide shows), ConTeXt, RTF, DocBook XML, OpenDocument XML, ODT, GNU Texinfo, MediaWiki markup, EPUB, Textile, groff man pages, Emacs Org-Mode, AsciiDoc, and Slidy, Slideous, DZSlides, or S5 HTML slide shows. For more information, see the manual page:
$ man pandoc