HTML (with microformats, microdata) → Markdown (GitHub-Flavored Markdown, CommonMark)
I have a version of my bio that is written in HTML with lots of microformats and microdata embedded. https://ryanparman.com/about/#full-length
I wanted to produce a Markdown (CommonMark, really) version without having to do the conversion by hand. https://ryanparman.com/about/#markdown
NOTE: For those who don't know, macOS combines the XNU kernel with FreeBSD-derived command-line tools, while most Linux distributions ship the GNU versions. The example code calls sed, which should be GNU sed, not the BSD version built into macOS. You can install GNU sed using Homebrew (the gnu-sed formula, which installs the command as gsed unless you add its gnubin directory to your PATH).
cat author.html | sed -r "s/<\/?span([^>]*)>//g" | pandoc -r html -w gfm --columns 10000 | tee author.md
What this does:
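- Reads the author.html file to stdout
- Pipes the content into GNU sed (where -r enables extended regular expressions, not Perl-compatible ones) to strip out all <span> tags, attributes included, while keeping the text inside them
- Pipes that to a tool called Pandoc, which converts the HTML to GitHub-Flavored Markdown (which is now a strict superset of CommonMark); the large --columns value keeps Pandoc from hard-wrapping long lines
- Writes Pandoc's results to author.md through tee, overwriting the file's contents and echoing the same output to the terminal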
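
To see the span-stripping step in isolation, here is a minimal sketch that runs it on a single line of sample HTML. The strip_spans function name and the sample markup are made up for illustration and are not from the real author.html. It also uses -E rather than -r, on the assumption that you want the same command to work with both GNU sed and the BSD sed that ships with macOS.

```shell
# Strip <span ...> and </span> tags, keeping the text inside them.
# -E selects extended regular expressions in both GNU and BSD sed.
# (strip_spans is a hypothetical helper name, not part of the original tip.)
strip_spans() {
  sed -E 's/<\/?span[^>]*>//g'
}

# Hypothetical sample input; the real bio has far more markup.
printf '%s\n' '<p><span class="p-name">Ryan</span> writes <em>code</em>.</p>' | strip_spans
# prints: <p>Ryan writes <em>code</em>.</p>
```

Only the span tags are removed; other elements such as <p> and <em> pass through untouched for Pandoc to convert. If this behaves as expected on your machine, the function could stand in for the sed call in the pipeline above.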
Written by Ryan Parman