Update (2019-04): a simplified workflow for easily publishing notebooks is now described in the post Blogging with Jupyter notebooks and Hugo. It is based on nb2hugo, a tool to convert Jupyter notebooks into markdown pages with front matter.
Jupyter Notebook is a great way to create a single document that contains code that can be executed, formatted text to provide detailed explanations, as well as figures. It is even possible to easily include mathematical expressions that will be beautifully rendered.
Hugo is a simple yet very powerful static site generator.
Being able to write an article entirely in Jupyter Notebook and directly convert it to Hugo content would be perfect, but how could we proceed?
Existing tools
After a quick look at what already exists, we found that:
- Hugo does not directly provide such a converter, but there is a request for such a feature.
- There is a Python library to solve this problem, jupyter_hugo, but Hugo’s front matter has to be written inside the Notebook metadata and we are not fond of this solution.
- A Jupyter Notebook can easily be converted to markdown, either through the Notebook menu or with a Python library, nbconvert. Hugo’s page content is just markdown with front matter added, so getting what we want shouldn’t be too difficult.
Toward a first solution
We would like the converted Hugo page to look as much as possible like the original Jupyter Notebook. We also don’t want to repeat any information, since this would be error-prone. In particular:
- the Notebook should have a formatted title and this title should be directly converted to Hugo’s title,
- we should be able to specify and modify the front matter fields directly inside the Jupyter Notebook.
We could consider using the first cell as front matter. Putting the YAML, TOML or JSON front matter in a raw cell and converting the notebook to markdown would probably produce valid Hugo content. Let’s start by trying this simple solution to see whether it works. We will use the following cells between separation lines as a “reference notebook” to convert:
+++
title = "A basic notebook with a toml front matter inside a raw cell..."
date = "2018-06-01"
+++
… And some text with formatting inside a markdown cell.
# And a simple python code:
f = lambda x: x**2 + 1
print("f(x) for x from 0 to 4:", *map(f, range(5)))
f(x) for x from 0 to 4: 1 2 5 10 17
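Before running the real converter, the idea can be sketched without any dependency. The snippet below is only an illustration and not how nbconvert works internally: `cells_to_markdown` is a hypothetical helper that copies raw and markdown cells unchanged and wraps code cells in a fenced block. It operates on a hand-written dictionary that follows the notebook JSON layout (`cells`, `cell_type`, `source`), with a shortened title chosen for the example.

```python
# Hypothetical, simplified stand-in for a notebook-to-markdown conversion.
# Outputs of code cells are ignored to keep the sketch short.

FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this example


def cells_to_markdown(notebook):
    """Concatenate the cells of a notebook-like dict into a markdown string."""
    parts = []
    for cell in notebook["cells"]:
        source = "".join(cell["source"])
        if cell["cell_type"] == "code":
            parts.append(f"{FENCE}python\n{source}\n{FENCE}")
        else:  # raw and markdown cells are copied as-is
            parts.append(source)
    return "\n\n".join(parts) + "\n"


notebook = {
    "cells": [
        {"cell_type": "raw",
         "source": ['+++\n', 'title = "A basic notebook"\n', 'date = "2018-06-01"\n', '+++']},
        {"cell_type": "markdown",
         "source": ["... And some *text* with **formatting**."]},
        {"cell_type": "code",
         "source": ["f = lambda x: x**2 + 1"]},
    ]
}

markdown = cells_to_markdown(notebook)
print(markdown)
```

Because the raw cell comes first and is copied verbatim, the resulting markdown starts with the TOML block, which is the property we are relying on.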
Let’s use the nbconvert library to convert this “reference notebook”:
import nbconvert
exporter = nbconvert.MarkdownExporter()
(body, resources) = exporter.from_filename('toward-publishing-jupyter-notebooks-with-hugo.ipynb')
ref_nb = body.split('---')[1]
print(ref_nb)
+++
title = "A basic notebook with a toml front matter inside a raw cell..."
date = "2018-06-01"
+++
... And some *text* with **formatting** inside a markdown cell.
```python
# And a simple python code:
f = lambda x: x**2 + 1
print("f(x) for x from 0 to 4:", *map(f, range(5)))
```
f(x) for x from 0 to 4: 1 2 5 10 17
We see that proceeding this way, we can very easily get Hugo-compatible output with all the front matter fields. However, a notebook starting with this raw front matter is quite ugly. We would like both the Hugo page and the notebook to look good.
We therefore propose the following solution, similar to the one used for content summaries: we will add an HTML comment as a front matter divider. Everything in the notebook before the End Of Front Matter divider <!--eofm--> will be the front matter.
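As a minimal sketch of what this divider allows, assuming the notebook has already been converted to a markdown string: `split_front_matter` is a hypothetical helper name, and treating a missing divider as "no front matter" is our assumption for the example rather than a settled design choice.

```python
DIVIDER = "<!--eofm-->"


def split_front_matter(markdown):
    """Return (front_matter, content), split at the first divider.

    If the divider is absent, the front matter is empty and the whole
    text is treated as content (an assumption made for this sketch).
    """
    before, divider, after = markdown.partition(DIVIDER)
    if not divider:
        return "", markdown
    return before, after


text = "# My title\n\nDate: 2018-06-01\n<!--eofm-->\nSome content."
front_matter, content = split_front_matter(text)
print(repr(front_matter))
print(repr(content))
```

Using `str.partition` splits only at the first occurrence, so a later `<!--eofm-->` in the body would stay in the content.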
With this solution, the previous “reference notebook” becomes:
A basic notebook with a front matter and front matter divider…
Date: 2018-06-01
… And some text with formatting inside another markdown cell.
# And a simple python code:
f = lambda x: x**2 + 1
print("f(x) for x from 0 to 4:", *map(f, range(5)))
f(x) for x from 0 to 4: 1 2 5 10 17
Converting such a notebook will require a bit more work, since we will have to parse the text before the <!--eofm--> divider. Below is a simple prototype:
import nbconvert
import warnings
def toml_frontmatter(nb_fm):
    """ Convert the notebook front matter, i.e. the text before the <!--eofm--> divider, into
    a toml front matter.
    """
    toml_fm = '+++\n'
    for line in nb_fm.split('\n'):
        stripped = line.strip()
        if stripped:
            if stripped.startswith('# '): # The line contains the title
                toml_fm += 'title = "' + stripped[2:].strip() + '"\n'
            else: # The line is expected to contain a field of type "key: value0, value1, ..."
                s = stripped.split(':', 1)
                if len(s) < 2: # Bad formatting
                    warnings.warn(f'The following content is not formatted correctly and is ignored: {stripped}.')
                    continue
                key, values = s
                key = key.lower()
                values = [value.strip() for value in values.split(',')]
                if len(values) > 1: # The field has multiple values (e.g. multiple tags)
                    toml_fm += key + ' = [' + ', '.join([f'"{value.strip()}"' for value in values]) + ']\n'
                else: # The field has a single value (e.g. date)
                    toml_fm += f'{key} = "{values[0]}"\n'
    toml_fm += '+++\n'
    return toml_fm
exporter = nbconvert.MarkdownExporter()
(body, resources) = exporter.from_filename('toward-publishing-jupyter-notebooks-with-hugo.ipynb')
ref_nb = body.split('---')[4]
print('============RAW TEXT============')
print(ref_nb)
print('============RESULT============')
nb_fm, content = ref_nb.split('<!--eofm-->', 1)
md = toml_frontmatter(nb_fm) + content
print(md)
============RAW TEXT============
# A basic notebook with a front matter and front matter divider...
Date: 2018-06-01
<!--eofm-->
... And some *text* with **formatting** inside another markdown cell.
```python
# And a simple python code:
f = lambda x: x**2 + 1
print("f(x) for x from 0 to 4:", *map(f, range(5)))
```
f(x) for x from 0 to 4: 1 2 5 10 17
============RESULT============
+++
title = "A basic notebook with a front matter and front matter divider..."
date = "2018-06-01"
+++
... And some *text* with **formatting** inside another markdown cell.
```python
# And a simple python code:
f = lambda x: x**2 + 1
print("f(x) for x from 0 to 4:", *map(f, range(5)))
```
f(x) for x from 0 to 4: 1 2 5 10 17
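The reference notebook only exercises a single-valued field. To show what the multi-value rule of the prototype produces, the sketch below isolates the per-line conversion in a hypothetical helper, `field_to_toml`. It follows the same "key: value0, value1, ..." convention; the tag names are made up for the example, and stripping the key is a small addition of ours.

```python
def field_to_toml(line):
    """Convert one "Key: value0, value1, ..." line into a TOML line."""
    key, values = line.split(":", 1)
    key = key.strip().lower()
    values = [value.strip() for value in values.split(",")]
    if len(values) > 1:  # several values become a TOML array of strings
        quoted = ", ".join(f'"{value}"' for value in values)
        return f"{key} = [{quoted}]"
    return f'{key} = "{values[0]}"'  # a single value becomes a plain string


print(field_to_toml("Date: 2018-06-01"))
print(field_to_toml("Tags: jupyter, hugo, python"))
```

One consequence of this rule is that any field whose value contains a comma, such as a free-text description, is turned into an array.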
OK, we now have a first method that works well on simple examples. There is still some work to do, but we are probably on the right track…
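One gap that is easy to illustrate: the prototype writes values between double quotes without escaping them, so a title containing a double quote would yield invalid TOML. A possible fix, sketched here with a hypothetical helper `toml_basic_string`, is to escape backslashes and double quotes before quoting. Control characters are deliberately not handled in this sketch.

```python
def toml_basic_string(value):
    """Quote a value as a TOML basic string, escaping backslashes and quotes."""
    # Backslashes must be escaped first, otherwise the backslashes added
    # for the quotes would themselves be doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


title = 'Publishing "notebooks" with Hugo'
print("title = " + toml_basic_string(title))
```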