Creating a blog with Sphinx#
I’ve finally taken the step of restarting my blog, and with it the migration I had been wanting to undertake. Since the migration isn’t complete yet, I’ll cover it across several entries. This is the first one, and it covers the basics so your own attempt doesn’t fail.
Why Sphinx?#
Sphinx is a static documentation generator that is practically the standard for Python projects. It has a good number of extensions, including one called Ablog that turns your project into a blog.
Since my primary programming language is Python, it’s clear why I want a Python-based option. Previously I used Nikola, which is also developed in Python, but Sphinx has a larger and more active community (both users and developers). In addition, the development of Sphinx and its extensions keeps pace with changes in the ecosystem and stays aligned with docutils directives. Given those recent changes, I’m interested in how Sphinx supports Jupyter Notebooks and MyST: notebooks benefit entries that contain code, and MyST extends what Markdown can do (and is not supported well enough by Nikola or the other generators I reviewed).
I was also already more familiar with Sphinx. Back when I first switched to a static generator it wasn’t very well known, but it has since attracted a larger community thanks to MyST support.
Creating the blog#
Alright, let’s get started. We’ll go step by step.
Dependencies#
We will use Sphinx for static content generation. Ablog lets us attach dates to posts, which is the essential difference between pages and publications (and with that come options for filtering indices).
Regarding appearance, there are different themes to choose from, but I have leaned towards PyData (several blogs that I follow use it, so I’ve seen its results and want to dig deeper into how it’s used). Additionally, it’s a good idea to install Sphinx Design to get extra components such as grids, tabs, cards, and others.
Embedding YouTube videos is a must (I already did this in Nikola), so we need Sphinx Contrib Youtube installed.
Although sharing on social media doesn’t require anything special, generating the right metadata helps with indexing and link previews. To this end, we’ll use Sphinx Ext OpenGraph (which covers not only the OpenGraph protocol but also an extra tag for Twitter Cards). One detail worth mentioning: generating the images used when sharing (by default it doesn’t use the first image of the post, and there isn’t always one) requires installing Matplotlib.
Finally, to support Markdown, we’ll use MyST Parser. Interestingly, by extending it with MyST NB we also get support for writing publications as notebooks using MyST Markdown (so we’ll also install JupyterLab). I love this because it makes it natural to publish notes on code together with their results.
Sphinx Sitemap generates the sitemap, although to use it with the internationalization scheme I want, I’ll have to modify it in the future.
Sphinx Copybutton adds a button to copy code blocks to the clipboard.
With these details, our requirements.txt file will look like this:
# Static site generation and theme
sphinx
ablog
pydata-sphinx-theme
# Components
sphinx-design
sphinx-copybutton
sphinxcontrib-youtube
# Metadata
sphinxext-opengraph
matplotlib
sphinx-sitemap
# Markdown, MyST and Notebook support
myst-parser
jupyterlab
jupyterlab_myst
myst-nb
Optionals
Not everything is about automatic content generation; we also need support while writing. So we can install doc8, rstcheck, and esbonio to validate our .rst files, and Jupyterlab Myst to help with rendering in the notebook as we draft.
With that, our requirements-dev.txt file can look like this:
jupyterlab_myst
rstcheck
doc8
esbonio
If we also use VSCode, it’s worth installing the following extensions:
MyST-Markdown: For editing MystMD
Esbonio: For editing RST
reStructuredText (lexstudio): For editing RST
Jupyter: To manipulate notebooks
Emoji: To insert emojis with the command palette 😀
Spell Right: For spelling correction
Font Awesome Gallery: To find the notation for Font Awesome icons if you plan to use them (it has a significant impact on page load time).
Extras
Here are some extensions that might be useful to you, although I didn’t need them:
sphinxext-rediraffe: Definitely worth having around, and highly recommended for setting up redirects. In my case, I don’t need advanced options yet, so the :redirect: option in publication posts is enough for me.
sphinx-intl: Helps with internationalization. However, I don’t think it’s suitable for projects like a blog, where not all entries necessarily have translations or may require context that makes translation incomplete. I find it more suitable for documentation than for a blog.
ABlog configuration#
You can use ablog start and respond to the basic initialization questions.
> Root path for your project (path has to exist) [.]:
> Project name: Cosmoscalibur
> Author name(s): Edward Villegas-Pulgarin
> Base URL for your project: https://www.cosmoscalibur.com/
Since GitHub Pages, when publishing from a branch, can only serve the site from the repository root or from the docs/ directory, it’s convenient to make the project root the same as the repository root (if you’re using a Git repository), so that the output directory sits at this level.
Regarding the project name, by default it will include the word “blog” at the end of the name we provide (in English, it would be perfect). However, we also find that the HTML file contains a mention of “documentation”, which is not suitable for our blog. We’ll adjust this in the following variables:
project = 'Cosmoscalibur'
blog_title = 'Cosmoscalibur'
html_title = 'Cosmoscalibur'
html_short_title = 'Cosmoscalibur'
Regarding the theme, Ablog sets up Alabaster by default but, as I mentioned earlier, we will use the PyData theme, so we can remove Alabaster from the import block.
html_theme = 'pydata_sphinx_theme'
To include OpenGraph metadata, we’ll add the base URL, and we can also add custom meta tags. This lets me include my Twitter (now known as X) creator tag. The card type defaults to summary_large_image (this setting doesn’t seem to be customizable).
ogp_site_url = 'https://www.cosmoscalibur.com'
ogp_custom_meta_tags = [
'<meta name="twitter:creator" content="@cosmoscalibur" />',
]
To configure the default language, we use the corresponding variable with the ISO code of the language (in my case, es for Spanish).
language = 'es'
Spanish is my native language, but I plan to post some content in English occasionally. That’s why I’ll be setting up an internationalization pattern for the blog using the format <lang>/blog/<year>/<post>, so that by changing just the <lang> segment you can reach the version in another language.
This affects the blog_post_pattern variable, which defines the path pattern used to automatically recognize documents as publications (so there’s no need to add a directive to each one). Additionally, I need to define the path for the publication archive with blog_path.
blog_path = 'blog'
blog_post_pattern = '*/blog/*/*'
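To get a feel for which documents the pattern selects, here is a minimal sketch. It assumes the pattern is matched against document names (paths without extension) with glob rules like those of Python’s fnmatch; check the Ablog documentation for the exact matching behavior. The helper swap_language and the sample document names are hypothetical and only illustrate the <lang>/blog/<year>/<post> scheme.

```python
from fnmatch import fnmatch

# Same value as blog_post_pattern in conf.py.
POST_PATTERN = "*/blog/*/*"


def is_post(docname: str) -> bool:
    """Return True if a document name matches the post pattern (assumed fnmatch rules)."""
    return fnmatch(docname, POST_PATTERN)


def swap_language(docname: str, lang: str) -> str:
    """Hypothetical helper: replace the leading <lang> segment of a document name."""
    _, rest = docname.split("/", 1)
    return f"{lang}/{rest}"


if __name__ == "__main__":
    # Sample document names, made up for illustration.
    for name in ["es/blog/2024/creating-a-blog", "index", "blog/2024/post"]:
        print(name, is_post(name))
    print(swap_language("es/blog/2024/creating-a-blog", "en"))
```

Under these assumptions, only documents that sit below a language directory and a year directory are treated as publications; a page such as index is left alone.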
We also need to set up other directories and files. For ease of use with GitHub Pages, we’ll remove the underscore from the directories destined for static assets and templates (by default they’re ignored if they start with this notation). We’ll also add a special directory that will allow us to add files directly to the root deployment (for example, to add the CNAME file).
It’s important to note that for GitHub Pages generation in this setup, the output directory must be docs.
Finally, we set all of this through configuration variables in conf.py.
templates_path = ['templates']
html_static_path = ['static']
html_extra_path = ['files']
ablog_website = 'docs'
Now we’ll define the files that shouldn’t be processed. This is important because when the Sphinx directory is at the same level as the repository directory, all files are seen. Additionally, there are generated Sphinx files that if not removed, will attempt to be processed in a subsequent deployment.
exclude_patterns = [
"_build",
"***/.ipynb_checkpoints/*",
'Pipfile',
'LICENSE',
'README.md',
'requirements*.txt',
'.vscode',
'.venv',
'docs',
'.doctrees',
'.gitignore',
]
To enable the extensions we’re going to use, we need to list them in extensions. One detail: although we have installed Myst Parser, we don’t add it explicitly because Myst NB already loads it. Listing both causes errors like these:
WARNING: while setting up extension myst_nb: role 'sub-ref' is already registered, it will be overridden
WARNING: while setting up extension myst_nb: directive 'figure-md' is already registered, it will be overridden
Extension error:
Config value 'myst_commonmark_only' already present
---
source_suffix '.md' is already registered
Thus, we can define:
extensions = [
'sphinx.ext.extlinks',
'sphinx.ext.intersphinx',
"myst_nb",
"sphinx_design",
"sphinxext.opengraph",
"sphinxcontrib.youtube",
'ablog',
'sphinx_sitemap',
'sphinx_copybutton',
]
The first two entries are extensions bundled with Sphinx; they will help us shorten URL notation and link conveniently to other Sphinx projects (we’ll review this in future entries).
Now that we have Myst NB enabled, we can remove the source_suffix line, because the extension configures it and the default value prevents Markdown files from being compiled.
Regarding Myst options, we’re going to enable several extensions that let us write directives more easily (with colons instead of backticks), make substitutions, and use dollar signs for equations. Additionally, we’ll create references (targets) for headings up to three levels deep (h1, h2, and h3). We can also add tags more easily in blocks or lines, substitute with Jinja2, and use definition lists, replacements, or task lists. I’m omitting linkify as it doesn’t seem particularly useful.
myst_enable_extensions = [
"amsmath",
"attrs_inline",
"colon_fence",
"deflist",
"dollarmath",
"fieldlist",
"html_admonition",
"html_image",
"replacements",
"smartquotes",
"strikethrough",
"substitution",
"tasklist",
]
myst_heading_anchors = 3
Similarly, I want third-level headings to be displayed in the table of contents on the right. To do this, we need to adjust the theme options.
Our Google Analytics identifier is also set in this same section (PyData also supports Plausible).
We can also add our Twitter and GitHub links to the theme configuration.
html_theme_options = {
'show_toc_level': 2,
'twitter_url': 'https://twitter.com/cosmoscalibur',
'github_url': 'https://github.com/cosmoscalibur/',
}
html_theme_options['analytics'] = {'google_analytics_id': 'G-4YFQBC69LB'}
We have kept the analytics line separate so it’s easy to disable during testing and doesn’t affect metrics. Later, I want this to happen automatically in the GitHub Actions deployment, depending on an environment variable.
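One possible way to make the analytics entry depend on an environment variable is sketched below. This is only an illustration of the idea, not the final setup: the variable name BLOG_GA_ID and the helper analytics_options are hypothetical, and the theme options are trimmed to a single key for brevity.

```python
import os


def analytics_options(environ=os.environ):
    """Return the theme's analytics options, or {} when no ID is configured.

    BLOG_GA_ID is a hypothetical variable name chosen for this sketch;
    neither Sphinx nor the theme defines it.
    """
    ga_id = environ.get("BLOG_GA_ID", "").strip()
    return {"google_analytics_id": ga_id} if ga_id else {}


html_theme_options = {"show_toc_level": 2}

# Only add the analytics entry when the variable is set (e.g. in the deployment job).
analytics = analytics_options()
if analytics:
    html_theme_options["analytics"] = analytics
```

With this approach a local build, where the variable is not defined, produces pages without the tracking snippet, while the deployment job can export the identifier before building.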
Regarding sidebar panels, we are doing the following configuration at the moment:
html_sidebars = {
'index': [],
"blog": ["ablog/categories.html", "ablog/archives.html"],
"*/blog/**": ["ablog/postcard.html", "ablog/recentposts.html", "ablog/archives.html"],
}
Don’t worry, I’ll explain it in another entry. For now I’m going with the defaults: these panels need to be customized with HTML, and I want to think carefully about what to show on pages that aren’t blog posts (for posts, the provided panels seem perfect to me).
We can also include icons from Font Awesome, but keep in mind that it may have a significant impact on the site’s load time, as this is not optimized. In my case, when testing, I see that it is the biggest penalty to page loads at the level of scripts.
fontawesome_included = True
I must say that I don’t like the idea of showing each publication’s source on the site itself, because it duplicates files by copying them into the output directory. If someone wants to see the source, that’s what the repository is for. Unfortunately, disabling the option still leaves the duplicates, so I’ll keep it enabled (if I find a way to remove the duplicates, I’ll disable it).
html_show_sourcelink = True
Now, the configuration for publications. We need to define the format of the date stamp now, and we can set it at will (but it must be consistent with this option across all publications). We can also specify how many paragraphs will be used in the description and which image to consider for previewing. Finally, if there’s any redirection involved, the time in seconds for its execution.
post_date_format = '%Y-%m-%d'
post_date_format_short = '%Y-%m-%d'
post_auto_excerpt = 1
post_auto_image = 1
post_redirect_refresh = 0
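The date format strings follow Python’s strftime conventions. As a quick check of what '%Y-%m-%d' accepts and produces, here is a small sketch with the standard datetime module. It only illustrates the format; it is not necessarily how Ablog handles dates internally, and the function names are made up for the example.

```python
from datetime import datetime

# Same value as post_date_format in conf.py.
POST_DATE_FORMAT = "%Y-%m-%d"


def parse_post_date(text: str) -> datetime:
    """Parse a date stamp written in the configured format (ValueError if it differs)."""
    return datetime.strptime(text, POST_DATE_FORMAT)


def format_post_date(value: datetime) -> str:
    """Render a date using the configured format."""
    return value.strftime(POST_DATE_FORMAT)


if __name__ == "__main__":
    parsed = parse_post_date("2024-05-14")
    print(parsed.year, parsed.month, parsed.day)
    print(format_post_date(parsed))
```

A stamp written in another layout, such as 14/05/2024, raises ValueError in this sketch, which is why keeping one format across all publications matters.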
Now we’re going to enable full-text content for the feeds so they can be consumed in their entirety by readers of this format.
blog_feed_fulltext = True
It’s essential to configure our sitemap generation to assist search engines.
sitemap_url_scheme = '{link}'
html_baseurl = 'https://www.cosmoscalibur.com/'
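As a rough illustration of how these two settings combine, the sketch below assumes that each sitemap entry is html_baseurl followed by the scheme filled in with the page link, and that the extension’s default scheme is '{lang}{version}{link}' with lang rendered as the language code plus a slash. The function sitemap_entry is hypothetical and the page path is made up; the Sphinx Sitemap documentation is the authority on the real behavior.

```python
BASE_URL = "https://www.cosmoscalibur.com/"  # same value as html_baseurl
PAGE = "es/blog/2024/post.html"  # made-up page path for illustration


def sitemap_entry(baseurl: str, scheme: str, link: str,
                  lang: str = "", version: str = "") -> str:
    """Hypothetical model: base URL plus the scheme with its placeholders filled in."""
    return baseurl + scheme.format(lang=lang, version=version, link=link)


if __name__ == "__main__":
    # Scheme used in this post: only the page link.
    print(sitemap_entry(BASE_URL, "{link}", PAGE, lang="es/"))
    # Assumed default scheme: the language segment would be repeated.
    print(sitemap_entry(BASE_URL, "{lang}{version}{link}", PAGE, lang="es/"))
```

Since the language is already part of the directory layout here, a scheme that also prepends the language would repeat it under these assumptions, so keeping only {link} avoids the duplicated segment.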
Finally, although this isn’t the last thing I plan to configure (the rest will be covered in another entry), let’s set up comments via Disqus (I’m considering switching to something else, but that’s also a topic for another entry).
disqus_shortname = 'XXXXXXX'
Additional Files and Directories#
Within the configuration (conf.py), we added the files directory. This allows us to add files directly to the root directory of our generated site. This is important because some validations require files in this location, such as:
CNAME: For the validation of our domain name.
.nojekyll: To tell GitHub that we’re not using Jekyll for compilation. If we don’t do this, directories and files starting with an underscore are ignored.
Google Site verification file: To demonstrate domain ownership to Google.
Other ownership verification files.
The robots.txt file.
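Creating the two GitHub-related files by hand is trivial, but for completeness here is a small sketch that does it with pathlib. The function create_extra_files is hypothetical, and the demo runs in a temporary directory so it doesn’t touch a real project; in practice you would point it at the root of your repository.

```python
import tempfile
from pathlib import Path


def create_extra_files(project_root: Path, domain: str) -> Path:
    """Create files/ with a CNAME and an empty .nojekyll inside project_root."""
    extra = project_root / "files"
    extra.mkdir(parents=True, exist_ok=True)
    # CNAME holds the custom domain on a single line.
    (extra / "CNAME").write_text(domain + "\n", encoding="utf-8")
    # .nojekyll only needs to exist; its content is irrelevant.
    (extra / ".nojekyll").touch()
    return extra


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        extra = create_extra_files(Path(tmp), "www.cosmoscalibur.com")
        print(sorted(p.name for p in extra.iterdir()))
```

Because files is listed in html_extra_path, whatever ends up in that directory is copied to the root of the generated site on each build.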
Additionally, if you use a directory at the root level of your repository, it will become available here. This is my case with the images directory for the images I use.
First Publication#
Ablog helped us generate some demonstration files during initialization. You can edit and customize them as you see fit.
The index.rst file must stay in the location where it was generated for the project to compile.
It also created a default publication, which we can move to our es/blog/2024 directory (based on the year of creation) and edit there.
Keep in mind that the first-level heading is the title of the publication. In other generators, titles are added as part of a directive. Since we’ve already configured the path pattern for publications and are following it, we don’t need to add directives, only the front matter. Below are two examples: the first as reStructuredText fields, the second as Markdown front matter:
:redirect: blog/configurar-retroarch-en-steam
:date: 2021-12-14
:tags: steam, retroarch, libretro, gaming, linux, controles, videojuegos, emuladores
:category: tecnología/videojuegos
:author: Edward Villegas-Pulgarin
:language: es
---
date: 2024-05-14
tags: blog, sphinx, python, ablog, pydata
category: tecnología
author: Edward Villegas-Pulgarin
language: es
---
The syntax of Markdown and ReStructuredText is available for you to consult. It’s not difficult.
In my case, I am the only author of the blog and I will generally publish in Spanish, so it’s worth defining the default author and language in the conf.py file.
blog_default_author = 'Edward'
blog_authors = {
'Edward': ('Edward Villegas-Pulgarin', None),
}
blog_default_language = 'es'
blog_languages = {
'es': ('Español', None),
'en': ('English', None),
}
Blog Generation#
We’re ready, so let’s generate the site.
ablog clean && ablog build && ablog serve
The ablog clean step is necessary if we want a full rebuild: it clears the temporary files that are kept to avoid recompiling everything. With ablog serve, the browser will open and we can explore.
Now, add, commit and push to your GitHub repo (or any host).
What’s next?#
Well, from here the blog posting will continue. However, there’s still some exploration left to do.
As for me, here are some details I want to work on soon:
A Google-based search bar.
Manual internationalization support based on directory, rather than using .pot.
Support for sitemaps with internationalization, following the previous scheme.
Linking to external resources beyond just GitHub and Twitter (e.g. Mastodon).