External Link Highlighting for Kiwix-Serve via Apache Reverse Proxy
kiwix-serve 3.7.0 · Apache 2.4 · mod_substitute CSS + JS injection
Kiwix-serve hosts offline ZIM archives, but many ZIM articles still contain links pointing to live external websites. This guide injects CSS and JavaScript into ZIM content pages at the reverse-proxy layer to visually distinguish external links and warn users before they leave the mirror.
Table of Contents
- Architecture Overview
- The Problem
- Prerequisites
- Understanding ZIM Content URLs
- What External Links Look Like in ZIM Content
- The gzip Trap
- Scoping Injection with LocationMatch
- The CSS: Visual Indicators
- The JavaScript: Attributes and Tooltips
- The Unicode Escape Trap
- Combining with Content Security Policy
- Full Apache Configuration
- Verification
- Customization
- Troubleshooting
Architecture Overview
Browser ──HTTPS──▶ Apache 2.4 (mod_ssl + mod_proxy + mod_substitute)
│
│ mod_substitute injects <style> + <script>
│ into </head> on /content/* pages only
│
▼
kiwix-serve :8300 (serves ZIM files)
│
├── / Landing page (book library)
├── /viewer Viewer shell (toolbar + iframe)
├── /content/… ZIM article content ← injection target
└── /skin/… Kiwix static assets
The injection targets ZIM article content pages served under /content/. These are the actual wiki/encyclopedia articles that may contain external hyperlinks. The Kiwix UI pages (/, /viewer, /nojs) are left untouched by this feature.
The Problem
ZIM archives are snapshots of live websites. Articles frequently contain hyperlinks to external resources:
- Interwiki links — links to sister projects (e.g., a RuneScape wiki linking to the RuneScape 3 wiki)
- Citation/reference links — links to source material, research papers, news articles
- License links — links to Creative Commons, GFDL, etc.
- “Official website” links — links in infoboxes pointing to project homepages
In the original live wiki, these links work fine. In an offline mirror, clicking them silently navigates away from the mirror to the live internet — or fails entirely if the mirror is on an air-gapped network.
Users need visual cues that distinguish:
- Internal links — links to other articles within the same ZIM (will work offline)
- External links — links to live websites (will leave the mirror or fail)
Kiwix-serve has no built-in feature for this.
Prerequisites
Enable the required Apache modules:
sudo a2enmod ssl rewrite proxy proxy_http headers substitute deflate filter
sudo systemctl restart apache2
| Module | Purpose |
|---|---|
| mod_proxy / mod_proxy_http | Reverse proxy to kiwix-serve |
| mod_ssl | HTTPS termination |
| mod_headers | Strip Accept-Encoding before proxying; set CSP |
| mod_substitute | Inject <style> and <script> tags into HTML responses |
| mod_deflate | Re-compress responses after substitution |
| mod_filter | Provides AddOutputFilterByType, which applies the filters by content type |
Understanding ZIM Content URLs
Kiwix-serve 3.7.0 serves ZIM article content under the /content/ path prefix:
/content/<zim_name>/<article_path>
For example:
/content/wikipedia_en_all_maxi_2024-01/A/Linux
/content/oldschool_en_all_maxi_2026-03/Abyssal_whip
/content/stackoverflow_en_all_2024-10/questions/927358
Internal links within a ZIM are relative paths — they stay under the same /content/<zim_name>/ prefix and resolve within the mirror.
External links use absolute URLs starting with http:// or https:// — they point outside the mirror entirely.
This URL structure distinction (http(s):// vs. relative path) is what makes CSS-based detection possible.
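As a rough illustration of the URL layout above, the following sketch splits a content path into its ZIM name and article path. The function name parseContentPath is hypothetical and not part of kiwix-serve; it only assumes the /content/<zim_name>/<article_path> structure described in this section.

```javascript
// Split a kiwix-serve content path into its ZIM name and article path.
// Assumes the /content/<zim_name>/<article_path> layout described above.
function parseContentPath(path) {
  const match = /^\/content\/([^/]+)\/(.*)$/.exec(path);
  if (!match) {
    return null; // not a ZIM content URL (for example /viewer or /skin/...)
  }
  return { zimName: match[1], articlePath: match[2] };
}

console.log(parseContentPath("/content/wikipedia_en_all_maxi_2024-01/A/Linux"));
console.log(parseContentPath("/viewer"));
```

Anything that returns null here is a Kiwix UI or asset path, which is the same split the proxy configuration later relies on.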
What External Links Look Like in ZIM Content
ZIM articles preserve the original HTML of the source wiki. External links appear in several forms depending on the wiki software:
MediaWiki ZIMs (Wikipedia, Wiktionary, RuneScape Wiki, etc.)
<!-- Interwiki external link -->
<a href="https://runescape.wiki/w/RuneScape:Family_Photo"
class="extiw external"
title="rsw:RuneScape:Family Photo">2025 Family Photo</a>
<!-- Citation/reference link -->
<a class="external text"
href="https://creativecommons.org/licenses/by-nc-sa/3.0/">CC BY-NC-SA 3.0</a>
<!-- Bare external link -->
<a class="external autonumber"
href="https://example.com/source">[1]</a>
Other ZIM formats (Zimit captures, StackOverflow, etc.)
External link markup varies, but they all share one thing: the href attribute starts with http:// or https://.
The Detection Rule
Any <a> element whose href starts with http:// or https:// is an external link. Internal ZIM links use relative paths. This is the foundation of both the CSS and JavaScript selectors:
a[href^="http://"], a[href^="https://"]
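The same rule can be written as a small predicate, which is handy for reasoning about edge cases. This is a minimal sketch; isExternalHref is a hypothetical helper name, and it deliberately mirrors the prefix selector rather than doing full URL parsing.

```javascript
// Mirror of the CSS rule a[href^="http://"], a[href^="https://"].
// The comparison is a case-sensitive prefix match on the raw attribute value,
// like the attribute selector it imitates.
function isExternalHref(href) {
  return typeof href === "string" &&
    (href.startsWith("http://") || href.startsWith("https://"));
}

console.log(isExternalHref("https://runescape.wiki/w/Whip")); // absolute URL
console.log(isExternalHref("../Abyssal_whip"));               // relative path
console.log(isExternalHref("//example.com/page"));            // protocol-relative
```

Note that a prefix match treats protocol-relative links (//example.com/…) and other schemes such as mailto: as non-matching, just as the CSS selector does.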
The gzip Trap
This is the single biggest gotcha, and the same one documented in the Dark Mode guide.
The Problem
When a browser sends Accept-Encoding: gzip, kiwix-serve gzip-compresses the response body. mod_substitute performs string matching on the raw response body. If the body is compressed, the literal string </head> doesn’t exist in the binary stream — mod_substitute silently does nothing.
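The effect is easy to reproduce outside Apache. The following Node.js sketch (an illustration only, using Node's built-in zlib module and a made-up sample page) compresses a small HTML document and searches both versions for the literal string </head>.

```javascript
const zlib = require("node:zlib");

// A made-up sample page; the repeated paragraph makes sure the body is
// actually deflate-compressed rather than stored verbatim.
const html = "<html><head><title>Demo</title></head><body>" +
  "<p>Lorem ipsum dolor sit amet.</p>".repeat(20) +
  "</body></html>";

const plain = Buffer.from(html, "utf8");
const compressed = zlib.gzipSync(plain);

const foundInPlain = plain.includes("</head>");      // string search on raw bytes
const foundInGzip = compressed.includes("</head>");  // same search on gzip bytes
const restored = zlib.gunzipSync(compressed).toString("utf8");

console.log({ foundInPlain, foundInGzip, roundTrip: restored === html });
```

A string-matching filter sees only the compressed bytes, so it has nothing to match until the body is decompressed or was never compressed in the first place.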
The Fix
Strip Accept-Encoding from the request before it reaches kiwix-serve, then re-compress with mod_deflate after substitution:
RequestHeader unset Accept-Encoding
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType SUBSTITUTE text/html
The request flow becomes:
Browser ──[Accept-Encoding: gzip]──▶ Apache
Apache strips Accept-Encoding ──▶ kiwix-serve
kiwix-serve returns uncompressed ──▶ Apache
mod_substitute injects code ──▶ mod_deflate compresses
Apache ──[Content-Encoding: gzip]──▶ Browser
Important: Only strip Accept-Encoding within the <LocationMatch> block for content pages. Don’t strip it globally — that would force kiwix-serve to send uncompressed images and large articles for no benefit.
Scoping Injection with LocationMatch
We only want to inject into ZIM content pages — not the Kiwix UI:
| Path Pattern | Matches | Inject? |
|---|---|---|
| /content/wikipedia_en/… | ZIM article HTML | ✅ Yes |
| /content/oldschool_en/… | ZIM article HTML | ✅ Yes |
| / | Kiwix landing page | ❌ No |
| /viewer | Kiwix viewer shell | ❌ No |
| /skin/… | Kiwix static assets | ❌ No |
| /catalog/… | OPDS feed | ❌ No |
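The regex ^/content/ matches every path under /content/ and nothing else. Images and stylesheets inside a ZIM are served from the same prefix, but AddOutputFilterByType restricts the filters to text/html responses: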
<LocationMatch "^/content/">
RequestHeader unset Accept-Encoding
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|</head>|<style>…</style><script>…</script></head>|i"
</LocationMatch>
This inserts a <style> block and a <script> block just before the closing </head> tag of every ZIM content page.
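To make the scoping logic concrete, here is a small simulation of what the block above does, written as plain JavaScript rather than Apache configuration. The names shouldInject and injectBeforeHeadClose are hypothetical; the sketch assumes the path regex and the case-insensitive </head> pattern shown above.

```javascript
const CONTENT_PATH = /^\/content\//; // same regex as the LocationMatch block
const HEAD_CLOSE = /<\/head>/gi;     // same pattern and "i" flag as the Substitute rule

// Decide whether a response would pass through the substitution filter.
function shouldInject(path, contentType) {
  const isHtml = (contentType || "").toLowerCase().startsWith("text/html");
  return CONTENT_PATH.test(path) && isHtml;
}

// Insert the snippet just before the closing </head> tag.
// A function replacement avoids special treatment of "$" inside the snippet.
function injectBeforeHeadClose(html, snippet) {
  return html.replace(HEAD_CLOSE, () => snippet + "</head>");
}

const page = "<html><head><title>Abyssal whip</title></head><body></body></html>";
if (shouldInject("/content/oldschool_en/Abyssal_whip", "text/html; charset=utf-8")) {
  console.log(injectBeforeHeadClose(page, "<style>/* rules */</style>"));
}
```

The two conditions are independent: the path check keeps the Kiwix UI untouched, and the content-type check keeps images and other assets under /content/ untouched.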
The CSS: Visual Indicators
The injected CSS applies three visual treatments to external links:
1. Muted Amber Color + Dashed Underline
a[href^="http://"],
a[href^="https://"] {
color: #986801 !important;
text-decoration: underline dashed #986801 !important;
text-underline-offset: 2px;
}
External links get a muted amber/gold color instead of the default blue. The dashed underline (as opposed to solid) provides a secondary visual cue that’s visible even in dense text.
Why amber? It’s a “caution” color — not alarming (red) but clearly different from normal links (blue). The #986801 value is a dark amber that’s readable on both white and light-gray backgrounds common in wiki articles.
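If you want to check a color choice rather than judge it by eye, a contrast-ratio calculation is one option. The sketch below follows the WCAG 2 relative-luminance formula (using the sRGB threshold 0.04045); the helper names are hypothetical, and it accepts only six-digit hex colors. The background color #f8f9fa is an assumed example of a light-gray wiki background, not a value taken from any particular ZIM.

```javascript
// Convert one 8-bit sRGB channel to linear light.
function channelToLinear(value8bit) {
  const c = value8bit / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" color.
function relativeLuminance(hex) {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * channelToLinear(r) +
         0.7152 * channelToLinear(g) +
         0.0722 * channelToLinear(b);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(hexA, hexB) {
  const la = relativeLuminance(hexA);
  const lb = relativeLuminance(hexB);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

console.log(contrastRatio("#986801", "#ffffff").toFixed(2));
console.log(contrastRatio("#986801", "#f8f9fa").toFixed(2));
```

WCAG AA asks for at least 4.5:1 for normal-size text, which gives a concrete threshold to compare alternative palettes against.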
2. Arrow Icon (↗) via ::after
a[href^="http://"]::after,
a[href^="https://"]::after {
content: " ↗";
font-size: .75em;
vertical-align: super;
opacity: .6;
text-decoration: none !important;
display: inline-block;
}
A small north-east arrow appears after every external link, universally recognized as “opens externally.” The arrow is:
- Superscripted and smaller (.75em) so it doesn’t disrupt text flow
- Semi-transparent (opacity: .6) so it’s visible but not distracting
- Rendered as an inline-block with text-decoration: none, because an underline set on the link is still drawn through ordinary inline content and only skips inline-block boxes
3. Hover Brightening
a[href^="http://"]:hover,
a[href^="https://"]:hover {
color: #c47d00 !important;
text-decoration-style: solid !important;
}
On hover, the link brightens and the underline becomes solid — confirming interactivity and providing feedback before the click.
Full Readable CSS
/* ── External Link Highlighting ── */
/* Base state: muted amber + dashed underline */
a[href^="http://"],
a[href^="https://"] {
color: #986801 !important;
text-decoration: underline dashed #986801 !important;
text-underline-offset: 2px;
}
/* Arrow icon after link text */
a[href^="http://"]::after,
a[href^="https://"]::after {
content: " ↗";
font-size: .75em;
vertical-align: super;
opacity: .6;
text-decoration: none !important;
display: inline-block; /* the link's underline is not drawn through inline-block boxes */
}
/* Hover: brighten + solidify underline */
a[href^="http://"]:hover,
a[href^="https://"]:hover {
color: #c47d00 !important;
text-decoration-style: solid !important;
}
Why !important?
ZIM articles carry their own stylesheets from the original wiki (MediaWiki, Stack Overflow, etc.). Without !important, our injected rules could be overridden by more specific selectors in the original CSS. Since we’re intentionally overriding article styles for external links, !important is appropriate here.
The JavaScript: Attributes and Tooltips
CSS handles the visual treatment, but JavaScript adds functional safety attributes:
document.addEventListener("DOMContentLoaded", function() {
document.querySelectorAll('a[href^="http://"], a[href^="https://"]')
.forEach(function(a) {
// Tooltip showing the full URL + warning
a.setAttribute("title",
"External link: " + a.href + " (leaves the mirror)");
// Security: prevent opener/referrer leakage
a.setAttribute("rel", "noopener noreferrer");
// Open in new tab so user doesn't lose their place
a.setAttribute("target", "_blank");
});
});
What Each Attribute Does
| Attribute | Value | Purpose |
|---|---|---|
| title | "External link: <url> (leaves the mirror)" | Tooltip on hover: shows the full destination URL and warns the user |
| rel | "noopener noreferrer" | Prevents the external page from accessing window.opener or receiving the referrer header |
| target | "_blank" | Opens in a new tab, so the user doesn’t lose their article position in the mirror |
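One side effect worth knowing about: setAttribute("rel", …) replaces whatever rel value the article already had (MediaWiki often emits rel="nofollow" on external links). If you would rather keep existing tokens, a merge helper along these lines is one option. mergeRel is a hypothetical name, and this is an optional variation rather than part of the configuration used in this guide.

```javascript
// Combine an existing rel value with the tokens we require, without duplicates.
// Token comparison here is exact (case-sensitive), which is a simplification.
function mergeRel(existingRel, requiredTokens) {
  const tokens = (existingRel || "").split(/\s+/).filter(Boolean);
  for (const token of requiredTokens) {
    if (!tokens.includes(token)) {
      tokens.push(token);
    }
  }
  return tokens.join(" ");
}

console.log(mergeRel("nofollow", ["noopener", "noreferrer"]));
console.log(mergeRel(null, ["noopener", "noreferrer"]));
```

Inside the forEach callback it would be used as a.setAttribute("rel", mergeRel(a.getAttribute("rel"), ["noopener", "noreferrer"])).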
Why DOMContentLoaded?
The script runs after the HTML is fully parsed but before images/stylesheets finish loading. This ensures all <a> elements exist in the DOM when querySelectorAll runs, without waiting for slow resources.
The Unicode Escape Trap
When building the minified CSS for the Substitute directive, there’s a subtle but critical issue with CSS unicode escapes.
The Problem
The standard CSS way to insert a special character in content is a unicode escape:
content: " \2197"; /* ↗ North-East Arrow */
But mod_substitute processes the entire string as a substitution pattern. The backslash in \2197 gets interpreted during substitution processing, corrupting the output. You might end up with:
content: " 2197"; /* Backslash eaten — just prints "2197" */
Doubling the Backslash Doesn’t Help
You might try \\2197, but Apache’s directive parser may consume one level of escaping, and mod_substitute may consume another. The behavior is unreliable and version-dependent.
The Fix: Use Literal UTF-8
Instead of CSS unicode escapes, embed the literal UTF-8 character directly:
/* ✗ Don't do this — backslash will be eaten */
content: " \2197";
/* ✗ Don't do this — double-backslash is unreliable */
content: " \\2197";
/* ✓ Do this — literal UTF-8 character */
content: " ↗";
Modern browsers handle UTF-8 in CSS content properties natively. As long as your Apache config file is saved as UTF-8 (which is the default on any modern system), the literal character passes through mod_substitute intact.
General rule: Never use CSS unicode escapes (\XXXX) inside mod_substitute directives. Always use the literal UTF-8 character instead.
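If you are adapting an existing stylesheet that already uses escapes, a small conversion step before minifying can apply this rule for you. The sketch below is a hypothetical helper (cssEscapesToLiteral is not part of any tool mentioned here) and is intended for symbol glyphs only; converting an escape that stands for a quote or backslash would change the meaning of the CSS.

```javascript
// Replace CSS unicode escapes (backslash + 1-6 hex digits, optionally followed
// by one whitespace character) with the literal character they represent.
function cssEscapesToLiteral(css) {
  return css.replace(/\\([0-9a-fA-F]{1,6})[ \t\n\r\f]?/g, (_, hex) => {
    const codePoint = parseInt(hex, 16);
    const isSurrogate = codePoint >= 0xd800 && codePoint <= 0xdfff;
    const valid = codePoint > 0 && codePoint <= 0x10ffff && !isSurrogate;
    // Invalid code points become U+FFFD, the replacement character.
    return valid ? String.fromCodePoint(codePoint) : "\uFFFD";
  });
}

console.log(cssEscapesToLiteral('content: " \\2197";'));
```

Run it over the readable CSS first, then minify and paste the result into the Substitute directive.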
Combining with Content Security Policy
External link highlighting pairs well with a Content Security Policy (CSP) header. While CSP blocks resource loads (scripts, images, fonts) from external origins, it does not block navigation via <a href> clicks. The two features complement each other:
| Threat | CSP | Link Highlighting |
|---|---|---|
| External script/image loads | ✅ Blocked | n/a |
| User clicking external <a> | n/a | ✅ Warned visually |
| Tracking pixels in articles | ✅ Blocked | n/a |
| User accidentally leaving mirror | n/a | ✅ Tooltip + new tab |
Example CSP Header
Header always set Content-Security-Policy "\
default-src 'self'; \
script-src 'self' 'unsafe-inline' 'unsafe-eval'; \
style-src 'self' 'unsafe-inline'; \
img-src 'self' data: blob:; \
font-src 'self' data:; \
media-src 'self' blob:; \
connect-src 'self'; \
frame-src 'self'; \
object-src 'none'; \
base-uri 'self'"
Note: script-src must include 'unsafe-inline' for the injected <script> tag to execute. If you add CSP, make sure 'unsafe-inline' is present in both script-src and style-src, since we inject both inline styles and inline scripts.
Note: Kiwix-serve also sends its own CSP header on content pages. When both the proxy and upstream set CSP, browsers enforce the most restrictive intersection of both policies. This is generally fine — it means even if our proxy CSP is slightly permissive, Kiwix’s own CSP provides additional lockdown.
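Before deploying a policy, it can help to check mechanically that it still permits the injected inline code. The following is a simplified sketch with hypothetical helper names (parseCsp, allowsInline). It assumes a single policy string and ignores nonces and hashes, which in real browsers cause 'unsafe-inline' to be disregarded.

```javascript
// Parse a CSP string into a map of directive name -> list of source tokens.
// If a directive appears twice, the first occurrence wins.
function parseCsp(policy) {
  const directives = new Map();
  for (const part of policy.split(";")) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    const name = tokens[0].toLowerCase();
    if (!directives.has(name)) {
      directives.set(name, tokens.slice(1));
    }
  }
  return directives;
}

// True if inline code is allowed for the directive (falling back to default-src).
// Simplification: nonces and hashes are not taken into account.
function allowsInline(directives, name) {
  const sources = directives.get(name) || directives.get("default-src");
  if (!sources) return true; // nothing governs this type, so nothing is restricted
  return sources.includes("'unsafe-inline'");
}

const policy = "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; " +
  "style-src 'self' 'unsafe-inline'; img-src 'self' data: blob:";
const parsedPolicy = parseCsp(policy);
console.log(allowsInline(parsedPolicy, "script-src"), allowsInline(parsedPolicy, "style-src"));
```

Remember that this only inspects the proxy's policy; the upstream policy sent by kiwix-serve is enforced as well.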
Full Apache Configuration
This is a minimal, focused configuration containing only the reverse proxy and external link highlighting. It does not include dark mode, OIDC authentication, or other features.
# mirror.example.com — Kiwix reverse proxy with external link highlighting
# ---------- HTTP → HTTPS redirect ----------
<VirtualHost *:80>
ServerName mirror.example.com
RewriteEngine On
RewriteRule ^(.*)$ https://%{HTTP_HOST}$1 [R=302,L]
</VirtualHost>
# ---------- HTTPS ----------
<IfModule mod_ssl.c>
<VirtualHost *:443>
ServerName mirror.example.com
SSLEngine on
SSLCertificateFile /path/to/fullchain.pem
SSLCertificateKeyFile /path/to/privkey.pem
ProxyPreserveHost On
ProxyPass / http://127.0.0.1:8300/
ProxyPassReverse / http://127.0.0.1:8300/
# ── Content Security Policy (optional but recommended) ──
# Blocks external resource loads. Does NOT block <a> navigation.
# 'unsafe-inline' required for injected <style> and <script>.
Header always set Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; img-src 'self' data: blob:; font-src 'self' data:; media-src 'self' blob:; connect-src 'self'; frame-src 'self'; object-src 'none'; base-uri 'self'"
# ── External-link highlighting (ZIM content pages only) ──
#
# Injects CSS + JS into /content/* pages to visually mark
# anchors that point outside the mirror.
#
# External links get:
# • Dashed amber underline (#986801)
# • A ↗ icon after the link text
# • title tooltip warning the user it leaves the mirror
# • rel="noopener noreferrer" + target="_blank"
#
# Detection: any <a href="http://…"> or <a href="https://…">
# Internal ZIM links use relative paths, so they never match.
#
<LocationMatch "^/content/">
# mod_substitute needs uncompressed input (see "The gzip Trap")
RequestHeader unset Accept-Encoding
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|</head>|<style>a[href^=\"http://\"],a[href^=\"https://\"]{color:\#986801!important;text-decoration:underline dashed \#986801!important;text-underline-offset:2px}a[href^=\"http://\"]::after,a[href^=\"https://\"]::after{content:\" ↗\";font-size:.75em;vertical-align:super;opacity:.6;text-decoration:none!important;display:inline-block}a[href^=\"http://\"]:hover,a[href^=\"https://\"]:hover{color:\#c47d00!important;text-decoration-style:solid!important}</style><script>document.addEventListener(\"DOMContentLoaded\",function(){document.querySelectorAll('a[href^=\"http://\"],a[href^=\"https://\"]').forEach(function(a){a.setAttribute(\"title\",\"External link: \"+a.href+\" (leaves the mirror)\");a.setAttribute(\"rel\",\"noopener noreferrer\");a.setAttribute(\"target\",\"_blank\")})});</script></head>|i"
</LocationMatch>
ErrorLog ${APACHE_LOG_DIR}/mirror-error.log
CustomLog ${APACHE_LOG_DIR}/mirror-access.log combined
</VirtualHost>
</IfModule>
After saving, test and reload:
sudo apache2ctl configtest && sudo systemctl reload apache2
Verification
1. Confirm Injection on Content Pages
Pick a ZIM article you know has external links:
# Should print 1 (CSS injected)
curl -sk https://mirror.example.com/content/oldschool_en_all_maxi_2026-03/Old_School_RuneScape_Wiki \
| grep -c 'text-decoration:underline dashed'
# Should print 1 (JS injected)
curl -sk https://mirror.example.com/content/oldschool_en_all_maxi_2026-03/Old_School_RuneScape_Wiki \
| grep -c 'leaves the mirror'
2. Confirm No Injection on Kiwix UI Pages
# Landing page — should print 0
curl -sk https://mirror.example.com/ | grep -c 'leaves the mirror'
# Viewer — should print 0
curl -sk https://mirror.example.com/viewer | grep -c 'leaves the mirror'
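The grep checks in steps 1 and 2 can also be expressed as a small function, for example as part of a scripted smoke test. This is a minimal sketch: checkInjection is a hypothetical name, and the two marker strings are the same ones the grep commands above look for.

```javascript
// Marker strings taken from the injected CSS and JS.
const CSS_MARKER = "text-decoration:underline dashed";
const JS_MARKER = "leaves the mirror";

// Report which parts of the injection are present in a page's HTML.
function checkInjection(html) {
  return {
    css: html.includes(CSS_MARKER),
    js: html.includes(JS_MARKER),
  };
}

// Made-up sample pages standing in for real responses.
const contentPage = "<head><style>a{text-decoration:underline dashed #986801}</style>" +
  "<script>/* ... (leaves the mirror) ... */</script></head>";
const uiPage = "<head><title>Kiwix</title></head>";

console.log(checkInjection(contentPage)); // expected on /content/ pages
console.log(checkInjection(uiPage));      // expected on / and /viewer
```

Feed it the body of any response you fetched from the mirror; both flags should be true for content pages and false for Kiwix UI pages.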
3. Confirm CSP Header (if using CSP)
# Should show Content-Security-Policy header
curl -sk -I https://mirror.example.com/content/oldschool_en_all_maxi_2026-03/Old_School_RuneScape_Wiki \
| grep -i 'content-security-policy'
4. Visual Verification
Open a ZIM article in your browser. External links should appear with:
- Amber color instead of blue
- Dashed underline
- Small ↗ arrow after the link text
- Tooltip on hover showing the full URL
Normal internal link External link with highlighting
─────────────────── ─────────────────────────────────
Abyssal whip RuneScape Wiki ↗
(blue, solid underline) (amber, dashed underline, arrow)
Customization
Changing Colors
Replace #986801 (base amber) and #c47d00 (hover amber) with your preferred colors:
| Style | Default | Description |
|---|---|---|
| Base color | #986801 | Dark amber — visible on white backgrounds |
| Hover color | #c47d00 | Bright amber — feedback on hover |
| Underline color | #986801 | Matches base link color |
Alternative palettes:
| Palette | Base | Hover | Effect |
|---|---|---|---|
| Red warning | #c0392b | #e74c3c | Strong “danger” signal |
| Muted gray | #6c757d | #495057 | Subtle, de-emphasized |
| Teal info | #0d7377 | #14a3a8 | Informational, calm |
| Purple | #6f42c1 | #8b5cf6 | Distinct from blue links |
Changing the Arrow Icon
Replace the literal ↗ character with any unicode symbol:
| Character | Unicode | Name | Effect |
|---|---|---|---|
| ↗ | U+2197 | North-East Arrow | Default — “opens externally” |
| ↪ | U+21AA | Right Arrow with Hook | “Redirects away” |
| ⧉ | U+29C9 | Two Joined Squares | “New window” |
| 🔗 | U+1F517 | Link | Generic link indicator |
| ⚠ | U+26A0 | Warning Sign | Strong “caution” signal |
Remember: always use the literal UTF-8 character, never a CSS \XXXX escape (see The Unicode Escape Trap).
Removing the Arrow Icon
Delete the entire ::after rule from the CSS (both the http:// and https:// selectors).
Disabling New-Tab Behavior
Remove the target="_blank" line from the JavaScript if you want external links to navigate in the same tab:
// Remove this line:
a.setAttribute("target", "_blank");
Making It Dark-Mode Only
Wrap the CSS in a prefers-color-scheme media query:
@media (prefers-color-scheme: dark) {
a[href^="http://"], a[href^="https://"] { /* … */ }
/* … rest of rules … */
}
In the minified Substitute directive, prepend @media(prefers-color-scheme:dark){ before the first rule and append } after the last rule.
Excluding Specific Domains
If certain external domains should be treated as “internal” (e.g., your own domain), add CSS overrides:
/* Don't highlight links to our own domain */
a[href^="https://example.com/"] {
color: revert !important;
text-decoration: revert !important;
}
a[href^="https://example.com/"]::after {
content: none !important;
}
Add these rules after the main external link rules in the <style> block.
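The CSS override only resets the visual styling; the injected script would still add the tooltip, rel, and target attributes to those links. If that matters, the script can skip allowlisted hosts as well. The sketch below is one possible approach, with a hypothetical helper name (isAllowlistedLink) and example.com standing in for your own domain.

```javascript
// True if the link's hostname is exactly one of the allowed hosts.
// Unparseable values are treated as not allowlisted.
function isAllowlistedLink(href, allowedHosts) {
  let host;
  try {
    host = new URL(href).hostname;
  } catch {
    return false;
  }
  return allowedHosts.includes(host);
}

console.log(isAllowlistedLink("https://example.com/page", ["example.com"]));
console.log(isAllowlistedLink("https://example.com.evil.test/", ["example.com"]));
```

In the injected script, the forEach callback could start with if (isAllowlistedLink(a.href, ["example.com"])) return; so that those anchors are left untouched. Comparing parsed hostnames avoids treating look-alike hosts such as example.com.evil.test as your own.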
Troubleshooting
CSS/JS doesn’t appear in browser but works with curl
Cause: gzip compression. Your RequestHeader unset Accept-Encoding isn’t being applied.
Fix: Check that:
- mod_headers is enabled (apachectl -M | grep headers)
- The directive is inside the <LocationMatch "^/content/"> block
- You reloaded Apache after changes (sudo systemctl reload apache2)
Arrow shows as “2197” text instead of ↗
Cause: You used a CSS unicode escape (\2197 or \\2197) instead of the literal character. mod_substitute ate the backslash.
Fix: Replace the escape with the literal UTF-8 character ↗ in your config file. See The Unicode Escape Trap.
External links aren’t highlighted
Cause 1: The ZIM’s internal links might use absolute URLs (rare but possible). Every link then gets the external styling, so genuinely external links no longer stand out.
Diagnosis: Check a page’s HTML:
curl -s http://127.0.0.1:8300/content/<zim_name>/<article> | grep -oP 'href="[^"]*"' | sort -u | head -20
If internal links use http:// or https:// URLs, the CSS selector matches them too. In that case, you need a more specific selector (e.g., appending :not([href*="your-mirror-domain"]) to each selector).
Cause 2: The article page might not have a </head> tag (malformed HTML in the ZIM).
Diagnosis:
curl -s http://127.0.0.1:8300/content/<zim_name>/<article> | grep -c '</head>'
If this returns 0, mod_substitute has nothing to match against.
Substitute directive causes Apache syntax error
Cause: Unescaped quotes or special characters in the CSS/JS.
Rules for the Substitute directive:
- The outer delimiter is | (pipe), chosen because it doesn’t appear in the injected CSS/JS
- Double quotes inside the substitution must be escaped as \" (for attribute selectors and JS strings)
- Hash characters (#) in hex colors must be escaped as \# (otherwise Apache treats them as comments)
- The s|…|…|i format: s = substitute, i = case-insensitive
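Escaping a long one-liner by hand is error-prone, so it can be worth generating the directive from the minified CSS and JS. The sketch below applies exactly the rules listed above and nothing more; buildSubstituteDirective is a hypothetical helper, and it assumes the CSS and JS are already minified to a single line.

```javascript
// Build the Substitute directive from minified CSS and JS, following the
// escaping rules listed above. Throws if the payload cannot be embedded safely.
function buildSubstituteDirective(css, js) {
  const payload = "<style>" + css + "</style><script>" + js + "</script></head>";
  if (payload.includes("|")) {
    throw new Error('payload contains the "|" delimiter');
  }
  if (/[\r\n]/.test(payload)) {
    throw new Error("payload must be a single line");
  }
  if (payload.includes("\\")) {
    throw new Error("payload contains a backslash (see The Unicode Escape Trap)");
  }
  const escaped = payload
    .replace(/"/g, '\\"')  // double quotes -> \"
    .replace(/#/g, "\\#"); // hash characters -> \#
  return 'Substitute "s|</head>|' + escaped + '|i"';
}

console.log(buildSubstituteDirective(
  'a[href^="https://"]{color:#986801!important}',
  'console.log("ready")'
));
```

Rejecting pipes and backslashes up front turns two of the silent failure modes described in this guide into immediate errors. Always finish with apache2ctl configtest, since this helper cannot validate the directive against your Apache version.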
Tooltips don’t appear
Cause: The JavaScript might be blocked by CSP.
Fix: Ensure your CSP includes 'unsafe-inline' in script-src:
script-src 'self' 'unsafe-inline'
Without 'unsafe-inline', the injected <script> tag won’t execute and no attributes will be set. The CSS highlighting will still work (since style-src is separate), but tooltips and target="_blank" won’t.
Links briefly flash blue then turn amber
Expected behavior. The browser renders the page with original styles first, then the injected CSS takes effect. The flash is typically imperceptible (sub-frame), but if it’s noticeable, it means your <style> block is loading late. Verify the injection is in </head> (before body rendering) and not in </body>.
How It Works Together (Visual Summary)
Original ZIM article HTML:
┌─────────────────────────────────────────────────────────────┐
│ <a href="/content/zim/Dragon_scimitar">Dragon scimitar</a>│ ← Internal (relative)
│ <a href="https://runescape.wiki/w/Whip">RS3 Wiki</a> │ ← External (absolute)
└─────────────────────────────────────────────────────────────┘
After mod_substitute injection:
┌─────────────────────────────────────────────────────────────┐
│ <style> │
│ a[href^="http://"], a[href^="https://"] { │
│ color: #986801; text-decoration: underline dashed; │
│ } │
│ a[href^="http://"]::after, a[href^="https://"]::after { │
│ content: " ↗"; │
│ } │
│ </style> │
│ <script> │
│ // Adds title, rel, target to external links │
│ </script> │
│ │
│ Dragon scimitar ← Unchanged (relative href) │
│ RS3 Wiki ↗ ← Amber, dashed, arrow, tooltip │
│ title="External link: https://runescape.wiki/… (leaves │
│ the mirror)" │
│ rel="noopener noreferrer" │
│ target="_blank" │
└─────────────────────────────────────────────────────────────┘
Compatibility
| Component | Tested Version |
|---|---|
| kiwix-serve | 3.7.0 (libkiwix 14.0.0, libzim 9.2.3) |
| Apache | 2.4.66 (Debian) |
| Browser | Chromium 131, Firefox 134 |
| ZIM sources tested | MediaWiki (MWOffliner), Zimit captures |
This technique works with any ZIM file that contains standard HTML with <a href="http(s)://…"> links. The CSS attribute selectors and JavaScript querySelectorAll are supported in all modern browsers.