From a GitHub Markdown header

    # Söme/title-header_

GitHub's renderer creates the anchor

    #sömetitle-header_

Apparently, spaces and / are removed, letters (ASCII and Unicode) are lowercased, and - and _ are preserved.
Is this correct? Are there other rules?
GitHub.com's process for converting Markdown heading text to id="" attributes for automatic #fragment links is not defined by any Markdown specification or implementation.
For example, it isn't described in the GitHub Flavored Markdown Spec.
Instead, it's something GitHub does privately after the initial conversion from Markdown to HTML is complete. This is described in Step 4 of GitHub's own readme file on the topic (emphasis mine):
This library is the first step of a journey that every markup file in a repository goes on before it is rendered on GitHub.com:
- github-markup selects an underlying library to convert the raw markup to HTML.
- The HTML is sanitized, aggressively removing things that could harm you and your kin—such as script tags, inline-styles, and class or id attributes.
- Syntax highlighting is performed on code blocks. See github/linguist for more information about syntax highlighting.
- The HTML is passed through other filters that add special sauce, such as emoji, task lists, named anchors, CDN caching for images, and autolinking.
- The resulting HTML is rendered on GitHub.com.
.md / Markdown files are processed by CommonMarker + libcmark, which does not include id="" attribute and #fragment URI generation as a built-in feature, but CommonMarker's documentation actually provides a sample implementation of Markdown header id="" attributes for #fragment links on the front page, repeated below:
    class MyHtmlRenderer < CommonMarker::HtmlRenderer
      def initialize
        super
        @headerid = 1
      end

      # Emit each heading with a sequential numeric id attribute
      def header(node)
        block do
          out("<h", node.header_level, " id=\"", @headerid, "\">",
              :children, "</h", node.header_level, ">")
          @headerid += 1
        end
      end
    end

    # `doc` is a document that has already been parsed by CommonMarker
    myrenderer = MyHtmlRenderer.new
    print(myrenderer.render(doc))

    # Print any warnings to STDERR
    myrenderer.warnings.each do |w|
      STDERR.write("#{w}\n")
    end
The above generates numeric, monotonically increasing header id="" values, which helps prevent id="" collisions (though this is not a perfect solution), whereas, as you've observed, GitHub prefers to use the header's textContent as the basis for id="" attribute values.
This means that GitHub is simply doing its own thing when it comes to generating id="" attributes, and there is no published specification for whatever transformation GitHub is using.
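To make the idea of a post-conversion filter concrete, here is a minimal Python sketch of a step that runs on already-rendered HTML and adds text-derived id="" attributes to headings. Everything in it is an assumption for illustration: the names `slugify` and `add_heading_ids` are hypothetical, the character rules only approximate what has been observed in this thread, and the `-1`, `-2` suffixes for repeated headings mirror commonly observed GitHub behavior rather than any published rule. It also assumes headings arrive without attributes, as they would after sanitization.

```python
import re
from collections import Counter
from html import unescape

HEADING_RE = re.compile(r"<h([1-6])>(.*?)</h\1>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def slugify(text):
    """Hypothetical approximation of a text-derived id (not GitHub's actual code)."""
    text = text.strip().lower()
    # Keep Unicode word characters (letters, digits, underscore), hyphens and
    # spaces; drop everything else, then turn spaces into hyphens.
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


def add_heading_ids(markup):
    """Add id attributes to attribute-free <h1>..<h6> elements in rendered HTML."""
    seen = Counter()

    def repl(match):
        level, inner = match.group(1), match.group(2)
        # Approximate textContent: strip nested tags, decode entities.
        base = slugify(unescape(TAG_RE.sub("", inner)))
        count = seen[base]
        seen[base] += 1
        # Assumed de-duplication: the first use is bare, later ones get -1, -2, ...
        slug = base if count == 0 else f"{base}-{count}"
        return f'<h{level} id="{slug}">{inner}</h{level}>'

    return HEADING_RE.sub(repl, markup)


print(add_heading_ids("<h1>Söme/title-header_</h1><h2>Setup</h2><h2>Setup</h2>"))
```

A regular expression is enough for a sketch like this, but a real filter would work on a parsed HTML tree rather than on raw text.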
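The following code gives me quite some mileage:

    def github_anchor(header):
        # `header` is the heading text without the leading "#" markers
        lower = header.strip().lower().replace(" ", "-")
        anchor = ""
        for c in lower:
            # Keep letters and digits (including Unicode), hyphens and underscores
            if c.isalnum() or c in "-_":
                anchor += c
        return anchor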
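As a quick check, the same logic can be run against the heading from the question. This is a self-contained sketch; the function name `github_anchor` is a hypothetical label for the snippet above, and the result is only an approximation of GitHub's unpublished behavior.

```python
def github_anchor(header):
    """Approximate a GitHub heading anchor from heading text (no leading '#')."""
    lower = header.strip().lower().replace(" ", "-")
    # str.isalnum() is Unicode-aware, so letters such as "ö" are kept.
    return "".join(c for c in lower if c.isalnum() or c in "-_")


print(github_anchor("Söme/title-header_"))  # sömetitle-header_
print(github_anchor("Getting Started!"))    # getting-started
```

Note that this version does not handle repeated headings, so two identical headings would map to the same anchor.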