I have a bunch of posts written in markdown, and I need to remove the period from the end of every paragraph in each of them.
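The end of a paragraph in markdown is delimited by:
- two or more \n characters, or
- the end of the string

However, there are these edge cases:
1. ellipses (...)
2. acronyms such as B.I.G.
3. abbreviations such as eg, ie, etc.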
Here's a regular expression that matches posts that have offending periods, but it doesn't account for (2) and (3) above:
/[^.]\.(\n{2,}|\z)/
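A quick way to see where that pattern falls short is to run it against one sample per case. This is only a sketch: the helper name `flagged_by_naive?` is made up for illustration, and the sample strings are borrowed from the test text further down.

```ruby
# The pattern from the question: a period that is not preceded by another
# period and is followed by a paragraph break or the end of the string.
NAIVE = /[^.]\.(\n{2,}|\z)/

# Hypothetical helper: true when the pattern would treat the text as
# having an offending period.
def flagged_by_naive?(text)
  !(text =~ NAIVE).nil?
end

[
  "this is a paragraph.",           # ordinary sentence: should be flagged
  "this ends with an ellipsis...",  # ellipsis: should be left alone
  "this ends with B.I.G.",          # acronym: should be left alone
  "this ends with etc."             # abbreviation: should be left alone
].each do |text|
  puts "#{text.inspect} flagged: #{flagged_by_naive?(text)}"
end
```

Only the ellipsis is spared; the acronym and the abbreviation are still flagged, which is the gap described in the question.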
(?<!\.[a-zA-Z]|etc|\.\.)\.(?=\n{2,}|\Z)
Explanation:
(?<!\.[a-zA-Z]|etc|\.\.) - negative lookbehind to make sure that the period is not preceded by sequences like .T, etc, or .. (for ellipsis)
\. - the period
(?=\n{2,}|\Z) - lookahead for the end of a markdown paragraph (two or more newlines or the end of the string)

Test:
s = %{this is a paragraph.

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced.

this is end of text.}

# Remove a paragraph-final period unless it follows ".X", "etc", or "..".
puts s.gsub(/(?<!\.[a-zA-Z]|etc|\.\.)\.(?=\n{2,}|\Z)/, "")

Output:

this is a paragraph

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced

this is end of text
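For reuse across many posts, the same pattern could be wrapped in a small helper. The sketch below is an assumption about how one might package it, not part of the original answer: the constant `TRAILING_PERIOD` and the method `strip_paragraph_periods` are hypothetical names, and the regex is written in extended mode (`/x`) only so each part can carry a comment. It needs Ruby 1.9 or later because of the lookbehind.

```ruby
# Same pattern as in the test above, spelled out with the x flag.
TRAILING_PERIOD = /
  (?<! \.[a-zA-Z] | etc | \.\. )  # not after .X (acronyms, e.g.), etc, or .. (ellipsis)
  \.                              # the period to remove
  (?= \n{2,} | \Z )               # followed by a blank line or the end of the string
/x

# Hypothetical helper: returns a copy of the text with paragraph-final
# periods removed; the original string is not modified.
def strip_paragraph_periods(text)
  text.gsub(TRAILING_PERIOD, "")
end

post = "First paragraph.\n\nSecond one, e.g.\n\nLast paragraph."
puts strip_paragraph_periods(post)
```

Two limits of the pattern are worth keeping in mind: the `etc` alternative protects any word that merely ends in those letters, and a bare `eg.` or `ie.` written without inner periods would still lose its final period.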
A Ruby 1.8.7 compatible algorithm:
s = %{this is a paragraph.

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced.

this is end of text.}.strip

# Split into paragraphs, drop the final period where allowed, then rejoin.
a = s.split(/\n{2,}/).each do |paragraph|
  next unless paragraph.match(/\.\Z/)                   # no trailing period
  next if paragraph.match(/(\.[a-zA-Z]|etc|\.\.)\.\Z/)  # acronym, etc, or ellipsis
  paragraph.chop!                                       # remove the final period in place
end.join("\n\n")

>> puts a
this is a paragraph

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced

this is end of text
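The split-and-join approach can also be written as a method that builds new strings instead of mutating each paragraph with `chop!`. This is a sketch under the same exclusion rules; `KEEP_PERIOD` and `strip_periods_without_lookbehind` are hypothetical names, not from the original answer. It avoids lookbehind, so it should also suit older Ruby versions.

```ruby
# Paragraph endings that should keep their final period:
# ".X." (acronyms, e.g.), "etc.", and "..." (ellipsis).
KEEP_PERIOD = /(\.[a-zA-Z]|etc|\.\.)\.\z/

# Hypothetical helper: returns a new string and leaves the input untouched.
def strip_periods_without_lookbehind(text)
  paragraphs = text.strip.split(/\n{2,}/).map do |paragraph|
    if paragraph =~ /\.\z/ && paragraph !~ KEEP_PERIOD
      paragraph.chomp(".")  # copy of the paragraph without its final period
    else
      paragraph
    end
  end
  paragraphs.join("\n\n")
end

puts strip_periods_without_lookbehind("this should be replaced.\n\nthis ends with etc.")
```

As with the version above, any run of three or more newlines between paragraphs comes back as exactly two, because the text is split on `\n{2,}` and rejoined with `"\n\n"`.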