
Remove periods from the end of markdown paragraphs

I have a bunch of posts written in markdown and I need to remove the periods from the end of every paragraph in each of them

The end of a paragraph in markdown is delimited by:

  • 2 or more \n characters, or
  • The end of the string
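
As a rough illustration of that definition, a post can be pulled apart into paragraphs by splitting on runs of two or more newlines. The method name `paragraphs` is made up for this sketch, and it ignores other markdown block constructs such as code blocks and lists.

```ruby
# Split a markdown string into paragraphs: a paragraph ends at two or
# more consecutive newlines, or at the end of the string.
# `paragraphs` is a hypothetical helper name, not part of any library.
def paragraphs(text)
  text.strip.split(/\n{2,}/)
end

post = "First paragraph.\n\nSecond paragraph,\nstill the second.\n\n\nThird."
paragraphs(post).each_with_index do |para, i|
  puts "#{i + 1}: #{para.inspect}"
end
```

A single newline stays inside its paragraph; only a run of two or more starts a new one.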

However, there are these edge cases:

  1. Ellipses
  2. Acronyms (e.g., I don't want to drop the final period in "Notorious B.I.G." when it falls at the end of a paragraph). I think you can deal with this case by saying "don't remove the final period if it's preceded by a capital letter which is itself preceded by another period"
  3. Special cases: e.g., i.e., etc.

Here's a regular expression that matches posts that have offending periods, but it doesn't account for (2) and (3) above:

/[^.]\.(\n{2,}|\z)/
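
To make the gap concrete, here is a small sketch that runs that pattern over a few paragraph endings. The method name `offending_period?` and the sample strings are invented for illustration.

```ruby
# Wraps the question's pattern: a period that is not preceded by another
# period, followed by two or more newlines or the end of the string.
# `offending_period?` is a hypothetical name used only in this sketch.
def offending_period?(paragraph)
  !!(paragraph =~ /[^.]\.(\n{2,}|\z)/)
end

[
  "A plain sentence.",
  "Trailing off...",
  "Notorious B.I.G.",
  "Cats, dogs, etc."
].each do |sample|
  puts "#{offending_period?(sample)}\t#{sample}"
end
```

Only the ellipsis is skipped. The acronym and the "etc." ending are still flagged, which is exactly what cases (2) and (3) need to prevent.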

(?<!\.[a-zA-Z]|etc|\.\.)\.(?=\n{2,}|\Z)
  • (?<!\.[a-zA-Z]|etc|\.\.) - lookbehind to make sure that the period is not preceded by sequences like .T, etc, or .. (for an ellipsis).
  • \. - the period
  • (?=\n{2,}|\Z) - lookahead to look for the end of a markdown paragraph (two or more newlines, or the end of the string)

Test:

s = """this is a paragraph.

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced.

this is end of text."""
puts s.gsub(/(?<!\.[a-zA-Z]|etc|\.\.)\.(?=\n{2,}|\Z)/, "")

Output:

this is a paragraph

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced

this is end of text
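
For reuse across many posts, the substitution could be wrapped in a small method. This is only a sketch: the name `strip_paragraph_periods` is hypothetical, and it assumes Ruby 1.9 or later, since the regex engine in Ruby 1.8 has no lookbehind.

```ruby
# Hypothetical helper wrapping the substitution shown above.
# Assumes Ruby 1.9+; Ruby 1.8 does not support lookbehind.
def strip_paragraph_periods(text)
  # Keep the period after ".X" (acronyms, e.g., i.e.), after "etc",
  # and after ".." (ellipsis); remove it at any other paragraph end.
  text.gsub(/(?<!\.[a-zA-Z]|etc|\.\.)\.(?=\n{2,}|\Z)/, "")
end

post = "One.\n\nTwo, i.e. the second.\n\nThree, etc.\n\nB.I.G.\n\nFour."
puts strip_paragraph_periods(post)
```

Note that a period followed by trailing spaces before the blank line is left alone, because the lookahead requires the newlines (or the end of the string) to come directly after the period.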

A Ruby 1.8.7 compatible algorithm:

s = %{this is a paragraph.

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced.

this is end of text.}.strip

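a = s.split(/\n{2,}/).each do |paragraph|
  # Skip paragraphs that do not end with a period
  next unless paragraph.match(/\.\Z/)
  # Skip ellipses, acronyms such as B.I.G., e.g., and etc.
  next if paragraph.match(/(\.[a-zA-Z]|etc|\.\.)\.\Z/)
  paragraph.chop! # drop the trailing period in place
end.join("\n\n")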

>> puts a
this is a paragraph

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced

this is end of text


 