Ruby: Remove new lines, carriage returns from text

I have a text string such as this, and I need to remove all the carriage returns and backslashes.

An Ox came down to a reedy pool to drink. As he splashed heavily into the water, he crushed a young Frog into the mud.\r\n\r\nThe old Frog soon missed the little one and asked his brothers and sisters what had become of him.\r\n\r\n\"A great big monster,\" said one of them, \"stepped on little brother with one of his huge feet!\"\r\n\r\n\"Big, was he!\" said the old Frog, puffing herself up. \"Was he as big as this?\"\r\n\r\n\"Oh, much bigger!\" they cried.\r\n\r\nThe Frog puffed up still more.\r\n\r\n\"He could not have been bigger than this,\" she said. But the little Frogs all declared that the monster was much, much bigger and the old Frog kept puffing herself out more and more until, all at once, she burst.\r\n

I tried this, but it still leaves the backslashes in.

text.gsub(/\r?\n|\r/, "")

"An Ox came down to a reedy pool to drink. As he splashed heavily into the water, he crushed a young Frog into the mud. The old Frog soon missed the little one and asked his brothers and sisters what had become of him. \"A great big monster,\" said one of them, \"stepped on little brother with one of his huge feet!\" \"Big, was he!\" said the old Frog, puffing herself up. \"Was he as big as this?\" \"Oh, much bigger!\" they cried. The Frog puffed up still more. \"He could not have been bigger than this,\" she said. But the little Frogs all declared that the monster was much, much bigger and the old Frog kept puffing herself out more and more until, all at once, she burst. "

The following expression seems to match the correct pattern at www.rubular.com, including the individual backslashes, but it does not work in my console (Ruby 2.2.1):

text.gsub(/(\\r\\n)|\\/, "")

Note: For full disclosure, this string of text is captured in an HTML editor and stored in a database column. I need to strip out the HTML, so I use the following:

text = ActionView::Base.full_sanitizer.sanitize(page.content).gsub(/\r?\n|\r\\|\\/, "")

I appreciate any help you can provide!

The fastest way to do this is String#delete (or String#delete! to modify the string in place), which removes every character listed in its argument:

text.delete!("\r\n\\")
p text
puts
puts text

Output:

"An Ox came down to a reedy pool to drink. As he splashed heavily into the water, he crushed a young Frog into the mud.The old Frog soon missed the little one and asked his brothers and sisters what had become of him.\"A great big monster,\" said one of them, \"stepped on little brother with one of his huge feet!\"\"Big, was he!\" said the old Frog, puffing herself up. \"Was he as big as this?\"\"Oh, much bigger!\" they cried.The Frog puffed up still more.\"He could not have been bigger than this,\" she said. But the little Frogs all declared that the monster was much, much bigger and the old Frog kept puffing herself out more and more until, all at once, she burst."

An Ox came down to a reedy pool to drink. As he splashed heavily into the water, he crushed a young Frog into the mud.The old Frog soon missed the little one and asked his brothers and sisters what had become of him."A great big monster," said one of them, "stepped on little brother with one of his huge feet!""Big, was he!" said the old Frog, puffing herself up. "Was he as big as this?""Oh, much bigger!" they cried.The Frog puffed up still more."He could not have been bigger than this," she said. But the little Frogs all declared that the monster was much, much bigger and the old Frog kept puffing herself out more and more until, all at once, she burst.
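The two outputs differ because p prints the inspect form, which escapes quotes, while puts prints the characters themselves. The same distinction may explain why a pattern built from the displayed text finds nothing in the console. Here is a minimal sketch, assuming the stored string holds real CR/LF characters; the sample text and variable names are made up for illustration:

```ruby
# Hypothetical sample. In a double-quoted Ruby literal, \r, \n and \" are
# single characters (CR, LF, quote); only "\\" is a real backslash.
sample = "into the mud.\r\n\r\n\"Big, was he!\" said the old Frog.\\"

# A pattern written for the *displayed* text looks for a literal backslash
# followed by the letter r, so it finds nothing in a real CR/LF pair.
literal_hit = ("mud.\r\nThe" =~ /\\r\\n/)
real_hit    = ("mud.\r\nThe" =~ /\r\n/)

# String#delete returns a new string and leaves the receiver untouched.
cleaned = sample.delete("\r\n\\")

# String#delete! edits in place and returns nil when nothing was removed,
# so its return value should not be chained or assigned.
second_pass = cleaned.dup.delete!("\r\n\\")

p literal_hit
p real_hit
puts cleaned
p second_pass
```

If the result is needed as a value, for example at the end of a method chain, the non-bang String#delete is the safer choice.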

Benchmark results:

Warming up --------------------------------------
         String#gsub     2.826k i/100ms
           String#tr    35.794k i/100ms
       String#delete    37.147k i/100ms
Calculating -------------------------------------
         String#gsub     29.801k (± 2.8%) i/s -    149.778k in   5.030044s
           String#tr    399.391k (± 3.3%) i/s -      2.004M in   5.024297s
       String#delete    411.065k (± 4.0%) i/s -      2.080M in   5.068783s

I used /\\R+|\\// for the String#gsub method.
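The figures above look like benchmark-ips output. A comparable measurement can be sketched with the standard library's Benchmark module instead. The sample text, the iteration count, and the character-class regex are assumptions chosen for this sketch, not the setup used for the numbers above:

```ruby
require "benchmark"

# Hypothetical input: a short passage with CR/LF pairs and a stray backslash,
# repeated to give the methods some work to do.
text = "An Ox came down to a reedy pool to drink.\r\n\r\n" \
       "\"Big, was he!\" said the old Frog.\\\r\n" * 50

candidates = {
  "String#gsub"   => -> { text.gsub(/[\r\n\\]/, "") },
  "String#tr"     => -> { text.tr("\r\n\\", "") },   # empty replacement deletes
  "String#delete" => -> { text.delete("\r\n\\") },
}

# Sanity check: all three approaches should produce the same string.
outputs = candidates.values.map(&:call)
raise "results differ" unless outputs.uniq.size == 1

# Time a fixed number of iterations of each approach.
Benchmark.bm(14) do |x|
  candidates.each do |label, fn|
    x.report(label) { 2_000.times { fn.call } }
  end
end
```

Absolute timings depend on the Ruby version and the input, so only the relative ordering is worth reading from a run like this.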
