简体   繁体   中英

Split on different newlines

Right now I'm doing a split on a string and assuming that the newline from the user is \r\n like so:

string.split(/\r\n/)

What I'd like to do is split on either \r\n or just \n .

So how what would the regex be to split on either of those?

Did you try /\r?\n/ ? The ? makes the \r optional.

Example usage: http://rubular.com/r/1ZuihD0YfF

Ruby has the methods String#each_line and String#lines

returns an enum: http://www.ruby-doc.org/core-1.9.3/String.html#method-i-each_line

returns an array: http://www.ruby-doc.org/core-2.1.2/String.html#method-i-lines

I didn't test it against your scenario but I bet it will work better than manually choosing the newline chars.

# Split on \r\n or just \n
string.split( /\r?\n/ )

Although it doesn't help with this question (where you do need a regex), note that String#split does not require a regex argument. Your original code could also have been string.split( "\r\n" ) .

\n is for unix 
\r is for mac 
\r\n is for windows format

To be safe for operating systems. I would do /\r?\n|\r\n?/

"1\r2\n3\r\n4\n\n5\r\r6\r\n\r\n7".split(/\r?\n|\r\n?/)
=> ["1", "2", "3", "4", "", "5", "", "6", "", "7"]

The alternation operator in Ruby Regexp is the same as in standard regular expressions: |

So, the obvious solution would be

/\r\n|\n/

which is the same as

/\r?\n/

ie an optional \r followed by a mandatory \n .

Perhaps do a split on only '\n' and remove the '\r' if it exists?

Are you reading from a file, or from standard in?

If you're reading from a file, and the file is in text mode, rather than binary mode, or you're reading from standard in, you won't have to deal with \r\n - it'll just look like \n .

C:\Documents and Settings\username>irb
irb(main):001:0> gets
foo
=> "foo\n"

Another option is to use String#chomp , which also handles newlines intelligently by itself.

You can accomplish what you are after with something like:

lines = string.lines.map(&:chomp)

Or if you are dealing with something large enough that memory use is a concern:

<string|io>.each_line do |line|
  line.chomp!
  #  do work..
end

Performance isn't always the most important thing when solving this kind of problem, but it is worth noting the chomp solution is also a bit faster than using a regex.

On my machine (i7, ruby 2.1.9):

Warming up --------------------------------------
           map/chomp    14.715k i/100ms
  split custom regex    12.383k i/100ms
Calculating -------------------------------------
           map/chomp    158.590k (± 4.4%) i/s -    794.610k in   5.020908s
  split custom regex    128.722k (± 5.1%) i/s -    643.916k in   5.016150s

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM