
Remove empty lines from a string in ruby

I've gone through other similar questions, and they don't seem to explain my problem.

My output right now looks like this; I would like to remove the empty lines from the string in Ruby:

#    

CIRRUS LADIES NIGHT with DJ ROHIT

4th of JULY Party ft. DJ JASMEET @ I-Bar

Submerge Deep @ Pebble | Brute Force (Tuhin Mehta) | DJ Arpan (Opening)

Champagne Showers - DJs Panic & Nyth @ Blue Waves

THURSDAY PAST AND PRESENT @ Hint

and I want my output to be like this,

CIRRUS LADIES NIGHT with DJ ROHIT
4th of JULY Party ft. DJ JASMEET @ I-Bar
Submerge Deep @ Pebble | Brute Force (Tuhin Mehta) | DJ Arpan (Opening)
Champagne Showers - DJs Panic & Nyth @ Blue Waves
THURSDAY PAST AND PRESENT @ Hint

I've tried gsub /^$\n/,'' , gsub(/\n/,'') , squeeze("\n") and delete! "\n" to no avail.

Also, I forgot to mention that my string starts with a blank line; the # denotes a blank line before the first line, in case that changes anything.

My string.inspect output, as requested. The content of the string has changed, though the issue is still the same.

string.inspect :

"\n\n\t\t\t\t\t\t\t\t\t"
"Tricky Tuesdays with DJ John @ Blend"
"\n\n\t\t\t\t\t\t\t\t\t"
"Bladder Buster Challenge with DJ Sean @ Star Rock"
"\n\n\t\t\t\t\t\t\t\t\t"
"Classic Rock Tuesday @ 10D - Chennai"
"\n\n\t\t\t\t\t\t\t\t\t"
"Vodka Night with DJ John @ Blend"
"\n\n\t\t\t\t\t\t\t\t\t"
"\"BOLLYWOOD WEDNESDAYS\" with DJ D Nash @ Candy Club"
"\n\n\t\t\t\t\t\t\t\t\t"
"RE - LAUNCH WEDNESDAY LADIES NIGHT @ ZODIAC"
"\n\n\t\t\t\t\t\t\t\t\t"
"Ladies Night @ 10 D - Chennai"
"\n\n\t\t\t\t\t\t\t\t\t"
"Wednesday Mayhem @ Dublin"
"\n\n\t\t\t\t\t\t\t\t\t"

Here is my solution:

text.gsub(/\n+|\r+/, "\n").squeeze("\n").strip

This removes all consecutive empty lines:

result = s.squeeze("\r\n").gsub(/(\r\n)+/, "\r\n")

Or a command-line option without Ruby:

grep -v "^$" <file>

First of all, your code removes all newlines, not just the blank ones - that doesn't sound like what you want.

Second, operating systems have historically disagreed on how to represent newlines: old Macs used \r, Linux and OS X use \n, and Windows uses the combination \r\n. So you really want to replace consecutive \r's and \n's (which indicate a blank line in between) with a single \n.
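As a minimal sketch of that idea (the method name `collapse_newlines` is made up for illustration), a single character class can cover all three conventions. Note that this also normalizes every line ending to \n, which may or may not be what you want:

```ruby
# Hypothetical helper: collapse any run of CR/LF characters into one "\n".
# Side effect: "\r\n" and "\r" line endings are rewritten as "\n".
def collapse_newlines(str)
  str.gsub(/[\r\n]+/, "\n")
end

p collapse_newlines("a\r\n\r\nb\n\n\nc\r\rd")  #=> "a\nb\nc\nd"
```

A blank line at the very start of the string still leaves one leading "\n" behind, so a trailing `.strip` may be needed as well.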

.split(/\n/).reject{ |l| l.chomp.empty? }.join("\n")

For Unix-style line endings only:

.split(/\n/).reject(&:empty?).join("\n")

This one removes whitespace-only lines too (Unix, Rails method):

.split(/\n/).reject(&:blank?).join("\n")
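Outside Rails, `blank?` is not available on String, so here is a rough plain-Ruby equivalent. The helper name `drop_blank_lines` and the shortened sample string are assumptions for illustration; the sample only imitates the shape of the inspect output in the question.

```ruby
# Hypothetical helper; plain Ruby, no Rails required.
def drop_blank_lines(str)
  # Split on any of the three line endings (try "\r\n" first so it counts once).
  lines = str.split(/\r\n|\r|\n/)
  # Drop lines that are empty or contain only whitespace.
  kept = lines.reject { |line| line.strip.empty? }
  kept.join("\n")
end

s = "\n\n\t\t\tTricky Tuesdays with DJ John @ Blend\n\n\t\t\tVodka Night with DJ John @ Blend\n\n\t\t\t"
puts drop_blank_lines(s)
```

The kept lines still carry their leading tabs; if those should go too, mapping each line through `strip` before the `reject` would be one way to do it.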

Here's a single regex that removes all blank lines, including those at the start or end of the file, including lines that contain only spaces or tabs, and allowing for all three forms of line ending markers (\r\n, \n, and \r):

def remove_blank_lines( str, line_ending="\n" )
  str.gsub(/(?<=\A|#{line_ending})[ \t]*(?:#{line_ending}|\z)/,'')
end

Tested:

[ "\r\n", "\n", "\r" ].each do |marker|
  puts '='*70, "Lines ending with: #{marker.inspect}", '='*70
  [ "", " ", "\t", " \t", "\t " ].each do |whitespace|
    0.upto(2) do |lines|
      blank_lines = "#{whitespace}#{marker*lines}"
      s = "#{marker*lines}a#{marker*lines}b#{blank_lines}c#{blank_lines}"
      tight = remove_blank_lines(s, marker)
      puts "%43s -> %s" % [s.inspect, tight.inspect]
    end
  end
end

#=> ======================================================================
#=> Lines ending with: "\r\n"
#=> ======================================================================
#=>                                       "abc" -> "abc"
#=>                       "\r\na\r\nb\r\nc\r\n" -> "a\r\nb\r\nc\r\n"
#=>       "\r\n\r\na\r\n\r\nb\r\n\r\nc\r\n\r\n" -> "a\r\nb\r\nc\r\n"
#=>                                     "ab c " -> "ab c "
#=>                     "\r\na\r\nb \r\nc \r\n" -> "a\r\nb \r\nc \r\n"
#=>     "\r\n\r\na\r\n\r\nb \r\n\r\nc \r\n\r\n" -> "a\r\nb \r\nc \r\n"
#=>                                   "ab\tc\t" -> "ab\tc\t"
#=>                   "\r\na\r\nb\t\r\nc\t\r\n" -> "a\r\nb\t\r\nc\t\r\n"
#=>   "\r\n\r\na\r\n\r\nb\t\r\n\r\nc\t\r\n\r\n" -> "a\r\nb\t\r\nc\t\r\n"
#=>                                 "ab \tc \t" -> "ab \tc \t"
#=>                 "\r\na\r\nb \t\r\nc \t\r\n" -> "a\r\nb \t\r\nc \t\r\n"
#=> "\r\n\r\na\r\n\r\nb \t\r\n\r\nc \t\r\n\r\n" -> "a\r\nb \t\r\nc \t\r\n"
#=>                                 "ab\t c\t " -> "ab\t c\t "
#=>                 "\r\na\r\nb\t \r\nc\t \r\n" -> "a\r\nb\t \r\nc\t \r\n"
#=> "\r\n\r\na\r\n\r\nb\t \r\n\r\nc\t \r\n\r\n" -> "a\r\nb\t \r\nc\t \r\n"
#=> ======================================================================
#=> Lines ending with: "\n"
#=> ======================================================================
#=>                                       "abc" -> "abc"
#=>                               "\na\nb\nc\n" -> "a\nb\nc\n"
#=>                       "\n\na\n\nb\n\nc\n\n" -> "a\nb\nc\n"
#=>                                     "ab c " -> "ab c "
#=>                             "\na\nb \nc \n" -> "a\nb \nc \n"
#=>                     "\n\na\n\nb \n\nc \n\n" -> "a\nb \nc \n"
#=>                                   "ab\tc\t" -> "ab\tc\t"
#=>                           "\na\nb\t\nc\t\n" -> "a\nb\t\nc\t\n"
#=>                   "\n\na\n\nb\t\n\nc\t\n\n" -> "a\nb\t\nc\t\n"
#=>                                 "ab \tc \t" -> "ab \tc \t"
#=>                         "\na\nb \t\nc \t\n" -> "a\nb \t\nc \t\n"
#=>                 "\n\na\n\nb \t\n\nc \t\n\n" -> "a\nb \t\nc \t\n"
#=>                                 "ab\t c\t " -> "ab\t c\t "
#=>                         "\na\nb\t \nc\t \n" -> "a\nb\t \nc\t \n"
#=>                 "\n\na\n\nb\t \n\nc\t \n\n" -> "a\nb\t \nc\t \n"
#=> ======================================================================
#=> Lines ending with: "\r"
#=> ======================================================================
#=>                                       "abc" -> "abc"
#=>                               "\ra\rb\rc\r" -> "a\rb\rc\r"
#=>                       "\r\ra\r\rb\r\rc\r\r" -> "a\rb\rc\r"
#=>                                     "ab c " -> "ab c "
#=>                             "\ra\rb \rc \r" -> "a\rb \rc \r"
#=>                     "\r\ra\r\rb \r\rc \r\r" -> "a\rb \rc \r"
#=>                                   "ab\tc\t" -> "ab\tc\t"
#=>                           "\ra\rb\t\rc\t\r" -> "a\rb\t\rc\t\r"
#=>                   "\r\ra\r\rb\t\r\rc\t\r\r" -> "a\rb\t\rc\t\r"
#=>                                 "ab \tc \t" -> "ab \tc \t"
#=>                         "\ra\rb \t\rc \t\r" -> "a\rb \t\rc \t\r"
#=>                 "\r\ra\r\rb \t\r\rc \t\r\r" -> "a\rb \t\rc \t\r"
#=>                                 "ab\t c\t " -> "ab\t c\t "
#=>                         "\ra\rb\t \rc\t \r" -> "a\rb\t \rc\t \r"
#=>                 "\r\ra\r\rb\t \r\rc\t \r\r" -> "a\rb\t \rc\t \r"

Try

/^\n/

and replace with the empty string.

Are you sure your newline character is only \n? If not, try

/^\r?\n/

to also allow the line-break sequence \r\n.
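A small sketch of how that pattern behaves with gsub (the method name `strip_empty_lines` is hypothetical). In Ruby, ^ matches at the start of every line without any extra flag, so the pattern only hits a line break that sits directly at the beginning of a line:

```ruby
# Hypothetical helper: remove lines that consist of nothing but a line break.
def strip_empty_lines(str)
  str.gsub(/^\r?\n/, '')
end

p strip_empty_lines("\nfirst\n\nsecond\r\n\r\nthird\n")
#=> "first\nsecond\r\nthird\n"
```

Lines that contain only spaces or tabs are not matched by this pattern and would be left in place.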

Here's an ugly hack based on @Tom's answer:

result = s.squeeze("\r\n").tap{ |s2| :go while s2.gsub!("\r\n\r\n","\r\n") }

It supports DOS (\r\n), Unix (\n), and Mac OS 9 and earlier (\r) line breaks. Tested:

[ "\r\n", "\n", "\r" ].each do |marker|
  1.upto(5) do |lines|
    s = "a#{marker*lines}b"
    tight = s.squeeze("\r\n").tap{ |s2| :go while s2.gsub!("\r\n\r\n","\r\n") }
    puts "%24s -> %s" % [s.inspect, tight.inspect]
  end
end
#=>                 "a\r\nb" -> "a\r\nb"
#=>             "a\r\n\r\nb" -> "a\r\nb"
#=>         "a\r\n\r\n\r\nb" -> "a\r\nb"
#=>     "a\r\n\r\n\r\n\r\nb" -> "a\r\nb"
#=> "a\r\n\r\n\r\n\r\n\r\nb" -> "a\r\nb"
#=>                   "a\nb" -> "a\nb"
#=>                 "a\n\nb" -> "a\nb"
#=>               "a\n\n\nb" -> "a\nb"
#=>             "a\n\n\n\nb" -> "a\nb"
#=>           "a\n\n\n\n\nb" -> "a\nb"
#=>                   "a\rb" -> "a\rb"
#=>                 "a\r\rb" -> "a\rb"
#=>               "a\r\r\rb" -> "a\rb"
#=>             "a\r\r\r\rb" -> "a\rb"
#=>           "a\r\r\r\r\rb" -> "a\rb"

Note that this assumes that your blank lines are truly blank and do not have any whitespace on them. If they do contain whitespace, you could do a pre-pass of s.gsub(/^[ \t]+$/,'') first.

This will do it: .gsub(/(\n\s*\n)+/, "\n")

and replace \n in the regex with [\r\n] if needed.
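To illustrate what this pattern does and does not catch, here is a short sketch (the method name `squeeze_blank_lines` is an assumption). Because \s also matches spaces and tabs, whitespace-only lines between two newlines are swallowed along with truly empty ones:

```ruby
# Hypothetical helper wrapping the one-liner above.
def squeeze_blank_lines(str)
  str.gsub(/(\n\s*\n)+/, "\n")
end

p squeeze_blank_lines("a\n\n\nb\n \t\nc")  #=> "a\nb\nc"
p squeeze_blank_lines("\n\na\n")           #=> "\na\n"
```

As the second call shows, blank lines at the very start of the string are reduced to a single "\n" rather than removed, so something like `lstrip` may still be needed there.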

