How can I remove illegal newlines that appear within a text file using the Vim text editor?

I am trying to repair a data file so that I can use MySQL's LOAD DATA INFILE to import the data into a database.

The problem with the file is that some fields hold lengthy text that contains newlines. Since a newline also marks the start of a new record, this makes it hard to import the records into MySQL.

How can I use Vim on Linux to search for illegal newlines and replace them with a space?

Illegal newline: any newline found between a comma ( , ) and the next ( ,012d000 )

This is a sample of the data in the file:

VST-65654,a0Jd000000FM8cBEAT,Blah,2013-10-22 10:46:30.000000,Blah Blah,2014-01-20 20:27:42.000000,2013-10-18 14:00:00.000000,005d0000002biR4AAI,001d000001NEh0oAAD,In Person,Unscheduled,Grow Applications,High,this is the body

of this 
log test
where I need to

remove all extra new lines,012d0000000ppiXAAQ
VST-122549,a0Jd000000GVwtyEAD,Blah,2013-10-31 18:17:50.000000,Blah,2013-11-06 18:07:47.000000,2013-10-31 18:10:00.000000,005d0000002biR9AAI,001d000001NEaQgAAL,In Person,Scheduled,Grow Applications,Medium,One more long paragraph

where I need to remove all extra

new lines

,012d0000000ppiABCD

The fields are separated by a comma ( , ) and a new record should begin only when a newline ( \n ) is found. How can I do a search and replace to fix this issue?

Or how can I replace all unescaped commas with double quotes? That is, if I find \, I leave it alone, but if I find a comma without a backslash I replace it with ","

Thanks

:g/^VST/,-/,012d000/j!

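Use the global command, :g , to find every line starting with VST and join it, with :j , to all the lines through the next line containing ,012d000 . The ! makes :j join the lines without inserting or removing any spaces. The - in the range starts the search one line above the current one, so a record that already sits on a single line matches its own ,012d000 and is left alone.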

For more help see:

:h :g
:h :j
:h [range]
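For readers who would rather repair the file outside Vim, the same idea can be sketched in Python. This is only an illustration under the assumptions the :g command already makes: every record starts with VST- and ends on the line containing ,012d000 . The function name join_records and its parameters are made up for this sketch. Unlike :j! , it joins the pieces with a single space and drops blank lines, which is what the question asks for.

```python
def join_records(lines, start="VST-", end_marker=",012d000"):
    """Merge the physical lines of each record into one logical line.

    Assumes (as the :g command does) that a record begins on a line
    starting with `start` and ends on the line containing `end_marker`.
    """
    records = []
    parts = None  # pieces of the record being assembled, or None outside a record
    for raw in lines:
        line = raw.strip()
        if parts is None:
            if not line.startswith(start):
                continue  # stray text outside any record is skipped
            parts = []
        if line:
            parts.append(line)  # blank lines inside a record are dropped
        if end_marker in line:
            records.append(" ".join(parts))  # illegal newlines become spaces
            parts = None
    # A record that never reaches end_marker is silently discarded here.
    return records


# Shortened version of the sample data from the question.
sample = [
    "VST-65654,High,this is the body",
    "",
    "of this ",
    "log test",
    "where I need to",
    "",
    "remove all extra new lines,012d0000000ppiXAAQ",
    "VST-122549,Medium,One more long paragraph",
    "",
    "where I need to remove all extra",
    "",
    "new lines",
    "",
    ",012d0000000ppiABCD",
]
fixed = join_records(sample)
```

Writing "\n".join(fixed) back to disk would give one record per line, ready for LOAD DATA INFILE.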

My regex-fu isn't powerful enough to do that in a single command, but you could record a macro to achieve what you want. The following worked for the input you gave:

Go to start of file

gg

Start recording

qq

Find next ,012d

/,012d<CR>

Go up one line

k

Enter visual mode

v

Go to previous comma

?,<CR>

Remove all newline characters in the selected lines

:s/\n//g<CR>

Go down one line

j

Finish recording

q

Repeat

@q

Result

VST-65654,a0Jd000000FM8cBEAT,Blah,2013-10-22 10:46:30.000000,Blah Blah,2014-01-20 20:27:42.000000,2013-10-18 14:00:00.000000,005d0000002biR4AAI,001d000001NEh0oAAD,In Person,Unscheduled,Grow Applications,High,this is the body of this log test where I need to remove all extra new lines,012d0000000ppiXAAQ
VST-122549,a0Jd000000GVwtyEAD,Blah,2013-10-31 18:17:50.000000,Blah,2013-11-06 18:07:47.000000,2013-10-31 18:10:00.000000,005d0000002biR9AAI,001d000001NEaQgAAL,In Person,Scheduled,Grow Applications,Medium,One more long paragraph where I need to remove all extra new lines ,012d0000000ppiABCD

I like @Peter Rincker's answer. As for the question you asked at the end, you can replace all the un-escaped commas with "," using

:%s/\\\@<!,/","/g

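Here, \\ represents a literal backslash and \@<! is a negative lookbehind: it matches with zero width only if the preceding atom does not match just before what follows. (See :help /\@<! .)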

The problem with this solution is that you have not correctly defined what an un-escaped comma is. For example, \\, is an escaped backslash followed by an un-escaped comma. I believe that /\\\@<!\%(\\\\\)*\zs,/ is the correct pattern, but I do not say it is pretty. It is a little better if you use the "very magic" version: /\v\\@<!%(\\\\)*\zs,/ .
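To see how the two definitions of "un-escaped" differ, here is a rough Python equivalent of both patterns. It is a sketch rather than a translation: the names NAIVE, STRICT and the two helper functions are invented for illustration, and because Python's re module has no \zs , the stricter version captures the run of backslashes and writes it back in front of the replacement.

```python
import re

# Naive rule: a comma that is not directly preceded by a backslash.
NAIVE = re.compile(r"(?<!\\),")

# Stricter rule: a comma preceded by an even number of backslashes
# (zero included), so that "\\," counts as an escaped backslash
# followed by an un-escaped comma.
STRICT = re.compile(r"(?<!\\)((?:\\\\)*),")


def quote_commas_naive(text):
    # A function is used as the replacement so no escapes are interpreted.
    return NAIVE.sub(lambda m: '","', text)


def quote_commas_strict(text):
    # Re-emit the captured backslash pairs, then the quoted comma.
    return STRICT.sub(lambda m: m.group(1) + '","', text)


# One plain comma, one escaped comma, one comma after an escaped backslash.
text = r"a,b\,c\\,d"
naive_result = quote_commas_naive(text)
strict_result = quote_commas_strict(text)
```

On this input the naive rule leaves the last comma alone because a backslash sits right before it, while the stricter rule treats it as un-escaped and replaces it.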
