
How to get the desired output using a bash script?

I am trying to produce the output shown further down, but I don't know how. I searched the internet but didn't know the right keywords, so I am posting my question here. I have a CSV file, data.csv. My attempt so far (MWE) is shown first, followed by the contents of the file:

cat data.csv | sed 's/\n.*//g'

10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5, 
line 5 text
10,1,6,"<J>
 line 6 text"
10,1,7,"line 7 text"
10,1,8,"
 line 8 text"
10,1,9,"line 9 text"

I want the output shown below:

10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,"line 5 text"
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"

With GNU sed:

sed '/".*"$/!{N;s/\n *//}' file

If a line does not match the regex ".*"$ (that is, it does not end in a complete quoted string), append the next line ( N ) to sed's pattern space and replace the newline, together with any spaces that follow it, with nothing ( s/\n *// ).

Output:

10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5, line 5 text
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"

I did not add the missing quotation marks in line 5.


See: man sed and The Stack Overflow Regular Expressions FAQ
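
For readers who want the one-liner unpacked, here is a minimal sketch of the same logic with one sed command per -e. The function name join_split_lines is only illustrative, and the split-up spelling is an assumption on my part that it should also be accepted by non-GNU sed implementations, which tend to be stricter about where "}" may appear.

```shell
# Same logic as the one-liner above, one sed command per -e.
# join_split_lines is a hypothetical helper name, not part of any tool.
join_split_lines() {
    # /".*"$/!{   for lines that do NOT end in a complete quoted string:
    #   N           append the next input line to the pattern space
    #   s/\n *//    delete the embedded newline and any spaces after it
    # }
    sed -e '/".*"$/!{' -e 'N' -e 's/\n *//' -e '}' "$@"
}

# Example: a record whose quoted field was split over two lines.
printf '%s\n' '10,1,7,"line 7 text"' '10,1,8,"' ' line 8 text"' | join_split_lines
# prints:
# 10,1,7,"line 7 text"
# 10,1,8,"line 8 text"
```

Because N runs only once per cycle, this joins a field split over exactly two lines; a field spread over three or more lines would not be fully rejoined.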

In addition to Cyrus's answer, to ensure that 'line 5 text' is surrounded by double quotes, you can add expressions that replace ', ' with ',"' and append a '"' to lines that do not already end in one, e.g.:

sed -e '/".*"$/!{N;s/\n *//}' -e 's/, /,"/' -e '/"$/!{s/$/"/}' file

The first expression is exactly the same as in that answer. Together, the three expressions produce your requested output:

$ sed -e '/".*"$/!{N;s/\n *//}' -e 's/, /,"/' -e '/"$/!{s/$/"/}' file
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,"line 5 text"
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"
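
One caveat: 's/, /,"/' replaces the first comma-space anywhere on the line, so an already quoted field such as "hello, world" would get a stray quote. A possible variant, sketched under the assumption that every record has exactly three comma-free fields before the text field, anchors the substitution to the fourth field instead. The function name quote_fourth_field is hypothetical, and it expects lines that have already been joined.

```shell
# Quote the fourth field only when it does not already start with a quote.
# Assumes exactly three comma-free leading fields; the name is illustrative.
quote_fourth_field() {
    # \1 = the first three fields with their commas
    # \2 = the unquoted remainder of the line (spaces after the comma dropped)
    sed 's/^\([^,]*,[^,]*,[^,]*,\) *\([^"].*\)$/\1"\2"/' "$@"
}

# Example: the quoted line is left alone, the unquoted one gets quotes.
printf '%s\n' '10,1,1,"hello, world"' '10,1,5, line 5 text' | quote_fourth_field
# prints:
# 10,1,1,"hello, world"
# 10,1,5,"line 5 text"
```

This could stand in for the second and third -e expressions above when the text itself may contain commas.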

With GNU awk for multi-char RS, RT, and gensub(), you can just describe each record as a series of 4 comma-separated fields ending in a newline, and then remove the newlines and the spaces around them:

$ awk -v RS='([^,]*,){3}[^,]*\n' '{$0=gensub(/\s*\n\s*/,"","g",RT)} 1' file
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,line 5 text
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"

and to ensure quotes around the last field:

$ awk -v RS='([^,]*,){3}[^,]*\n' '{$0=gensub(/\s*\n\s*/,"","g",RT); $0=gensub(/,([^",]*)$/,",\"\\1\"",1)} 1' file
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,"line 5 text"
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"

Note that this will work no matter how many lines your 4th field is split over:

$ cat file
10,1,1,"line 1 text"
10,1,2,
foo
line
2
text
bar
10,1,3,"line 3 text"

$ awk -v RS='([^,]*,){3}[^,]*\n' '{$0=gensub(/\s*\n\s*/,"","g",RT); $0=gensub(/,([^",]*)$/,",\"\\1\"",1)} 1' file
10,1,1,"line 1 text"
10,1,2,"fooline2textbar"
10,1,3,"line 3 text"
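
If GNU awk is not available, a rough equivalent can be sketched in plain awk. This is not the approach above but a swapped-in technique: it assumes that every record starts with three numeric fields (digits,digits,digits,) and treats any other line as a continuation of the previous record. The names join_records and emit are hypothetical.

```shell
# Join continuation lines onto the record they belong to, using only
# POSIX awk features. Assumes each record starts with three numeric fields.
join_records() {
    awk '
    # Print the record collected so far, quoting its last field if needed.
    function emit(   head, tail) {
        if (rec == "") return
        match(rec, /^[0-9]+,[0-9]+,[0-9]+,/)
        head = substr(rec, 1, RLENGTH)        # the three leading fields
        tail = substr(rec, RLENGTH + 1)       # the text field
        if (tail !~ /^"/) tail = "\"" tail "\""
        print head tail
    }
    { gsub(/^[ \t]+|[ \t]+$/, "") }           # trim blanks around every line
    /^[0-9]+,[0-9]+,[0-9]+,/ {                # assumed shape of a record start
        emit()
        rec = $0
        next
    }
    { rec = rec $0 }                          # continuation line: append it
    END { emit() }
    ' "$@"
}

# Example: an unquoted field split over several lines.
printf '%s\n' '10,1,1,"line 1 text"' '10,1,2,' 'foo' 'line' 'bar' | join_records
# prints:
# 10,1,1,"line 1 text"
# 10,1,2,"foolinebar"
```

The trade-off is that a continuation line which itself happens to begin with three numeric fields would be mistaken for a new record.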
