Sed / Awk / Cut (GNU): transform text lines into a single line

Question

I have the following type of data:

3869|Jennifer Smith
10413 NE 71st Street
Vancouver, WA
98662
360-944-9578
jsmith@yahoo.com|1234567890123456|03-2013|123
--
3875|Joan L Doe
422 1/2 14th Ave E
Seattle, WA
98112
206-322-7666
jldoe@comcast.net|1234-1234-1234-1234|03-2013|123
--
3862|Dana Doe
24235 NE 7th Pl
Sammamish, WA
98074
425 868-2227
jsmith@hotmail.com|1234567890123456|03-2013|123
--
3890|John Smith
10470 SW 67th Ave
Tigard, OR
97223
5032205213
john.smith@gmail.com|1234567890123456|03-2013|123

I need to transform it to:

3869|Jennifer Smith|10413 NE 71st Street|Vancouver, WA|98662|360-944-9578|jsmith@yahoo.com|1234567890123456|03-2013|123
3875|Joan L Doe|422 1/2 14th Ave E|Seattle, WA|98112|206-322-7666|jldoe@comcast.net|1234-1234-1234-1234|03-2013|123
3862|Dana Doe|24235 NE 7th Pl|Sammamish, WA|98074|425 868-2227|jsmith@hotmail.com|1234567890123456|03-2013|123
3890|John Smith|10470 SW 67th Ave|Tigard, OR|97223|5032205213|john.smith@gmail.com|1234567890123456|03-2013|123

Or, better, with the city and state split into separate fields:

3869|Jennifer Smith|10413 NE 71st Street|Vancouver|WA|98662|360-944-9578|jsmith@yahoo.com|1234567890123456|03-2013|123
3875|Joan L Doe|422 1/2 14th Ave E|Seattle|WA|98112|206-322-7666|jldoe@comcast.net|1234-1234-1234-1234|03-2013|123
3862|Dana Doe|24235 NE 7th Pl|Sammamish|WA|98074|425 868-2227|jsmith@hotmail.com|1234567890123456|03-2013|123
3890|John Smith|10470 SW 67th Ave|Tigard|OR|97223|5032205213|john.smith@gmail.com|1234567890123456|03-2013|123

Any idea how to automate this using GNU sed, awk, cut, or perl/python, whatever works? Thank you!

Answer 1

Using sed:

sed -n ':a;$!N;/--/!s/\n/|/g;ta;P' inputFile


$ sed -n ':a;$!N;/--/!s/\n/|/g;ta;P' temp 
3869|Jennifer Smith|10413 NE 71st Street|Vancouver, WA|98662|360-944-9578|jsmith@yahoo.com|1234567890123456|03-2013|123
3875|Joan L Doe|422 1/2 14th Ave E|Seattle, WA|98112|206-322-7666|jldoe@comcast.net|1234-1234-1234-1234|03-2013|123
3862|Dana Doe|24235 NE 7th Pl|Sammamish, WA|98074|425 868-2227|jsmith@hotmail.com|1234567890123456|03-2013|123
3890|John Smith|10470 SW 67th Ave|Tigard, OR|97223|5032205213|john.smith@gmail.com|1234567890123456|03-2013|123

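Explanation:

-n suppresses automatic printing, so only the explicit P produces output.
:a creates a label named a.
$! applies the following command only if the current line is not the last line.
N appends the next input line to the pattern space, separated by a newline.
/--/! applies the following command only if the pattern space does not match this regex.
s/\n/|/g replaces every newline in the pattern space with a pipe.
ta branches back to label a if the substitution succeeded.
P prints the pattern space up to the first newline, which leaves out the -- separator.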
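The steps above can also be written as a multi-line sed script with one comment per command. This is only a sketch: the two sample records are made up for illustration, and the separator test is tightened to /\n--$/ (a variant, not the original /--/) so that a data line which merely contains two dashes is not mistaken for a separator.

```shell
# Two made-up sample records separated by a "--" line.
sample=$(mktemp)
printf '%s\n' '1|Ann' 'Street 1' 'a@example.com|123' '--' \
              '2|Bob' 'Street 2' 'b@example.com|456' > "$sample"

joined=$(sed -n '
# label to loop back to
:a
# unless this is the last input line, append the next line to the pattern space
$!N
# unless the pattern space ends with a line that is exactly "--", turn newlines into pipes
/\n--$/!s/\n/|/g
# loop again if a substitution was made
ta
# print up to the first newline, i.e. the joined record without the separator
P
' "$sample")

printf '%s\n' "$joined"
rm -f "$sample"
```

With this input the script prints one pipe-joined line per record.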

Note: here is the difference between p, P, n and N.

The n command prints the current pattern space (unless -n is given) and reads in the next line of input.
The N command does not print the current pattern space. It reads in the next line and appends a newline character plus that line to the pattern space.
The p command prints the entire pattern space.
The P command prints only the first part of the pattern space, up to the first newline character.
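A few tiny experiments make these differences visible. The inputs are arbitrary throwaway lines chosen for illustration.

```shell
# N then P: the pattern space holds "a\nb", and P prints only the part before the newline.
first_only=$(printf 'a\nb\n' | sed -n 'N;P')

# N then p: p prints the whole pattern space, so both lines appear.
both=$(printf 'a\nb\n' | sed -n 'N;p')

# n without -n: the current line is printed, the next one is loaded, and d deletes it,
# so only the odd-numbered lines survive.
odd=$(printf '1\n2\n3\n4\n' | sed 'n;d')

printf '%s\n' "$first_only" "$both" "$odd"
```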

Answer 2

I don't think it is very nice, but it nearly works (the last record is missing, and each line starts with an extra |):

$ awk '{if (/^--/) {print a; a=""} else { a=a"|"$0}}' file
|3869|Jennifer Smith|10413 NE 71st Street|Vancouver, WA|98662|360-944-9578|jsmith@yahoo.com|1234567890123456|03-2013|123
|3875|Joan L Doe|422 1/2 14th Ave E|Seattle, WA|98112|206-322-7666|jldoe@comcast.net|1234-1234-1234-1234|03-2013|123
|3862|Dana Doe|24235 NE 7th Pl|Sammamish, WA|98074|425 868-2227|jsmith@hotmail.com|1234567890123456|03-2013|123

Update

If you add an extra

--

at the end of your file, the last record is printed as well:

$ awk '{if (/^--/) {print a; a=""} else { a=a"|"$0}}' file
|3869|Jennifer Smith|10413 NE 71st Street|Vancouver, WA|98662|360-944-9578|jsmith@yahoo.com|1234567890123456|03-2013|123
|3875|Joan L Doe|422 1/2 14th Ave E|Seattle, WA|98112|206-322-7666|jldoe@comcast.net|1234-1234-1234-1234|03-2013|123
|3862|Dana Doe|24235 NE 7th Pl|Sammamish, WA|98074|425 868-2227|jsmith@hotmail.com|1234567890123456|03-2013|123
|3890|John Smith|10470 SW 67th Ave|Tigard, OR|97223|5032205213|john.smith@gmail.com|1234567890123456|03-2013|123

This happens because my code waits for a -- line before printing what it has buffered.
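One possible way to keep the buffering approach without editing the input is to flush the buffer in an END block and to add the pipe only between lines. The following is a sketch under those assumptions, with made-up sample records and a variable named rec instead of a.

```shell
# Two made-up sample records; note there is no trailing "--" line.
sample=$(mktemp)
printf '%s\n' '1|Ann' 'Street 1' 'a@example.com|123' '--' \
              '2|Bob' 'Street 2' 'b@example.com|456' > "$sample"

joined=$(awk '
/^--$/ { print rec; rec = ""; next }        # separator line: flush the buffered record
{ rec = (rec == "" ? $0 : rec "|" $0) }     # otherwise append the line, with no leading pipe
END { if (rec != "") print rec }            # flush the last record, which has no separator after it
' "$sample")

printf '%s\n' "$joined"
rm -f "$sample"
```

The END block handles the final record, and the conditional concatenation removes the leading | seen in the output above.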

Answer 3

A slightly more idiomatic awk solution:

awk -F'\n' -vRS='\n--\n' -vOFS='|' '{$1=$1;print}' test.in

This tells awk that incoming records are separated by a line consisting of -- and that fields are separated by newlines, while outgoing fields should be separated by | and outgoing records by the standard newline. $1 = $1 forces the record to be rebuilt with these separators.

If the file doesn't end with a -- line, you will get an extra | at the end of the last line. If you need to avoid this, change the command slightly:

awk -F'\n' -vRS='\n--\n' -vOFS='|' '{if($NF==""){NF--}$1=$1;print}' test.in
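None of the commands above produce the "better" layout from the question, where city and state are separate fields. A minimal sketch of one way to get there, assuming the third line of every record always has the form "City, ST" (the sample records below are made up for illustration):

```shell
# Two made-up sample records with a "City, ST" line in third position.
sample=$(mktemp)
printf '%s\n' '1|Ann' '1 Main St' 'Springfield, OR' '97477' 'a@example.com|123' '--' \
              '2|Bob' '2 Oak Ave' 'Portland, OR' '97201' 'b@example.com|456' > "$sample"

split=$(awk -F'\n' -v RS='\n--\n' -v OFS='|' '
{
  if ($NF == "") NF--       # drop the empty field left by the final newline
  sub(/, /, "|", $3)        # assumption: field 3 is always "City, ST"
  $1 = $1                   # rebuild the record with OFS even if sub() changed nothing
  print
}' "$sample")

printf '%s\n' "$split"
rm -f "$sample"
```

If some records lack the city line or have extra address lines, field 3 is no longer the right target and the substitution would need a different anchor.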
