
Using awk (or sed) to remove newlines based on first character of next line

Here's my situation: I had a big text file that I wanted to pull certain information from. I used sed to pull all the relevant information based on regexps, but each "piece" of information I pulled is on a separate line. I'd like each "record" to be on its own line so it can be easily imported into a DB.
Here's a sample of my data right now:

92831,499,000
,0644321
79217,999,000
,5417178
,PK91622
,PK90755

Ideally, I would want this output to look like:

92831,499,000 ,0644321
79217,999,000 ,5417178 ,PK91622
79217,999,000 ,5417178 ,PK90755

This may be harder to do, so I would settle for that last "record" appearing only once, with the additional "PK..." as the 4th "field" of that line.
In the end, the simplest approach I could think of is this: if a line starts with a comma (^,), the newline before it should be removed. I'm not too familiar with awk, though, so if you could give me a start on this it would really be appreciated! Thanks!

$ perl -0pe 's/\n,/,/g' < test.dat
92831,499,000,0644321
79217,999,000,5417178,PK91622,PK90755

Translation: Read the whole file in one go instead of line by line (-0 sets the input record separator to the NUL character, which this data does not contain), then replace each newline that is followed by a comma with just the comma.

Shortest code here!

Well, I guess I should have taken a closer look at using records in awk when I was trying to figure this out last night... 10 minutes after looking at them I got it working. For anyone interested, here's how I did it: in my original sed script I put an extra newline in front of the beginning of each record, so there's now a blank line separating each one. I then use the following awk command:

awk 'BEGIN {RS = ""; FS = "\n"}   # blank-line-separated records, one field per line
{
if (NF >= 3)
for (i = 3; i <= NF; i++)
print $1,$2,$i
}'

and it works like a charm, outputting exactly the way I wanted!
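As written, that command prints nothing for a record with fewer than three lines, such as the first record in the sample data. The following is only a sketch of one way to keep such records, assuming they should be printed once as they are. The function name `expand_records` and the sample input with its blank separator line are illustrative, not part of the original script.

```shell
# Hypothetical wrapper around the paragraph-mode awk approach described above.
expand_records() {
    awk 'BEGIN { RS = ""; FS = "\n" }    # blank line ends a record; each line is a field
    {
        head = $1
        if (NF >= 2) head = head OFS $2          # first two lines form the common prefix
        if (NF < 3) { print head; next }         # assumption: keep short records as they are
        for (i = 3; i <= NF; i++)                # one output row per extra line
            print head, $i
    }' "$@"
}

# Sample data in the blank-line-separated layout (an assumed reconstruction).
printf '%s\n' '92831,499,000' ',0644321' '' \
              '79217,999,000' ',5417178' ',PK91622' ',PK90755' | expand_records
```

Fields are joined with awk's default output separator, a single space, so the rows come out in the same spaced form as the desired output in the question.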

sedsed -d -n ':t;/^,/!x;H;n;/^,/{x;$!bt;x;H};x;s/\n//g;p;${x;/^,/!p}' filename

Without special-casing field 3, it's easy.

awk '
    !/^,/   { if (NR > 1) print x ; x = $0 }
    /^,/    { x = x OFS $0 }
    END     { if (NR) print x }
'

With special-casing, it's more complex but still not too hard.

awk '
    !/^,/   { if (n && n < 3) print x ; x = $0 ; n = 1 }
    /^,/    { if (++n > 2) { print x, $0 } else { x = x OFS $0 } }
    END     { if (n && n < 3) print x }
'

This might work for you:

# sed ':a;N;s/\n,/,/;ta;P;D' test.dat | sed 's/,/\n/5;s/\(.*,\).*\n/&\1/'
92831,499,000,0644321
79217,999,000,5417178,PK91622
79217,999,000,5417178,PK90755

Explanation:

This comes in two parts:

Append the next line and then, if the appended line begins with a comma, delete the embedded newline \n and start again. If not, print up to the newline and then delete up to the newline. Repeat.

Replace the 5th comma with a newline. Then insert the first four fields between the embedded newline and the sixth field.
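The second sed step splits only at the fifth comma, so it handles exactly two trailing fields. The same two parts can be sketched in plain awk so that any number of trailing fields gets its own row. This is an alternative illustration rather than the sed command above, and the function name `join_and_expand` is hypothetical.

```shell
# Hypothetical helper: part 1 joins continuation lines, part 2 expands extra fields.
join_and_expand() {
    awk '
        /^,/   { rec = rec $0; next }   # line starts with a comma: glue it onto the record
        NR > 1 { print rec }            # a new record begins: flush the previous one
               { rec = $0 }
        END    { if (NR) print rec }    # flush the final record
    ' "$@" |
    awk -F, '
        NF <= 5 { print; next }         # at most one trailing field: nothing to split
        {
            prefix = $1 FS $2 FS $3 FS $4            # the first four fields are repeated
            for (i = 5; i <= NF; i++) print prefix FS $i
        }
    '
}

printf '%s\n' '92831,499,000' ',0644321' \
              '79217,999,000' ',5417178' ',PK91622' ',PK90755' | join_and_expand
```

On the sample data this should give the same three lines as the sed pipeline, without relying on sed's `\n` in the replacement text.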
