How to append lines that match two patterns to the previous line in a file?

I have a CSV file in which what is supposed to be a single line is split across several. I need help finding a way to join the lines that are split. Also, the number of fields (separated by ,) is not fixed.

A correct line has the following pattern:

X,X,X,"()",Y,H where X can be any number of fields. However, the end of the string, "()",Y,H, is fixed. Y and H are both one word.

The issue is that this line can appear as (or any variant of this):

X,X,

X, "()"

,Y,H

What I need is a way (awk, sed) of appending the lines that have fewer than 24 commas and do not end with ",Y,H to the previous line.

Please bear in mind that it's a large file, although I have 256 GB of RAM.

Example

  • Correct lines

a, b, c, "()", h, k

a, b, c, d, "()", h, k

  • Same lines in the file

First line

a, b, c,

"()", h, k

Second line

a, b, c, d, "()"

, h

, k

So far I've tried this (not working):

awk '/"[:space:]*,[:space:]*[:alpha:]+[:space:]*,[:space:]*[:alpha:]+$/{print}' check.csv

to try to find the lines ending with ", X, Y where X and Y are words.

Also, as the minimum number of correct fields is 24, I've used:

awk 'NF<24{print}' check.csv

to filter out lines with fewer than 24 fields.

My idea is to detect lines that match both regular expressions and append them to the previous line.

Thank you!

This might work for you (GNU sed):

sed '/"()", *[^,]\+, *[^,]\+$/b;:a;N;s/\n//;/"()", *[^,]\+, *[^,]\+$/!ba;P;D' file

Do not process a correct line, just bail out.

Otherwise append the next line, remove the introduced newline and try to match again.

Repeat until there is a match, then print/delete the first line and start over with the next input line.
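Since the question also asks for awk, the same accumulate-until-match idea can be sketched there. This is only a sketch under assumptions: the function name join_split_lines is made up for illustration, a record is taken to be complete as soon as the accumulated text ends in "()" followed by two non-empty comma-separated fields, and the 24-comma check from the question is left out.

```shell
# join_split_lines [FILE...]: hypothetical helper, reads stdin when no file is given.
join_split_lines() {
  awk '
    # Glue every physical line onto the buffer without a separator.
    { buf = buf $0 }
    # Once the buffer ends in "()",Y,H (optional spaces after the commas),
    # emit it as one record and start a new one.
    buf ~ /"\(\)",[ ]*[^,]+,[ ]*[^,]+$/ { print buf; buf = "" }
    # Do not silently drop an unfinished record at end of input.
    END { if (buf != "") print buf }
  ' "$@"
}
```

It would be called as join_split_lines check.csv > fixed.csv. Only the current record is held in memory, so the size of the file should not matter.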

perl -lanF, -e 'push @L, grep length, @F; if ($L[-3] eq q/"()"/) { print join ",", @L; @L=() }' file

  • use -l -n -e to loop over input lines without printing, and append line breaks to output
  • use -a -F, to create the @F array by splitting input on commas
  • push @L, grep length, @F - push nonempty fields onto @L
  • if ($L[-3] eq q/"()"/) - if the 3rd to last accumulated field is the magic marker:
    • print join ",", @L - print all of @L joined with commas
    • @L=() - reset @L
