How to test if a substring from each line of File1 exists in File2

I have two files with the following data:

file1:

6100540301SD01        ON5330399520191104906781            2019110390678151053303995ACK          20191105
6100540301SD01        ON0403096420191104225695            2019110322569551004030964A            20191105
6005260301SD01        46460045792019110490678911059455    2019110490678951000755694BE3        1120191105
6005260301SD01        46460045792019110490679616020577    2019110490679651000764053BDJDEDH    1620191105

file2:

20191104
20191105
20191106

Since file1 is a fixed-width file, the string at character positions 97 to 104 is a date. I want to extract the string at positions 97 to 104 and check whether it exists in file2. If it does, I want to copy the whole line to file3. If it does not, I want to copy it to file4.

I have written a C++ program, but it takes a long time to process file1, which has almost half a million records. If there is an awk/sed script that could help, please share.

Turn the contents of file2 into a regular expression like 20191104|20191105|20191106. Then you can use grep to match it.

patterns=$(<file2)
# Replace newlines with |
pattern=${patterns//$'\n'/|}
# Put ^.{96} at the beginning so it matches starting at column 97
pattern="^.{96}($pattern)"
grep -E "$pattern" file1 > file3 # Lines that match
grep -v -E "$pattern" file1 > file4 # Lines that don't match
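
The `$(<file2)` and `${patterns//.../|}` expansions above are bash features. As a rough sketch of the same pattern construction for other shells, the alternation could also be built with `paste`, assuming file2 holds one date per line with no blank lines. The file2 contents are the sample from the question; the scratch directory and the `REC-A` record are hypothetical stand-ins.

```shell
# Recreate the sample file2 from the question in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 20191104 20191105 20191106 > file2

# paste -s joins all lines of file2 into one line, separated by |.
pattern="^.{96}($(paste -s -d '|' file2))"
echo "$pattern"   # ^.{96}(20191104|20191105|20191106)

# Hypothetical record: 96 characters of padding, then the date in columns 97-104.
printf '%-96s%s\n' "REC-A" "20191105" | grep -c -E "$pattern"   # 1
```

The resulting `$pattern` can be passed to the two grep commands unchanged.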

If running grep twice is too slow, you could use awk to split the file in a single pass (this needs an awk that supports interval expressions such as {96}):

awk -v pat="$pattern" '$0 ~ pat { print > "file3"; next } { print > "file4" }' file1

awk to the rescue!

awk 'NR==FNR {dates[$0]; next}
     # substr takes a start and a length, so columns 97-104 are substr($0,97,8)
     {print > ((substr($0,97,8) in dates) ? "file3" : "file4")}' file2 file1
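
To see the lookup-table approach end to end, here is a minimal sketch on made-up data. The `REC-*` records and the scratch directory are hypothetical; each record is 96 characters of padding followed by an 8-digit date, mirroring the layout described in the question. The output files are created empty first, since awk only creates a file once something is printed to it.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical fixed-width records: the date occupies columns 97-104.
printf '%-96s%s\n' "REC-A" "20191105" >  file1
printf '%-96s%s\n' "REC-B" "20191201" >> file1
printf '%-96s%s\n' "REC-C" "20191104" >> file1
printf '%s\n' 20191104 20191105 20191106 > file2

# Make sure both outputs exist even if one of them receives no lines.
: > file3
: > file4

# First file (file2): remember every date. Second file (file1): route each line.
awk 'NR==FNR { dates[$0]; next }
     { print > ((substr($0, 97, 8) in dates) ? "file3" : "file4") }' file2 file1

grep -c '' file3 file4
# file3:2
# file4:1
```

Because each line costs one hash lookup rather than a regex match against every date, this version should scale well as file2 grows.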

This might work for you (GNU sed):

sed 's#.*#/^.\\{96\\}&/ba#' file2 | sed -nf - -e 'w file4' -e 'b;:a;w file3' file1

Create a script from file2 which writes each matching line to file3 and any remaining lines to file4.

The first invocation of sed passes its output to the second invocation of sed, which reads it as a script (-f -) and supplements it with two inline command strings. Every line that matches one of the dates branches to the label :a, where it is written to file3. Lines that match nothing fall through and are written to file4, after which the bare b command ends the cycle so they skip the write to file3.
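
It can help to look at the script the first sed generates before piping it into the second. A small sketch, using the sample file2 from the question in a scratch directory (the `generated.sed` file name is only for illustration):

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 20191104 20191105 20191106 > file2

# Stage 1 only: turn each date into an address plus a branch to label a.
sed 's#.*#/^.\\{96\\}&/ba#' file2 > generated.sed
cat generated.sed
# /^.\{96\}20191104/ba
# /^.\{96\}20191105/ba
# /^.\{96\}20191106/ba
```

Each generated line is a basic-regex address anchored at column 97, so the second sed tries the dates one after another and branches on the first hit.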
