How to test if a substring from each line of File1 exists in File2
I have two files with the following data:
file1:
6100540301SD01 ON5330399520191104906781 2019110390678151053303995ACK 20191105
6100540301SD01 ON0403096420191104225695 2019110322569551004030964A 20191105
6005260301SD01 46460045792019110490678911059455 2019110490678951000755694BE3 1120191105
6005260301SD01 46460045792019110490679616020577 2019110490679651000764053BDJDEDH 1620191105
file2:
20191104
20191105
20191106
Since file1 is a fixed-width file, the string at character positions 97 to 104 is a date. I want to extract the string at positions 97 to 104 and check whether it exists in file2. If it does, I want to copy the whole line to file3. If it does not, I want to copy the line to file4.
I have written a C++ program, but it takes a long time to process file1, which has almost half a million records. If there is an awk/sed script that could help, please share.
Turn the contents of file2 into a regular expression like 20191104|20191105|20191106. Then you can use grep to match it.
patterns=$(<file2)
# Replace newlines with |
pattern=${patterns//$'\n'/|}
# Put ^.{96} at the beginning so it matches starting at column 97
pattern="^.{96}($pattern)"
grep -E "$pattern" file1 > file3 # Lines that match
grep -v -E "$pattern" file1 > file4 # Lines that don't match
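As a hedged, portable variation on the pattern-building step (a swapped-in technique, not part of the answer itself), `paste -s -d '|'` joins the lines of file2 with `|` without relying on bash-specific expansions. The sketch below recreates the sample file2 in a scratch directory and prints the resulting pattern.

```shell
# Work in a scratch directory so no real files are touched (illustrative only).
cd "$(mktemp -d)"
printf '20191104\n20191105\n20191106\n' > file2

# Join all lines of file2 with "|" and anchor the alternation at column 97.
pattern="^.{96}($(paste -s -d '|' file2))"
printf '%s\n' "$pattern"
# prints ^.{96}(20191104|20191105|20191106)
```

The resulting `$pattern` can be passed to the same `grep -E` commands shown above.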
If running grep twice is too slow, you could use awk:
awk -v pat="$pattern" '$0 ~ pat { print > "file3"; next } { print > "file4" }' file1
awk to the rescue!
$ awk 'NR==FNR {dates[$0]; next}
{print > (substr($0,97,8) in dates?"file3":"file4")}' file2 file1
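To see the awk approach work end to end, here is a small sketch on synthetic data: two made-up records with 96 filler characters followed by a date (these records are assumptions for illustration, not the real file1). Note that the third argument of `substr` is a length, so 8 selects columns 97 to 104.

```shell
cd "$(mktemp -d)"
printf '20191104\n20191105\n20191106\n' > file2
# Two synthetic records: 96 zeros of padding, then an 8-character date.
printf '%096d%s\n' 0 20191105 >  file1   # date is listed in file2
printf '%096d%s\n' 0 20191101 >> file1   # date is not listed in file2

# First pass (file2): remember each date. Second pass (file1): route the line.
awk 'NR==FNR { dates[$0]; next }
     { print > ((substr($0, 97, 8) in dates) ? "file3" : "file4") }' file2 file1

grep -c '20191105$' file3   # counts the matching record routed to file3
grep -c '20191101$' file4   # counts the non-matching record routed to file4
```

Because the dates are held in an array, each line of file1 costs a single hash lookup regardless of how many dates file2 contains.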
This might work for you (GNU sed):
sed 's#.*#/^.\\{96\\}&/ba#' file2 | sed -nf - -e 'w file4' -e 'b;:a;w file3' file1
Create a script from file2 that writes each matching line to file3 and all remaining lines to file4.
The first invocation of sed passes its output to the second invocation, which reads it as a script (-f -) and supplements it with a couple of inline commands. Lines that match any date at column 97 branch to the label :a, which writes them to file3. Lines that match none fall through and are written to file4.
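To make the generated script concrete, the sketch below runs only the first sed on a two-line sample and saves what the second sed would receive through `-f -`. The names dates.txt and generated.sed are hypothetical, chosen for illustration.

```shell
cd "$(mktemp -d)"
printf '20191104\n20191105\n' > dates.txt

# Each date becomes: /^.\{96\}DATE/ba
# (if DATE sits at column 97, branch to label a)
sed 's#.*#/^.\\{96\\}&/ba#' dates.txt > generated.sed
cat generated.sed
# prints:
# /^.\{96\}20191104/ba
# /^.\{96\}20191105/ba
```

In the full command, these generated rules run before the inline `w file4` and `b;:a;w file3` commands, so a line only reaches `w file4` when none of the date rules matched.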