Compare 2 files and remove any lines in file2 when they match values found in file1

I have two files. I am trying to remove any lines in file2 that match values found in file1. One file has a listing like so:

File1

ZNI008
ZNI009
ZNI010
ZNI011
ZNI012

... over 19463 lines ...

The second file includes lines that match the items listed in the first:

File2

copy /Y \\server\foldername\version\20050001_ZNI008_162635.xml \\server\foldername\version\folder\
copy /Y \\server\foldername\version\20050001_ZNI010_162635.xml \\server\foldername\version\folder\
copy /Y \\server\foldername\version\20050001_ZNI012_162635.xml \\server\foldername\version\folder\
copy /Y \\server\foldername\version\20050001_ZNI009_162635.xml \\server\foldername\version\folder\

... continues listing until line 51360 ...

What I've tried so far:

grep -v -i -f file1.txt file2.txt > f3.txt

This does not produce any output to f3.txt or remove any lines. I verified this by running

wc -l file2.txt

and the result is

51360 file2.txt

I believe the reason is that there are no exact matches. When I run the following, it shows nothing:

comm -1 -2 file1.txt file2.txt

Running

( tr '\0' '\n' < file1.txt; tr '\0' '\n' < file2.txt ) | sort | uniq -c | egrep -v '^ +1'

shows only one match, even though I can clearly see there is more than one.
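comm -1 -2 only reports lines that are identical in both (sorted) files, and every line of file2 carries its ZNI value inside a longer copy command, so a whole-line comparison cannot find anything. The short Python sketch below contrasts the two kinds of comparison; the function names are illustrative, and the sample lines are trimmed copies of the ones above (trailing backslash dropped).

```python
def exact_matches(ids, lines):
    """Whole-line comparison, which is what comm -1 -2 reports."""
    return sorted(set(ids) & set(lines))


def substring_matches(ids, lines):
    """Lines that contain at least one of the IDs somewhere inside them."""
    return [line for line in lines if any(i in line for i in ids)]


ids = ["ZNI008", "ZNI009"]
lines = [
    r"copy /Y \\server\foldername\version\20050001_ZNI008_162635.xml \\server\foldername\version\folder",
    r"copy /Y \\server\foldername\version\20050001_ZNI010_162635.xml \\server\foldername\version\folder",
]

print(exact_matches(ids, lines))           # []
print(len(substring_matches(ids, lines)))  # 1
```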

Alternatively, putting all the data into one file and running the following:

grep -Ev "$(cat file1.txt)" 1>LinesRemoved.log

fails, saying the argument has too many lines to process.

I need to remove lines matching the items in file1 from file2.

I am also trying this in Python:

#!/usr/bin/python
s = set()

# load each line of file1 into memory as elements of a set, 's'
f1 = open("file1.txt", "r")
for line in f1:
    s.add(line.strip())
f1.close()

# open file2 and split each line on "_" separator,
# second field contains the value ZNIxxx
f2 = open("file2.txt", "r")
for line in f2:
    if line[0:4] == "copy":
        fields = line.split("_")
        # check if the field exists in the set 's'
        if fields[1] not in s:
            match = line
        else:
            match = 0
    else:
        if match:
            print match, line,


It is not working well, as I'm getting:

Traceback (most recent call last):
  File "./test.py", line 14, in ?
    if fields[1] not in s:
IndexError: list index out of range
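The IndexError happens when a line starts with "copy" but contains no "_": split("_") then returns a one-element list, so fields[1] does not exist. The loop also reads match before assigning it if the first line does not start with "copy", and it only ever prints lines that do not start with "copy". Below is a minimal Python 3 sketch of what the script seems to aim for, assuming (as the original comment does) that the ZNI value is the second "_"-separated field, and assuming both files are ASCII/UTF-8 (adjust encoding= if they are not). Lines without that field are kept unchanged; the function names and the script name in the usage line are hypothetical.

```python
#!/usr/bin/env python3
import sys


def load_ids(path):
    """Read file1 into a set, ignoring blank lines and trailing whitespace/CR."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def keep_line(line, ids):
    """Return True if this file2 line should survive the filter."""
    fields = line.split("_")
    if len(fields) < 2:
        # No "_" means there is no ZNI field to look up;
        # the original script raised IndexError here.
        return True
    return fields[1] not in ids


def filter_lines(lines, ids):
    return [line for line in lines if keep_line(line, ids)]


def main(file1, file2, out):
    ids = load_ids(file1)
    # newline="" leaves the original line endings (e.g. CRLF) untouched.
    with open(file2, encoding="utf-8", newline="") as src, \
         open(out, "w", encoding="utf-8", newline="") as dst:
        dst.writelines(filter_lines(src, ids))


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(*sys.argv[1:])
    else:
        print("usage: filter_ids.py FILE1 FILE2 OUTPUT", file=sys.stderr)
```

Usage would be something like python3 filter_ids.py file1.txt file2.txt file3.txt. Because the IDs are held in a set, each lookup is a constant-time membership test on average, so file2 is processed in a single pass.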

How about:

grep -F -v -f file1 file2 > file3
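Here -F treats every line of file1 as a fixed string rather than a regular expression, -f reads those strings from file1, and -v prints only the lines of file2 that contain none of them. A rough Python analogue is sketched below for illustration only. It deliberately skips blank pattern lines and strips a trailing carriage return from each pattern: grep treats an empty pattern as matching every line (so a stray blank line in file1 makes grep -v print nothing), and a pattern ending in \r from a Windows-edited file would never match. Whether either applies to the files in the question is an assumption, not something the question confirms.

```python
def grep_fvf(patterns, lines):
    """Keep lines that contain none of the fixed-string patterns (like grep -F -v -f)."""
    # Normalise patterns: drop trailing CR/LF, then skip entries that end up empty.
    fixed = [p.rstrip("\r\n") for p in patterns]
    fixed = [p for p in fixed if p]
    return [line for line in lines if not any(p in line for p in fixed)]


patterns = ["ZNI008\r\n", "\n", "ZNI010\n"]
lines = ["x_ZNI008_y\n", "x_ZNI009_y\n", "x_ZNI010_y\n"]
print(grep_fvf(patterns, lines))  # ['x_ZNI009_y\n']
```

This checks every pattern against every line, which is fine for a sketch but slow for 19463 patterns against 51360 lines; grep -F is built to search for many fixed strings at once.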

I prefer byrondrossos's grep solution, but here is another option:

sed $(awk '{printf("-e /%s/d ", $1)}' file1) file2 > file3
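The awk command turns each line of file1 into a pair of sed arguments such as -e /ZNI008/d, so the command substitution expands to a single sed invocation with one delete expression per ID. The Python sketch below builds the same argument list to make that expansion visible; the function name is hypothetical, and it assumes the IDs contain no "/" or regex metacharacters, which holds for values like ZNI008.

```python
def sed_delete_command(ids, input_path):
    """Build the argv that sed $(awk '{printf("-e /%s/d ", $1)}' file1) file2 expands to."""
    argv = ["sed"]
    for value in ids:
        argv += ["-e", f"/{value}/d"]
    argv.append(input_path)
    return argv


print(sed_delete_command(["ZNI008", "ZNI009"], "file2"))
# ['sed', '-e', '/ZNI008/d', '-e', '/ZNI009/d', 'file2']
```

With 19463 IDs the list has nearly 39,000 arguments, which is worth keeping in mind because the operating system limits the total size of a command line.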

This uses Bash, and GNU sed because of the -i switch:

cp file2 file3
while read -r; do
    sed -i "/$REPLY/d" file3
done < file1

There is surely a better way, but here's a hack around -i :D

cp file2 file3
while read -r; do
    (rm file3; sed "/$REPLY/d" > file3) < file3
done < file1

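This exploits the shell's evaluation order: the < file3 redirection is opened before rm runs, so sed still reads the old contents while > file3 creates a new file.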
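A minimal Python sketch of the same unlink-while-open idea, assuming a POSIX system (on Windows an open file normally cannot be deleted): the file is opened for reading first, its name is removed, and a new file is written under the same name while the old contents are still read through the open handle. It uses a plain substring test where sed uses a regular expression, and the function name is hypothetical.

```python
import os


def delete_matching_lines(path, needle):
    """Rewrite `path` without the lines containing `needle`, reading via an unlinked handle."""
    with open(path, encoding="utf-8") as src:  # like `< file3`: opened first
        os.remove(path)  # like `rm file3`: the name goes away, the data stays readable
        with open(path, "w", encoding="utf-8") as dst:  # like `> file3`: a new file
            for line in src:
                if needle not in line:
                    dst.write(line)
```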


Alright, I guess the correct way to implement this idea is with ed. This should be POSIX too:

cp file2 file3
while read -r line; do
    ed file3 <<EOF
/$line/d
wq
EOF
done < file1

In any case, grep seems to be the right tool for the job.
@byrondrossos's answer should work well for you ;)

This is admittedly ugly, but it does work. However, the path must be the same for all of the lines (except, of course, the ZNI### portion). Everything in the path except the ZNI### is removed, so that grep -vf can run correctly on the sorted files.

First, convert "testfile2" to "testfileconverted" so that it shows only the ZNI###:

cat /testfile2 | sed 's:^.*_ZNI:ZNI:g' | sed 's:_.*::g' > /testfileconverted
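The two sed substitutions first replace everything up to the last _ZNI with ZNI, then cut from the next _ to the end of the line, leaving only the ID. For checking that logic, here is a Python sketch that applies the same two regular expressions with re.sub; the function name and sample line are illustrative.

```python
import re


def extract_id(line):
    """Apply sed 's:^.*_ZNI:ZNI:g' followed by sed 's:_.*::g' to one line."""
    line = line.rstrip("\n")
    line = re.sub(r"^.*_ZNI", "ZNI", line)
    return re.sub(r"_.*", "", line)


sample = r"copy /Y \\server\foldername\version\20050001_ZNI008_162635.xml \\server\foldername\version\folder"
print(extract_id(sample))  # ZNI008
```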

Second, run an inverted grep of the converted file against "testfile1", and write the reformatted output to "testfile3":

bash -c 'grep -vf <(sort /testfileconverted) <(sort /testfile1)' | sed "s:^:\copy /Y \\\|server\\\foldername\\\version\\\20050001_:g" | sed "s:$:_162635\.xml \\\|server\\\foldername\\\version\\\folder\\\:g" | sed "s:|:\\\:g" > /testfile3
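As written, grep -vf <(sort /testfileconverted) <(sort /testfile1) takes its patterns from the converted file and prints the lines of testfile1 that match none of them, i.e. the IDs listed in file1 that are absent from file2. If the goal is the one in the question (the file2 lines whose ID is not in file1), swapping the two process substitutions gives that direction. The Python sketch below follows the question's direction and then rebuilds each copy command from a fixed prefix and suffix. Like the answer, it assumes every line shares the same path apart from the ZNI### part; the prefix and suffix are copied from the sample lines in the question, and the function names are hypothetical.

```python
PREFIX = r"copy /Y \\server\foldername\version\20050001_"
# A raw string cannot end in a single backslash, so the trailing one is appended separately.
SUFFIX = r"_162635.xml \\server\foldername\version\folder" + "\\"


def ids_to_keep(file2_ids, file1_ids):
    """IDs from file2 (in sorted order) that are not listed in file1."""
    excluded = set(file1_ids)
    return [i for i in sorted(file2_ids) if i not in excluded]


def rebuild_copy_line(zni_id):
    """Reassemble one copy command around a ZNI### value."""
    return PREFIX + zni_id + SUFFIX


for zni_id in ids_to_keep(["ZNI010", "ZNI008", "ZNI009"], ["ZNI008", "ZNI010"]):
    print(rebuild_copy_line(zni_id))
```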
