
Grep approach to remove all lines in file that match any line in other file?

I have a file of camera information where each line has a unique ID, in the format:

{"_id":{"$oid":"5b0cfa5845bb0c0004277e13"},"geometry":{"coordinates":[139.751,35.685]},"addEditBy":["dd53cbd9c5306b1baa103335c4b3e91d8b73386ba29124ea2b1d47a619c8c066877843cd8a7745ce31021a8d1548cf2a"],"legacy_cameraID":1,"type":"ip","source":"google","country":"JP","city":"Tokyo","is_active_image":false,"is_active_video":false,"utc_offset":32400,"timezone_id":"Japan Standard Time","timezone_name":"Japan Standard Time","reference_url":"101.110.193.152/","retrieval":{"ip":"101.110.193.152","port":"80","video_path":"/"},"__v":0}

I also have a list of camera IDs that I want to remove from the original file, in the format:

5b182800751c3b00044514a9
5b1976b473569e00045dba59
5b197b1273569e00045ddf0f
5b1970cc73569e00045d94fc

How can I use grep or some other command line utility to remove all lines in the input file that have an ID listed in the second file?

Let's say that you have a file called ids.txt that has all of the camera IDs that need to be excluded from your data file, which we'll call data.json. We can use the -f option of grep (read patterns from a file) and the -v option (only output non-matching lines) as follows:

grep -f ids.txt -v data.json 

grep will only output lines of data.json that do not match any line in ids.txt.
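By default, -f treats each line of ids.txt as a regular expression. Since the IDs here are plain hex strings, adding -F (fixed strings) is a reasonable variant. The following is only a minimal sketch: the sample records are shortened, made-up versions of the real format, and the temporary directory exists purely for illustration.

```shell
#!/bin/sh
# Sketch: drop every line of data.json whose ID is listed in ids.txt.
# The sample data below is hypothetical and trimmed for readability.
tmp=$(mktemp -d)

cat > "$tmp/data.json" <<'EOF'
{"_id":{"$oid":"5b0cfa5845bb0c0004277e13"},"city":"Tokyo"}
{"_id":{"$oid":"5b182800751c3b00044514a9"},"city":"Osaka"}
{"_id":{"$oid":"5b1976b473569e00045dba59"},"city":"Kyoto"}
EOF

cat > "$tmp/ids.txt" <<'EOF'
5b182800751c3b00044514a9
5b1976b473569e00045dba59
EOF

# -F: treat patterns as fixed strings
# -v: keep only lines that match none of the patterns
# -f: read the patterns from a file, one per line
filtered=$(grep -F -v -f "$tmp/ids.txt" "$tmp/data.json")
printf '%s\n' "$filtered"

rm -rf "$tmp"
```

One thing to watch for: a blank line in ids.txt acts as an empty pattern that matches every line, so with -v nothing would be printed.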

You should use a JSON-aware tool. Here is a GNU awk script that uses the json extension:

$ gawk '                                     # GNU awk
@load "json"                                 # load extension
NR==FNR {                                    # read oids to a hash
    oid[$0]
    next
}
{                                            # process json
    lines=lines $0                           # support multiline json form
    if(json_fromJSON(lines,data)!=0) {       # once json is complete
        if(!(data["_id"]["$oid"] in oid))    # test if oid in exclude list
            print lines                      # output the whole record if not
        lines=""                             # rinse for repeat
    }
}' oids json

A simple thing you can do is extract the IDs from the camera info and check whether they are listed in the second file.

For example:

#!/bin/bash
exec 3<info.txt
while IFS= read -r line <&3; do
  id="$(printf '%s' "${line}" | jq -r '._id."$oid"')"   # -r prints the raw string, no quotes
  if ! grep -qF -e "${id}" list.txt; then               # -q: quiet, -F: fixed string
    printf '%s\n' "${line}"
  fi
done >clean.txt
exec 3>&-

Where:

  1. info.txt is the file with camera information
  2. list.txt is the list of IDs you do not want

Note that this is not the only way to achieve it; I used a simple loop just as a proof of concept.

You can also achieve it using jq directly, for example:

#!/bin/bash
for id in $(jq -r '._id."$oid"' info.txt); do   # -r prints raw IDs, no quotes
  if ! grep -qF -e "${id}" list.txt; then       # ID is not in the exclude list
    grep -F -e "${id}" info.txt                 # print the full line for that ID
  fi
done >clean.txt

Note that in this second example the second grep is needed because you never read the whole line of the info.txt file, only the ID.

Also, be aware that if you have an alias like alias grep='grep --color=always' it could break your output.

Assuming your JSON file is always regular:

awk -F'"' 'NR==FNR{ids[$1]; next} !($6 in ids)' ids json
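To see why $6 is the ID: with " as the field separator, a line beginning {"_id":{"$oid":"..."} splits into {, _id, :{, $oid, :, and then the ID itself as the sixth field. A small sketch that checks this on made-up, shortened sample files (the file contents and temporary directory are assumptions for illustration):

```shell
#!/bin/sh
# Sketch: confirm that field 6 is the ID, then run the one-liner.
# Sample data is hypothetical and trimmed for readability.
tmp=$(mktemp -d)

cat > "$tmp/json" <<'EOF'
{"_id":{"$oid":"5b0cfa5845bb0c0004277e13"},"city":"Tokyo"}
{"_id":{"$oid":"5b182800751c3b00044514a9"},"city":"Osaka"}
EOF

cat > "$tmp/ids" <<'EOF'
5b182800751c3b00044514a9
EOF

# Print the sixth "-separated field of every record: this should be the ID.
sixth=$(awk -F'"' '{print $6}' "$tmp/json")
printf '%s\n' "$sixth"

# First file (NR==FNR): remember each ID. Second file: print lines whose
# sixth field is not among the remembered IDs.
kept=$(awk -F'"' 'NR==FNR{ids[$1]; next} !($6 in ids)' "$tmp/ids" "$tmp/json")
printf '%s\n' "$kept"

rm -rf "$tmp"
```

This relies on the ID always being the first quoted value on the line. If the key order ever changes, $6 will point at something else, which is why the JSON-aware approaches above are more robust.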
