Grep approach to remove all lines in file that match any line in other file?
I have a file of camera information where each line has a unique ID, in the format:
{"_id":{"$oid":"5b0cfa5845bb0c0004277e13"},"geometry":{"coordinates":[139.751,35.685]},"addEditBy":["dd53cbd9c5306b1baa103335c4b3e91d8b73386ba29124ea2b1d47a619c8c066877843cd8a7745ce31021a8d1548cf2a"],"legacy_cameraID":1,"type":"ip","source":"google","country":"JP","city":"Tokyo","is_active_image":false,"is_active_video":false,"utc_offset":32400,"timezone_id":"Japan Standard Time","timezone_name":"Japan Standard Time","reference_url":"101.110.193.152/","retrieval":{"ip":"101.110.193.152","port":"80","video_path":"/"},"__v":0}
I also have a list of camera IDs that I want to remove from the original file, in the format:
5b182800751c3b00044514a9
5b1976b473569e00045dba59
5b197b1273569e00045ddf0f
5b1970cc73569e00045d94fc
How can I use grep or some other command line utility to remove all lines in the input file that have an ID listed in the second file?
Let's say that you have a file called ids.txt that has all of the camera IDs that need to be excluded from your data file, which we'll call data.json. We can use the -f option of grep (read patterns from a file) and the -v option (only output non-matching lines) as follows:
grep -f ids.txt -v data.json
grep will only output the lines of data.json that do not match any line in ids.txt.
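As a small self-contained sketch of the command above, the following creates two throwaway sample files and filters them. The file contents and the scratch directory are made up for illustration. It also adds -F, which makes grep treat each ID as a fixed string rather than a regular expression; that flag is an addition here, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory, for illustration only.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

cat > data.json <<'EOF'
{"_id":{"$oid":"5b0cfa5845bb0c0004277e13"},"city":"Tokyo"}
{"_id":{"$oid":"5b182800751c3b00044514a9"},"city":"Osaka"}
{"_id":{"$oid":"5b1976b473569e00045dba59"},"city":"Kyoto"}
EOF

cat > ids.txt <<'EOF'
5b182800751c3b00044514a9
5b1976b473569e00045dba59
EOF

# -F: fixed strings, -f: read patterns from a file, -v: invert the match
kept=$(grep -F -v -f ids.txt data.json)
printf '%s\n' "$kept"
```

One thing to watch for: an empty line in ids.txt is a pattern that matches every line, so with -v nothing would be printed at all. It may be worth stripping blank lines from the ID list first.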
You should use a JSON-aware tool. Here is a GNU awk script that uses the json extension:
gawk '                                      # GNU awk
@load "json"                                # load extension
NR==FNR {                                   # read oids to a hash
    oid[$0]
    next
}
{                                           # process json
    lines=lines $0                          # support multiline json form
    if(json_fromJSON(lines,data)!=0) {      # once json is complete
        if(!(data["_id"]["$oid"] in oid))   # test if oid in exclude list
            print                           # output if not
        lines=""                            # rinse for repeat
    }
}' oids json
A simple thing you can do is get the ids from the camera info and check whether they are listed in the second file.
For example:
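#!/bin/bash
exec 3<info.txt
while IFS= read -r line <&3; do
    # -r makes jq print the raw string, without the surrounding quotes
    id="$(printf '%s' "${line}" | jq -r '._id."$oid"')"
    # -q: quiet, -F: treat the id as a fixed string rather than a regex
    if ! grep -qF -e "${id}" list.txt; then
        printf '%s\n' "${line}"
    fi
done >clean.txt
exec 3>&-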
Where:
info.txt is the file with camera information
list.txt is the list of ids you do not want
Note that this is not the only way you can achieve it; I used a simple loop just as a proof of concept.
You can also achieve it by running jq directly on the whole file, for example:
#!/bin/bash
for id in $(jq -r '._id."$oid"' info.txt); do
    if ! grep -qF -e "${id}" list.txt; then
        grep -F -e "${id}" info.txt
    fi
done >clean.txt
Note that in this second example the second grep is needed because you never read the whole line of the info.txt file, only the id.
Also, be aware that if you have an alias like alias grep='grep --color=always', it could break your output.
Assuming your JSON file always has the same regular layout:
awk -F'"' 'NR==FNR{ids[$1]; next} !($6 in ids)' ids json
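To see why $6 is the ID in the one-liner above, here is a small sketch that prints the first six fields awk produces when the field separator is a double quote. The sample line is a shortened, made-up record with the same leading layout as the data in the question; the one-liner relies on "_id" being the first key on every line.

```shell
#!/bin/sh
# Shortened, hypothetical record with the same leading layout as the data file.
line='{"_id":{"$oid":"5b0cfa5845bb0c0004277e13"},"type":"ip"}'

# Split on double quotes and show the first six fields, one per line.
printf '%s\n' "$line" | awk -F'"' '{ for (i = 1; i <= 6; i++) printf "$%d = %s\n", i, $i }'

# The sixth field is the ID that the one-liner looks up in the ids array.
id=$(printf '%s\n' "$line" | awk -F'"' '{ print $6 }')
printf 'id = %s\n' "$id"
```

If the key order ever changes, the field number changes with it, which is why a JSON-aware tool is the safer choice when the layout is not guaranteed.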