Remove reverse matches in bash
I have a file, list.txt, that contains:
234
243
324
342
423
432
How can I find whether reverse patterns exist (i.e., 432 is the reverse pattern of 234) and remove the reverse pattern? I have attempted
while IFS= read -r line; do
reverse=$(echo $line|rev)
if grep -q $reverse list.txt; then
sed -i "s/$reverse//g" list.txt
else :
fi
done < list.txt
but this removes every line from list.txt. My expected output would be
234
243
324
Is what I want to accomplish possible? My MWE is a short list, but this list can (and will) grow considerably. Thanks in advance.
Removing all strings that are inverses of any other string in the file would look like:
grep -Fvf <(rev list.txt) <list.txt >list.txt.new && mv list.txt.new list.txt
Let's break that down:
grep -F matches only fixed strings.
grep -v inverts the match, emitting things that don't match.
grep -f filename reads the list of patterns to look for from filename.
<(rev list.txt) is a process substitution that expands to a filename from which the output of rev list.txt can be read.
<list.txt connects list.txt to the stdin of your grep.
>list.txt.new connects stdout of grep to a new file; this is important, since >list.txt would overwrite your output file before its original contents could be read.
However, with your sample input, this results in completely empty output, because every line in that sample input file has a reverse version elsewhere in that file.
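To see that behaviour concretely, here is a minimal sketch that recreates the sample list.txt in a scratch directory and runs the same grep filter, without the final mv. The scratch directory and the variable name result are only for illustration.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so no real list.txt is touched.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Recreate the sample input from the question.
printf '%s\n' 234 243 324 342 423 432 > list.txt

# Same filter as above, captured in a variable (illustrative name) instead of a file.
# grep exits non-zero when it selects no lines, hence the "|| true".
result=$(grep -Fvf <(rev list.txt) <list.txt || true)

if [[ -z $result ]]; then
  echo "nothing survived the filter"
else
  printf '%s\n' "$result"
fi
```

Because grep returns exit status 1 when it selects no lines, the `&& mv` in the one-liner is skipped for this sample, so list.txt itself is left as it was and only list.txt.new ends up empty. Note also that -F on its own matches substrings; if your real data has lines of different lengths, adding -x to require whole-line matches is probably what you want.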
Given your sample data, you don't really want to remove all data that has an inverse version somewhere else in the input file. Instead, you want to read top-to-bottom, and print only things whose inverses weren't already seen.
One way to do that would be the following:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4.0+ needed" >&2; exit 1;; esac
declare -A blacklisted=( )
while IFS= read -r orig <&3 && IFS= read -r rev <&4; do
  [[ ${blacklisted[$orig]} ]] && continue
  blacklisted[$rev]=1
  printf '%s\n' "$orig"
done 3< list.txt 4< <(rev list.txt) >list.txt.new && mv list.txt.new list.txt
BTW, note that in the real world, instead of hardcoding something like list.txt.new, you should use mktemp to create a guaranteed-unique/random name for your temporary files. This doesn't just fix concurrency issues; it also fixes security bugs.
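As a rough sketch of what that could look like, the loop above can be wrapped in a function that writes to a mktemp-generated file and only then replaces the original. The function name dedupe_reversed and the scratch directory are hypothetical; the loop body is the one shown earlier, and Bash 4.0+ is still assumed.

```shell
#!/usr/bin/env bash
# dedupe_reversed is a hypothetical helper name, not part of any tool.
# Requires Bash 4.0+ for associative arrays.
dedupe_reversed() {
  local file=$1 tmp orig rev
  local -A blacklisted=()
  # Create the temporary file next to the target so that mv is a same-filesystem rename.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  while IFS= read -r orig <&3 && IFS= read -r rev <&4; do
    [[ ${blacklisted[$orig]} ]] && continue   # reverse of an earlier line: skip it
    blacklisted[$rev]=1                       # remember this line's reverse
    printf '%s\n' "$orig"
  done 3< "$file" 4< <(rev "$file") > "$tmp" && mv -- "$tmp" "$file" || {
    rm -f -- "$tmp"                           # clean up if anything failed
    return 1
  }
}

# Demo on the sample data, in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 234 243 324 342 423 432 > list.txt
dedupe_reversed list.txt
cat list.txt
```

On the sample input this leaves 234, 243 and 324 in list.txt. One thing to be aware of: mktemp creates its file readable only by the owner, so after the mv the replaced list.txt carries those permissions unless you reset them.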
Here is an awk solution:
awk 'BEGIN{FS=""} !seen[$0]{s=""; for (i=NF; i>0; i--) s=s $i; seen[s]++; print}' file
234
243
324
Explanation:
BEGIN{FS=""}: Set the input field separator to the empty string so that every character in the input becomes a field in awk.
!seen[$0] {: If the current row is not found in the seen array
s="";: Initialize s to the empty string
for (i=NF; i>0; i--) s=s $i: Run a reverse loop and store the reversed string in s
seen[s]++;: Store s in the array seen
print: Print the current row
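Splitting a record into single characters with an empty FS works in gawk and several other awk implementations, but as far as I know POSIX leaves that behaviour unspecified. If that matters for you, a variant of the same idea can build the reversed string with substr() instead. This is only a sketch; the file names file and file.new and the scratch directory are placeholders.

```shell
#!/bin/sh
# Demo in a throwaway directory with the sample data from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 234 243 324 342 423 432 > file

# Same logic as the one-liner above, but without relying on FS="".
awk '
!seen[$0] {
    s = ""
    # Walk the line from its last character to its first.
    for (i = length($0); i > 0; i--) s = s substr($0, i, 1)
    seen[s]++      # remember the reversed form so it is skipped later
    print          # emit the current line
}' file > file.new

cat file.new
```

On the sample input this prints 234, 243 and 324, the same result as the FS="" version.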