Remove reverse matches in bash

I have a file, list.txt, that contains:

234
243
324
342
423
432

How can I find out whether reverse patterns exist (i.e., 432 is the reverse of 234) and remove them? I have attempted:

while IFS= read -r line; do
  reverse=$(echo $line|rev)
  if grep -q $reverse list.txt; then
    sed -i "s/$reverse//g" list.txt
  else :
  fi
done < list.txt

but this removes every line from list.txt. My expected output would be:

234
243
324

Is what I want to accomplish possible? My MWE is a short list, but the real list can (and will) grow considerably. Thanks in advance.

Original Question: Removing All Items With Reverses In The Input

Removing every string whose reverse also appears in the file would look like:

grep -Fvf <(rev list.txt) <list.txt >list.txt.new && mv list.txt.new list.txt

Let's break that down:

  • grep -F matches only fixed strings.
  • grep -v inverts the match, emitting lines that don't match.
  • grep -f filename reads the list of patterns to look for from filename.
  • <(rev list.txt) is a process substitution that expands to a filename from which the output of rev list.txt can be read.
  • <list.txt connects list.txt to the stdin of your grep.
  • >list.txt.new connects the stdout of grep to a new file; this is important, since >list.txt would truncate your file before its original contents could be read.

However, with your sample input, this results in completely empty output, because every line in that sample input file has a reversed version elsewhere in the file.
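
As a rough illustration of the command above, the following sketch wraps it in a function and runs it on throwaway copies of the data. The function name drop_all_with_reverses and the file names are made up for this example. It adds -x, which is not part of the original command, so that only whole-line matches count (plain -F would also match a pattern found in the middle of a longer line). It also records grep's exit status: grep exits with status 1 when it selects no lines, so an "&& mv" step after it would be skipped for the sample input.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the lines of file "$1" whose reverse does not
# appear as a whole line anywhere in the same file.
drop_all_with_reverses() {
  # -x is an addition for this sketch: match whole lines only.
  grep -Fxvf <(rev "$1") <"$1"
}

workdir=$(mktemp -d)
printf '%s\n' 234 243 324 342 423 432 >"$workdir/sample.txt"
printf '%s\n' 123 456 321 >"$workdir/partial.txt"

# Every line of sample.txt has its reverse in the file, so nothing survives.
all_out=$(drop_all_with_reverses "$workdir/sample.txt"); all_status=$?
# In partial.txt only 456 has no reversed counterpart.
partial_out=$(drop_all_with_reverses "$workdir/partial.txt"); partial_status=$?

printf 'sample:  [%s] (grep exit status %s)\n' "$all_out" "$all_status"
printf 'partial: [%s] (grep exit status %s)\n' "$partial_out" "$partial_status"
rm -rf "$workdir"
```

If the replacement file should be installed even when it is empty, one option is to test the exit status explicitly (0 or 1 means success, 2 or higher means an error) instead of chaining with &&.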


Refined Question: Removing Only Items Whose Inverses Were Previously Seen

Given your sample data, you don't really want to remove all data that has an inverse version somewhere else in the input file. Instead, you want to read top-to-bottom, and print only things whose inverses weren't already seen.

One way to do that would be the following:

#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4.0+ needed" >&2; exit 1;; esac

declare -A blacklisted=( )
while IFS= read -r orig <&3 && IFS= read -r rev <&4; do
  [[ ${blacklisted[$orig]} ]] && continue
  blacklisted[$rev]=1
  printf '%s\n' "$orig"
done 3< list.txt 4< <(rev list.txt) >list.txt.new && mv list.txt.new list.txt

BTW, note that in the real world, instead of hardcoding something like list.txt.new, you should use mktemp to create a guaranteed-unique/random name for your temporary files. This doesn't just fix concurrency issues; it also fixes security bugs.
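
A minimal sketch of what the mktemp advice could look like when applied to the loop above. The function name dedupe_reverses_in_place, the variable names, and the "k" key prefix are assumptions made for this example, not part of the original answer. The temporary file is created next to the target so that the final mv stays on one filesystem.

```bash
#!/usr/bin/env bash
# Hypothetical helper; needs bash 4.0+ for associative arrays.
dedupe_reverses_in_place() {
  local file=$1 tmp orig reversed
  local -A reverse_seen=()

  # mktemp picks an unused name based on the template and creates the file.
  tmp=$(mktemp "${file}.XXXXXX") || return 1

  # fd 3 supplies the original lines, fd 4 the same lines reversed.
  while IFS= read -r orig <&3 && IFS= read -r reversed <&4; do
    # The "k" prefix keeps the subscript non-empty even for empty lines.
    [[ ${reverse_seen["k$orig"]} ]] && continue
    reverse_seen["k$reversed"]=1
    printf '%s\n' "$orig"
  done 3<"$file" 4< <(rev "$file") >"$tmp" || { rm -f -- "$tmp"; return 1; }

  mv -- "$tmp" "$file"
}

# Usage on a throwaway copy of the sample data.
workdir=$(mktemp -d)
printf '%s\n' 234 243 324 342 423 432 >"$workdir/list.txt"
dedupe_reverses_in_place "$workdir/list.txt"
result=$(cat "$workdir/list.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

One thing to keep in mind with this approach: mktemp creates the file with restrictive permissions, so after the mv the file's mode may differ from what it was before.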

Here is an awk solution:

awk 'BEGIN{FS=""} !seen[$0]{s=""; for (i=NF; i>0; i--) s=s $i; seen[s]++; print}' file

234
243
324

Explanation:

  • BEGIN{FS=""} : Set the input field separator to the empty string so that every character in the input becomes a field in awk.
  • !seen[$0] { : If the current row is not found in the seen array:
    • s=""; : Initialize s to the empty string.
    • for (i=NF; i>0; i--) s=s $i : Loop over the fields in reverse order and store the reversed string in s.
    • seen[s]++; : Store s in the seen array.
    • print : Print the current row.
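
POSIX leaves the behavior of an empty FS unspecified, so splitting a line into one field per character depends on the awk implementation (gawk does it). As a hedged alternative, the sketch below keeps the same logic but builds the reversed string with substr(), which does not rely on that behavior. The function name dedupe_reverses_awk is made up for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: same idea as the one-liner above, but each line is
# reversed character by character with substr() instead of an empty FS.
dedupe_reverses_awk() {
  awk '
    !seen[$0] {
      s = ""
      # Walk the line from its last character to its first.
      for (i = length($0); i > 0; i--) s = s substr($0, i, 1)
      seen[s]++   # remember the reverse so that it is skipped later
      print
    }
  ' "$@"
}

awk_out=$(printf '%s\n' 234 243 324 342 423 432 | dedupe_reverses_awk)
printf '%s\n' "$awk_out"
# prints 234, 243 and 324 on separate lines
```

Like the original one-liner, this reads files given as arguments or standard input, and it writes to standard output rather than editing a file in place.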
