Find and Replace Incrementally Across Multiple Files - Bash

I apologize in advance if this belongs on Super User; I always have a hard time telling whether bash scripting questions belong here or there. I already know how to find and replace strings in multiple files, and (from searching for a solution to this issue) how to find and replace strings incrementally within a single file, but I can't work out how to combine the two.

Here's the explanation:

  • I have a few hundred files, each in sets of two: a data file (.data) and a message file (.data.ms).
  • The two files in each set are linked by a key value unique to that set, which looks like: ab.cdefghi

Here's what I want to do:

  • Step through each .data file and do the following:
  • Find:

     MessageKey ab.cdefghi
  • Replace with (one new key per file, in order):

     MessageKey xx.aaa0001
     MessageKey xx.aaa0002
     ...
     MessageKey xx.aaa0010
     etc.

    The number increments by 1 every time I get to a new file.
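A minimal sketch of the increment step, assuming bash: `make_key` is a hypothetical helper name, not something from the question. `printf`'s `%04d` pads the counter to four digits, so every key has the same length up to xx.aaa9999 (a 10,000th file would get five digits).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the key for the Nth file as "xx.aaa" plus a
# four-digit, zero-padded counter, so every key has the same length.
make_key() {
    printf 'xx.aaa%04d\n' "$1"
}

# One key per file, incrementing by 1: xx.aaa0001, xx.aaa0002, ..., xx.aaa0010
for ((n = 1; n <= 10; n++)); do
    make_key "$n"
done
```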

Clarifications:

  • For reference, there is only one instance of "MessageKey" in every file.
  • The paired files have the same name and differ only in their extensions, so I can simply step through all the .data files and then all the .data.ms files, applying the same incremental scheme to both, and the keys will still match. I don't need anything fancy to edit the two files in tandem.
  • For all intents and purposes, whatever currently appears on the line after each MessageKey is garbage; I am throwing it out completely and replacing it with xx.aaa####.
  • String length matters, so I need xx.aaa0009, xx.aaa0010, not xx.aaa0009, xx.aaa00010 (the counter must always be four zero-padded digits).
  • I'm using Cygwin.
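Because each pair shares a base name, one possible sketch skips any mapping file and numbers the pairs directly, in the order bash expands `*.data`. It assumes GNU sed's `-i` (available under Cygwin), that each message file is named `foo.data.ms` for a data file `foo.data`, and that everything after `MessageKey ` on that line can be discarded; `renumber_pairs` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (renumber_pairs is not from the question or answers):
# give the Nth foo.data file, and its partner foo.data.ms, the key xx.aaaNNNN.
# Assumes GNU sed (as on Cygwin) for in-place editing with -i.
renumber_pairs() {
    local n=0 f target key
    for f in *.data; do
        [ -e "$f" ] || continue              # the glob matched nothing
        n=$((n + 1))
        key=$(printf 'xx.aaa%04d' "$n")      # always 4 digits: 0001 ... 9999
        for target in "$f" "$f.ms"; do
            # Discard everything after "MessageKey " and write the new key.
            if [ -e "$target" ]; then
                sed -i -e "s/MessageKey .*/MessageKey $key/" "$target"
            fi
        done
    done
}
```

Run `renumber_pairs` from the directory that holds the files. Because it edits in place, running it on a copy of the directory first is the safer way to try it.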

I would approach this by creating a mapping from old keys to new keys and dumping it into a temp file.

# -h drops the "file.data:" prefix grep adds when searching several files
grep -h MessageKey *.data \
  | sort -u \
  | awk '{ printf("%s:xx.aaa%04d\n", $2, ++i); }' \
  > /tmp/key_mapping

From there, I would check that the mapping file looks right before applying it to the files with sed.

# IFS=: splits each "old:new" line at the colon
while IFS=: read -r old new; do
    sed -i -e "s:MessageKey $old:MessageKey $new:" *.data *.data.ms
done < /tmp/key_mapping
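For the "confirm that the file looks right" step, a check along these lines could catch malformed lines or an old key that appears twice before any file is touched. `check_mapping` is a hypothetical name, and the pattern assumes the `xx.aaa####` format produced by the awk command in this answer.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for a mapping file of "old:new" lines, run before
# any sed pass. Returns non-zero if a line is malformed or an old key repeats.
check_mapping() {
    awk -F: '
        NF != 2 || $1 == "" || $2 !~ /^xx\.aaa[0-9][0-9][0-9][0-9]$/ {
            print "bad line " NR ": " $0 > "/dev/stderr"; bad = 1
        }
        seen[$1]++ == 1 {    # report each duplicated old key once
            print "duplicate old key: " $1 > "/dev/stderr"; bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Usage: `check_mapping /tmp/key_mapping && echo "mapping looks OK"`.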

This will probably work for you, but it's neither elegant nor efficient: sed rewrites every file once per key. This is how I would do it if I were only going to run it once. If I were going to run it regularly and efficiency mattered, I would probably write a quick Python script.
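Staying in the shell rather than Python, one way to avoid rewriting every file once per key is to load the whole mapping once per file with awk. This is a sketch under assumptions: `apply_mapping` is a hypothetical name, the map uses the `old:new` format above, and the old key is the first whitespace-delimited word after `MessageKey `. Keys are compared as plain strings, so the dots in keys such as ab.cdefghi are not treated as regex wildcards the way they are in the sed version.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: apply every old:new pair from a mapping file in one
# awk pass per file, instead of one sed pass over all files per key.
# Usage: apply_mapping MAPPING_FILE FILE...
apply_mapping() {
    local map=$1 f tmp
    shift
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if awk '
            FILENAME == ARGV[1] {                     # first file: old:new pairs
                split($0, kv, ":"); new[kv[1]] = kv[2]; next
            }
            {
                i = index($0, "MessageKey ")
                if (i > 0) {
                    rest = substr($0, i + 11)         # text after "MessageKey "
                    if (match(rest, /^[^ \t]+/)) {
                        old = substr(rest, 1, RLENGTH)    # plain string, no regex
                        if (old in new)
                            $0 = substr($0, 1, i + 10) new[old] substr(rest, RLENGTH + 1)
                    }
                }
                print
            }
        ' "$map" "$f" > "$tmp"; then
            cat "$tmp" > "$f"      # overwrite in place, keeping the original permissions
        fi
        rm -f "$tmp"
    done
}
```

Usage: `apply_mapping /tmp/key_mapping *.data *.data.ms`. Each file is read and written exactly once, however many keys the mapping holds.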

@Carl.Anderson got me started on the right track, and I ended up implementing his solution with a few syntax tweaks.

First of all, this solution only works if all of your files are located in the same directory. I'm sure anyone with even slightly more UNIX experience than me could modify it to work recursively, but here goes:
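For the recursive case, one possible sketch assumes GNU grep (for `--include`) and find's `-exec ... {} +`, both available under Cygwin. `renumber_tree` is a hypothetical name; it still runs one sed pass per key, just over every matching file in the tree.

```shell
#!/usr/bin/env bash
# Hypothetical recursive variant: collect the keys from every *.data file
# under a directory tree, then rewrite the *.data and *.data.ms files found
# anywhere below it. Assumes GNU grep, sed and find.
renumber_tree() {
    local root=${1:-.} map old new
    map=$(mktemp) || return 1
    # Step 1: old:new map; -h drops filenames, --include limits the search.
    grep -rh --include='*.data' 'MessageKey' "$root" \
        | sort -u \
        | awk '{ printf("%s:xx.aaa%04d\n", $2, ++i); }' > "$map"
    # Step 2: one sed pass over the whole tree per key.
    while IFS=: read -r old new; do
        find "$root" -type f \( -name '*.data' -o -name '*.data.ms' \) \
            -exec sed -i -e "s:MessageKey $old:MessageKey $new:" {} +
    done < "$map"
    rm -f "$map"
}
```

Usage: `renumber_tree /path/to/top/directory`.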

First I ran:

grep -hr "MessageKey" . | sort -u | awk '{ printf("%s:xx.aaa%04d\n", $2, ++i); }' > MessageKey

This command creates a find-and-replace map file called "MessageKey".

Its contents looked like this:

In.Rtilyd1:aa.xxx0087
In.Rzueei1:aa.xxx0088
In.Sfricf1:aa.xxx0089
In.Slooac1:aa.xxx0090
etc...

Then I ran:

cat MessageKey | while IFS=: read old new; do sed -i -e "s/MessageKey $old/MessageKey $new/" *Data ; done

I had to use IFS=: because the map file separates the old and new keys with a colon. (Alternatively, I could have replaced every : in the map file with a space, but setting IFS seemed easier.)
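A small demonstration of why `IFS=:` matters, using a line from the map file above: with the default `IFS`, `read` splits only on whitespace, so the whole line lands in the first variable. This assumes bash (for the `<<<` here-string); the variable names are illustrative.

```shell
#!/usr/bin/env bash
line='In.Rtilyd1:aa.xxx0087'   # one line of the MessageKey map file

# Default IFS splits on whitespace only, so nothing is split here:
# whole gets the entire line and rest stays empty.
read -r whole rest <<< "$line"
echo "whole=$whole rest=$rest"

# IFS=: (set for this read only) splits at the colon instead.
IFS=: read -r old new <<< "$line"
echo "old=$old new=$new"

# The alternative: replace ":" with a space and keep the default IFS.
read -r old2 new2 <<< "$(printf '%s\n' "$line" | tr ':' ' ')"
echo "old=$old2 new=$new2"
```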

Anyway, in the end this worked! Thanks, Carl, for pointing me in the right direction.
