Using sed on text files with a csv

I've been trying to do bulk find and replace on two text files using a csv. I've seen the questions that SO suggests, and none seem to answer my question.

I've created two variables for the two text files I want to modify. The csv has two columns and hundreds of rows. The first column contains strings (none of which contain whitespace) that already appear in the text files and need to be replaced with the corresponding strings in the same row of the second column.

As a test, I tried the script:

#!/bin/bash

test1='long_file_name.txt'
find='string1'
replace='string2'

sed -e "s/$find/$replace/g" $test1 > $test1.tmp && mv $test1.tmp $test1

This was successful, except that I need to do it once for every row in the csv, using the values given by the csv in each row. My hunch is that I'm using the while loop incorrectly, but I can't find the error. When I execute the script below, I just get the command-line prompt back, which makes me think that something has happened. When I check the text files, though, nothing has changed.

The two text files, this script, and the csv are all in the same folder (it's also been my working directory when I do this).

#!/bin/bash

textfile1='long_file_name1.txt'
textfile2='long_file_name2.txt'

while IFS=, read f1 f2
do
    sed -e "s/$f1/$f2/g" $textfile1 > $textfile1.tmp && \
         mv $textfile1.tmp $textfile1
    sed -e "s/$f1/$f2/g" $textfile2 > $textfile2.tmp && \
         mv $textfile2.tmp $textfile2
done <'findreplace.csv'

It seems to me that this code should do what I want it to do (but doesn't); perhaps I'm misunderstanding something fundamental (I'm new to bash scripting)?

The csv looks like this, but with hundreds of rows. All a_i's should be replaced with their counterpart b_i in the next column over.

a_1 b_1
a_2 b_2
a_3 b_3

Something to note: all the strings actually contain underscores, just in case this affects something. I've tried wrapping the variable names in braces, a la ${var}, but it still doesn't work.

I appreciate the solutions, but I'm also curious to know why the above doesn't work. (Also, I would vote everyone up, but I lack the reputation to do so. However, know that I appreciate and am learning a lot from your answers!)

If you are going to process a lot of data and your patterns can contain special characters, I would consider using Perl, especially if you are going to have a lot of pairs in findreplace.csv. You can use the following script as a filter or for in-place modification of many files. As a side effect, it loads the replacements and builds the Aho-Corasick automaton only once per invocation, which makes this solution pretty efficient (O(M+N) instead of the O(M*N) of your solution).

#!/usr/bin/perl
use strict;
use warnings;
use autodie;

# Emulate perl's -i[extension] switch: if the first argument starts with -i,
# build a closure that, whenever a new input file is reached, renames it to a
# backup name and redirects print() to the original filename.
my $in_place = ( @ARGV and $ARGV[0] =~ /^-i(.*)/ )
    ? do {
    shift;
    my $backup_extension = $1;
    my $backup_name      = $backup_extension =~ /\*/
        ? sub { ( my $fn = $backup_extension ) =~ s/\*/$_[0]/; $fn }
        : sub { shift . $backup_extension };
    my $oldargv = '-';
    sub {
        if ( $ARGV ne $oldargv ) {
            rename( $ARGV, $backup_name->($ARGV) );
            open( ARGVOUT, '>', $ARGV );
            select(ARGVOUT);
            $oldargv = $ARGV;
        }
    };
    }
    : sub { };

die "$0: File with replacements required." unless @ARGV;
# Load the find,replace pairs from the replacements file and build a single
# alternation regex of all (literal-quoted) search strings.
my ( $re, %replace );
do {
    my $filename = shift;
    open my $fh, '<', $filename;
    %replace = map { chomp; split ',', $_, 2 } <$fh>;
    close $fh;
    $re = join '|', map quotemeta, keys %replace;
    $re = qr/($re)/;
};

# For each input line, apply all replacements; the continue block prints the
# (possibly rewritten) line to stdout or to the in-place target.
while (<>) {
    $in_place->();
    s/$re/$replace{$1}/g;
}
continue {print}

Usage:

./replace.pl replace.csv <file.in >file.out

as well as

./replace.pl replace.csv file.in >file.out

or in-place

./replace.pl -i replace.csv file1.csv file2.csv file3.csv

or with backup

./replace.pl -i.orig replace.csv file1.csv file2.csv file3.csv

or with a backup-name placeholder

./replace.pl -ithere.is.\*.original replace.csv file1.csv file2.csv file3.csv
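For reference, the replacements file (replace.csv above) is read as plain comma-separated pairs, one per line; the script splits each line on the first comma only, so for the question's data it would look like:

a_1,b_1
a_2,b_2
a_3,b_3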

You should convert your CSV file to a sed script, sed.script, with the following command:

awk -F, '{print "s/" $1 "/" $2 "/g";}' replace.csv > sed.script
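For the sample rows shown in the question, and assuming the pairs are actually comma-separated (e.g. a_1,b_1), the generated sed.script would contain one substitution command per row, along these lines:

s/a_1/b_1/g
s/a_2/b_2/g
s/a_3/b_3/g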

And then you will be able to do a one-pass replacement:

sed -i -f sed.script longfilename.txt

This will be a faster implementation of what you want to do.
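If you want to skip the intermediate file, the two steps can also be combined with process substitution (a sketch assuming bash and GNU sed, which accepts -i without a backup suffix):

sed -i -f <(awk -F, '{print "s/" $1 "/" $2 "/g"}' replace.csv) longfilename.txt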

BTW, sorry, but I do not understand what is wrong with your script; it should work, unless your CSV file has more than 2 columns.
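One way to check what the loop is actually getting from each row is to print the fields that read splits out (a minimal debugging sketch, not part of the original script):

while IFS=, read -r f1 f2; do
    printf 'find=[%s] replace=[%s]\n' "$f1" "$f2"
done < findreplace.csv

If a row contains more than two comma-separated fields, everything after the first comma ends up in f2, which would produce a broken substitution.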
