
Script to look for a word in file1 and copy the next word and replace that in file2

I have file1

(1'a lot of singapore 1.2.3.4 'some other info',0,null, 12345),

(2,'a lot of brazil  4.2.3.1, 'some other info',0,null, 12345),

(3,'a lot of india 3.4.2.1, 'some other info',0,null, 12345),

(4,'a lot of laos 1.3.4.5, 'some other info',0,null, 12345),

(5,'a lot of china 1.2.3.5, 'some other info',0,null, 12345);

and file2

(1'a lot of singapore A.B.C.D 'some other info',0,null, 12345),

(2,'a lot of brazil E.F.G.H, 'some other info',0,null, 12345),

(3,'a lot of india H.I.J.K, 'some other info',0,null, 12345),

(4,'a lot of laos L.M.N.O, 'some other info',0,null, 12345),

(5,'a lot of china P.Q.R.S, 'some other info',0,null, 12345);

I have created a script that copies and replaces by LINE number, but I need one that looks for SINGAPORE in file1, copies the next word (1.2.3.4), then looks for singapore in file2 and replaces the next word there (A.B.C.D) with 1.2.3.4, so that the final file2 looks like this:

(1'a lot of singapore 1.2.3.4 'some other info',0,null, 12345),

A Python, Awk, or sed script would all be helpful.

So far I have created this to copy and replace by line number:

sed -i '2d' File2.txt
awk 'NR==5380{a=$0}NR==FNR{next}FNR==2{print a}1' file1.txt file2.txt

I'm not sure this is the best solution, but you need something like this.

import re

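def try_to_get_country_data(line, country):
    # Return the word that follows the country name, or None if the
    # country does not occur in this line. The whole line is searched,
    # because the sample rows do not all have a comma before the text
    # field, so indexing into line.split(',') is unreliable (and fails
    # on blank lines).
    match = re.search(rf'\b{re.escape(country)}\s+([^\s,]+)', line)
    if match is not None:
        return match.group(1)

    return None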
    
if __name__ == "__main__":
    found_data = None
    country = 'singapore'

    with open('some_file.txt', 'r') as f:
        for line in f:
            if (found_data := try_to_get_country_data(line, country)) is not None:
                break

    if found_data is not None:
        with open('second_file.txt', 'r') as f2:
            data = f2.readlines()

        for i, line in enumerate(data):
            if (replaced_data := try_to_get_country_data(line, country)) is not None:
                data[i] = line.replace(replaced_data, found_data)
                break

        with open('second_file.txt', 'w') as f2:
            f2.writelines(data)

I've checked it, and it works as long as every line follows the same pattern.
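To make the "same pattern" requirement concrete, here is a minimal sketch of the same idea on in-memory lines rather than files. It assumes the value to copy is the whitespace-delimited word right after the country name. The helper names `next_word` and `copy_next_word` are made up for this illustration and are not part of the answer above. Unlike the script above, it replaces the word on every matching line, not only the first.

```python
import re

def next_word(line, country):
    """Return the word following `country` in `line`, or None (hypothetical helper)."""
    m = re.search(rf'\b{re.escape(country)}\s+([^\s,]+)', line)
    return m.group(1) if m else None

def copy_next_word(src_lines, dst_lines, country):
    """Copy the word after `country` from src_lines into dst_lines (hypothetical helper)."""
    # The first line in the source that mentions the country wins.
    value = next((w for w in (next_word(l, country) for l in src_lines) if w), None)
    if value is None:
        return list(dst_lines)  # nothing to copy, leave the target unchanged
    pattern = re.compile(rf'(\b{re.escape(country)}\s+)[^\s,]+')
    # A function replacement keeps `value` from being read as a regex template.
    return [pattern.sub(lambda m: m.group(1) + value, l) for l in dst_lines]

src = ["(1'a lot of singapore 1.2.3.4 'some other info',0,null, 12345),",
       "(2,'a lot of brazil  4.2.3.1, 'some other info',0,null, 12345),"]
dst = ["(1'a lot of singapore A.B.C.D 'some other info',0,null, 12345),",
       "(2,'a lot of brazil E.F.G.H, 'some other info',0,null, 12345),"]

result = copy_next_word(src, dst, "singapore")
print(result[0])
```

Lines that do not mention the country pass through unchanged, and a country that is missing from the source leaves the target as it was.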

If you would like a short bash script, and assuming the structure of the files is constant, you could try something like this (here file0 is the source file and file1 is the file to update, i.e. file1 and file2 in the question):

country="singapore"
a=$(grep -w "${country}" file0 | awk '{print $5}')

if [[ "${a}" ]]
then
    b=$(grep -w "${country}" file1 | awk '{print $5}')
    sed "s/${country} ${b}/${country} ${a}/g" file1
fi

Find below the output of the script:

(1'a lot of singapore 1.2.3.4 'some other info',0,null, 12345),

(2,'a lot of brazil E.F.G.H, 'some other info',0,null, 12345),

(3,'a lot of india H.I.J.K, 'some other info',0,null, 12345),

(4,'a lot of laos L.M.N.O, 'some other info',0,null, 12345),

(5,'a lot of china P.Q.R.S, 'some other info',0,null, 12345);

Use sed -i in order to edit file1 in place.

To avoid reading the same file multiple times, at the cost of slightly reduced readability, the initial approach can easily be refactored as follows:

country="singapore"
file0c=$(cat file0)
file1c=$(cat file1)

a=$(echo "${file0c}" | grep -w "${country}" | awk '{print $5}')

if [[ "${a}" ]]
then
    b=$(echo "${file1c}" | grep -w "${country}" | awk '{print $5}')
    echo "${file1c}" | sed "s/${country} ${b}/${country} ${a}/g" | 
    tee file1_new
fi

Here is a simple Awk script to look for the replacement text from the first input file and replace the corresponding token in the second input file.

awk -v country="singapore" 'NR == FNR {
    for (i=2; i<=NF; i++) if ($(i-1) == country) token = $i; next }
  $0 ~ country { for(i=2; i<=NF; i++) if ($(i-1) == country) $i = token
    } 1' file1 file2 >newfile2

While we are reading file1, NR == FNR is true. We loop over the input tokens and check for one which equals country; if we find one, we set token to the field that follows it. This means that if there are multiple matches on the country keyword, the last one in the first input file will be extracted.

The next statement causes Awk to skip the rest of the script for this input file, so the lines from file1 are only read, and not processed further.

If we fall through to the last line, we are now reading file2. If we see a line which contains the keyword, we replace the token after the country keyword. (This requires the keyword to be an isolated token, not a substring within a longer word etc.) Note that assigning to a field makes Awk rebuild the line with single spaces between fields. The final 1 causes all lines which get this far to be printed back to standard output, thus generating a copy of the second file with any substitutions performed.
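For readers more comfortable with Python, the token logic described above can be mirrored roughly as follows. This is only a sketch of the same technique, not a translation the answer provides. The function names `last_token_after` and `substitute` are hypothetical. One deliberate difference: when no replacement token was found, the sketch leaves the lines untouched, whereas the Awk script would substitute an empty string.

```python
def last_token_after(lines, country):
    """Return the field following `country`; the last match wins (hypothetical helper)."""
    token = None
    for line in lines:
        fields = line.split()  # whitespace-separated fields, like Awk's default
        for prev, cur in zip(fields, fields[1:]):
            if prev == country:
                token = cur  # later matches overwrite earlier ones
    return token

def substitute(lines, country, token):
    """Replace the field after `country` with `token` (hypothetical helper)."""
    if token is None:
        return list(lines)
    out = []
    for line in lines:
        fields = line.split()
        changed = False
        for i in range(1, len(fields)):
            if fields[i - 1] == country:
                fields[i] = token
                changed = True
        # Rebuilding a changed line collapses runs of whitespace, as Awk does.
        out.append(" ".join(fields) if changed else line)
    return out

file1 = ["(2,'a lot of brazil  4.2.3.1, 'some other info',0,null, 12345),"]
file2 = ["(2,'a lot of brazil E.F.G.H, 'some other info',0,null, 12345),",
         "(3,'a lot of india H.I.J.K, 'some other info',0,null, 12345),"]

token = last_token_after(file1, "brazil")
new_file2 = substitute(file2, "brazil", token)
print(new_file2[0])
```

Because the fields are split on whitespace only, the trailing comma stays attached to the token on both sides, so it is copied along with the value.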

If you have any control over the data format used here, perhaps try to figure out a way to get the input in a less haphazard ad-hoc format, like JSON.
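As a rough illustration of why a structured format helps, here is a sketch that assumes each record were stored as a JSON object. The keys `id`, `country` and `address`, the sample strings, and the function `copy_address` are all invented for this example; nothing in the question prescribes them.

```python
import json

# Hypothetical JSON versions of the two files.
file1_json = ('[{"id": 1, "country": "singapore", "address": "1.2.3.4"},'
              ' {"id": 2, "country": "brazil", "address": "4.2.3.1"}]')
file2_json = ('[{"id": 1, "country": "singapore", "address": "A.B.C.D"},'
              ' {"id": 2, "country": "brazil", "address": "E.F.G.H"}]')

def copy_address(src_text, dst_text, country):
    """Copy the address for `country` from the source records into the target records."""
    src = json.loads(src_text)
    dst = json.loads(dst_text)
    # Build a country -> address lookup from the source file.
    lookup = {rec["country"]: rec["address"] for rec in src}
    for rec in dst:
        if rec["country"] == country and country in lookup:
            rec["address"] = lookup[country]
    return json.dumps(dst)

updated = json.loads(copy_address(file1_json, file2_json, "singapore"))
print(updated[0]["address"])
```

With named fields, the replacement becomes a dictionary lookup and no longer depends on word positions or stray commas.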
