
I need to replace strings in one file using key-value pairs from another file

I have a single attribute file (File 1) with two columns. Each string in column 1 matches a string in the files that need to be changed (File 2), and each such string in File 2 needs to be replaced by the corresponding string from column 2 of File 1.

I'm not sure of the best way to approach this: sed? awk? There is only one File 1, which holds every key and value pair, and all of them are unique. There are over 10,000 File 2s, each different but in the same format, in which I need to change the numbers to the names. Every number in any File 2 will be in File 1.

File 1

1000079541  ALBlai_CCA27168
1000079542  ALBlai_CCA27169
1000082614  PHYsoj_128987
1000082623  PHYsoj_128997
1000112581  PHYcap_Phyca_508162
1000112588  PHYcap_Phyca_508166
1000112589  PHYcap_Phyca_508170
1000112592  PHYcap_Phyca_549547
1000120087  HYAara_HpaP801280
1000134210  PHYinf_PITG_01218T0
1000134213  PHYinf_PITG_01223T0
1000134221  PHYinf_PITG_01231T0
1000144497  PHYinf_PITG_13921T0
1000153541  PYTultPYU1_T002777
1000162512  PYTultPYU1_T013706
1000163504  PYTultPYU1_T014907
1000168326  PHYram_79731
1000168327  PHYram_79730
1000168332  PHYram_79725
1000168335  PHYram_79722
...

File 2

(1000079542:0.60919245567850022205,((1000162512:0.41491233674846345059,(1000153541:0.39076742568979516701,1000163504:0.52813999143574519302):0.14562273102476630537):0.28880212838980307000,(((1000144497:0.20364901110426453235,1000168327:0.22130795712572320921):0.35964649479701132906,((1000120087:0.34990382691181332042,(1000112588:0.08084123331549526725,(1000168332:0.12176200773214326811,1000134213:0.09481932223544080329):0.00945982345360765406):0.01846847662360769429):0.19758412044470402558,((1000168326:0.06182031367986642878,1000112589:0.07837371928562210377):0.03460740736793390532,(1000134210:0.13512192366876615846,(1000082623:0.13344777464787777044,1000112592:0.14943677128375676411):0.03425386814075986885):0.05235436818005634318):0.44112430521695145114):0.21763784827666701749):0.22507080810857052477,(1000112581:0.02102132893524749635,(1000134221:0.10938436290969000275,(1000082614:0.05263067805665807425,1000168335:0.07681947209386902342):0.03562545894572662769):0.02623229853693959113):0.49114147006852687527):0.23017851954961116023):0.64646763541457552549,1000079541:0.90035900920746847476):0.0;

Desired Result

(ALBlai_CCA27169:0.60919245567850022205,((PYTultPYU1_T013706:0.41491233674846345059, ...

Python:

import re

# Build a dictionary of replacements:
with open('File 1') as f:
    repl = dict(line.split() for line in f)

# Read in the file and make the replacements:
with open('File 2') as f:
    data = f.read()
data = re.sub(r'(\d+):',lambda m: repl[m.group(1)]+':',data)

# Write it back out:
with open('File 2','w') as f:
    f.write(data)
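
Since the question mentions over 10,000 files in the File 2 format, the same idea could be wrapped so the dictionary is built once and reused. The following is only a sketch under assumptions: the names load_mapping, relabel and relabel_files, the '*.tree' glob pattern and the '.renamed' output suffix are all hypothetical, not from the question. It also assumes an ID always follows "(" or "," and precedes ":", as in the sample, and it leaves unknown IDs unchanged instead of raising KeyError.

```python
import re
from pathlib import Path

# Assumption: a numeric ID follows "(" or "," and is followed by ":".
LEAF_ID = re.compile(r'(?<=[(,])(\d+)(?=:)')

def load_mapping(path):
    """Read the two-column key/value file into a dict, skipping malformed lines."""
    mapping = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                mapping[parts[0]] = parts[1]
    return mapping

def relabel(text, mapping):
    """Replace every known numeric ID; leave unknown IDs unchanged."""
    return LEAF_ID.sub(lambda m: mapping.get(m.group(1), m.group(1)), text)

def relabel_files(mapping_path, tree_dir, pattern='*.tree'):
    """Hypothetical batch driver: writes <name>.renamed next to each input file."""
    mapping = load_mapping(mapping_path)  # built once, reused for every file
    for tree_path in sorted(Path(tree_dir).glob(pattern)):
        out_path = tree_path.with_name(tree_path.name + '.renamed')
        out_path.write_text(relabel(tree_path.read_text(), mapping))
```

Writing to a new file rather than overwriting keeps the originals intact until the output has been checked; something like relabel_files('File 1', 'trees') would be the call, with 'trees' standing in for wherever the files live.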

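An awk solution that replaces every ID on the line, not just the first one. Hope it helps.

awk 'NR == FNR { a[$1] = $2; next }   # first file: build the key -> name map
{
  out = ""
  # Walk through each run of digits that is directly followed by ":"
  while (match($0, /[0-9]+:/)) {
    key = substr($0, RSTART, RLENGTH - 1)
    # Use the mapped name when the key is known, otherwise keep the number
    out = out substr($0, 1, RSTART - 1) ((key in a) ? a[key] : key) ":"
    $0 = substr($0, RSTART + RLENGTH)
  }
  print out $0
}' file1 file2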

I would do something like this in bash:

while read -r key value
do
  # Match the key only when it follows "(" or "," and precedes ":"
  printf 's/\\([(,]\\)%s:/\\1%s:/g\n' "$key" "$value"
done < file1 > sedtmpfile
sed -f sedtmpfile file2 > result
rm sedtmpfile
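
Whichever approach is used, it may be worth checking the question's premise that every number in a File 2 appears in File 1 before rewriting thousands of files. A minimal Python sketch, assuming the same "ID follows ( or , and precedes :" layout as the sample; the function name missing_ids is hypothetical:

```python
import re

# Assumption: a numeric ID follows "(" or "," and is followed by ":".
LEAF_ID = re.compile(r'(?<=[(,])(\d+)(?=:)')

def missing_ids(text, mapping):
    """Return the sorted list of IDs found in text that have no entry in mapping."""
    return sorted(set(LEAF_ID.findall(text)) - set(mapping))
```

An empty list means every ID in that file can be replaced; anything else lists the keys to look for in File 1.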


 