Match string in file1 with string in file2

My sample data:
1.txt
MTQZ3CODT0SQKGE3QE6B | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 | | 2002-08-22 13:07:05

2.txt
 MTQZ3CODT0SQKGE3QE6B | joe@example.com

Desired output:
joe@example.com | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 | | 2002-08-22 13:07:05

I want to match the first column of 1.txt against the first column of 2.txt and replace it with the second column of 2.txt. So far I have tried:

awk 'BEGIN { while((getline < "file2.txt") > 0) a[$1]=$3 } { $1 = a[$1] } 1' file1.txt

It works, but after 12 hours of running it had processed only 1 GB, which seems very slow.

INFO: file1.txt is 7 GB, file2.txt is 4 GB, and I have 16 GB of memory.

I'm not sure what is causing the slowness, but if there is a faster way than the awk I'm using, that would be helpful.
Thanks!

Note: I'm running out of memory. Is there a way to do this without an array at all? Also, in my case the lines are in random order, so matching keys are not on the same line numbers in the two files.

$ join <(sort 2.txt) <(sort 1.txt) | cut -d' ' -f3-
joe@example.com | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 | | 2002-08-22 13:07:05

If that's not all you need then edit your question to provide more truly representative sample input/output including cases that this doesn't work for.
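Since the question mentions running out of memory, here is a minimal sketch of the same sort/join idea written with intermediate files instead of process substitution, so it runs in plain sh. It assumes keys contain no whitespace and that it is acceptable for runs of blanks to collapse to single spaces, as in the output above. The temporary directory and the `*.sorted` file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: swap the key in 1.txt for the address in 2.txt without an in-memory array.
set -eu
LC_ALL=C; export LC_ALL   # byte-wise collation so sort and join agree on ordering

workdir=$(mktemp -d)
cd "$workdir"

# Sample data copied from the question (2.txt keeps its leading blank).
printf '%s\n' 'MTQZ3CODT0SQKGE3QE6B | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 | | 2002-08-22 13:07:05' > 1.txt
printf '%s\n' ' MTQZ3CODT0SQKGE3QE6B | joe@example.com' > 2.txt

# Sort both files on the first whitespace-delimited field.
# -b ignores leading blanks, so the stray blank in 2.txt does not affect ordering.
sort -b -k1,1 2.txt > 2.sorted
sort -b -k1,1 1.txt > 1.sorted

# join emits: key, "|", address, "|", rest of the 1.txt line.
# cut drops the first two fields (the key and the first "|").
result=$(join 2.sorted 1.sorted | cut -d' ' -f3-)
printf '%s\n' "$result"
```

sort works in chunks and merges them from temporary files, so it does not need the whole input in memory; with GNU sort, `-S` sets the buffer size and `-T` the temporary directory. Note that join as used here drops lines of 1.txt whose key has no match in 2.txt; `join -a 2` would keep them, but then the `cut` step would need adjusting.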

You may use this awk:

awk -F ' *\\| *' -v OFS=' | ' '
FNR == NR {
   map[$1]=$2
   next
}
$1 in map {
   $1 = map[$1]
} 1' 2.txt 1.txt
joe@example.com | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 |  | 2002-08-22 13:07:05
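A small variation, as a sketch only: with a regex field separator awk does not strip blanks at the start of a line, so a key written as ` MTQZ3CODT0SQKGE3QE6B` (as in the 2.txt sample) would be stored with its leading blank and never match. The version below trims the key before using it. It writes the separator as `[|]`, which should behave the same as `\\|`, and the second input line uses a made-up key to show that unmatched lines pass through unchanged.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# First line is shortened from the question; the second key is hypothetical.
printf '%s\n' \
  'MTQZ3CODT0SQKGE3QE6B | j t | j | t | 22312' \
  'HYPOTHETICALKEY00000 | a b | a | b | 11111' > 1.txt
printf '%s\n' ' MTQZ3CODT0SQKGE3QE6B | joe@example.com' > 2.txt

result=$(awk -F ' *[|] *' -v OFS=' | ' '
# Trim blanks around the key without touching $0.
{ key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }

# First file (2.txt): remember key -> address.
FNR == NR { map[key] = $2; next }

# Second file (1.txt): replace the key only when an address is known.
key in map { $1 = map[key] }

# Print every line, changed or not.
1' 2.txt 1.txt)
printf '%s\n' "$result"
```

This still holds every key/address pair from 2.txt in memory, so for inputs of the size described in the question the sort/join approach is the one that avoids the array.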
