How to replace a column in one text file with multiple columns from another text file?

Question

I have:

$ cat file1.csv (tab delimited)
R923E06 273911 2990492 2970203 F Resistant 
R923F06 273910 2990492 2970203 F Resistant 
R923H02 273894 2970600 2990171 M Resistant

and:

$ cat file2.txt (space delimited and it's a large file)
R923E06 CC GG TT AA ...
R923F06 GG TT AA CC ...
R923H02 TT GG CC AA ...

How can I replace the first column of file2.txt with all 6 columns of file1.csv?

Answer 1

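Using join, you can do this:

join <(sed -e 's/\t/ /g' file1.csv) file2.txt

sed changes the tabs to spaces (\t in a sed pattern is a GNU sed extension).

join joins the lines of two files on a common field, by default the first one. It requires both inputs to be sorted on that field; the sample files already are, otherwise pipe each input through sort first.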
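join works like the merge step of a sort-merge join: it walks both inputs in step, which is why they must be sorted on the key. The following is a minimal Python sketch of that idea, not join's actual implementation. It assumes each key appears at most once per file (real join also handles repeated keys) and compares keys with Python's code-point ordering, which matches sort only under LC_ALL=C; merge_join is a hypothetical name.

```python
def merge_join(left_lines, right_lines):
    """Inner-join two lists of lines on their first blank-separated field.

    Both inputs must already be sorted on that field, and each key is
    assumed to appear at most once per input (a simplification of join).
    """
    left = [line.split() for line in left_lines if line.strip()]
    right = [line.split() for line in right_lines if line.strip()]
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey == rkey:
            # Output the key, the rest of the left line, then the rest of the right line.
            out.append(" ".join(left[i] + right[j][1:]))
            i += 1
            j += 1
        elif lkey < rkey:
            i += 1  # key only in the left input: dropped, as join does by default
        else:
            j += 1  # key only in the right input: dropped as well
    return out


if __name__ == "__main__":
    file1 = ["R923E06\t273911\t2990492\t2970203\tF\tResistant",
             "R923F06\t273910\t2990492\t2970203\tF\tResistant"]
    file2 = ["R923E06 CC GG TT AA", "R923F06 GG TT AA CC"]
    for row in merge_join(file1, file2):
        print(row)
```

If either input were unsorted, the walk would step past keys that do have a partner and silently lose those rows, which is why join expects sorted input.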

Output:

R923E06 273911 2990492 2970203 F Resistant  CC GG TT AA ...
R923F06 273910 2990492 2970203 F Resistant  GG TT AA CC ...
R923H02 273894 2970600 2990171 M Resistant TT GG CC AA ...

Answer 2

Take a look at this AWK example:

awk 'FNR == NR { d[$1] = $0; next } { $1 = d[$1] } 1' file1.csv file2.txt

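Here I replace the first column of file2.txt with the corresponding line (all 6 columns) of file1.csv. Only file1.csv is held in memory (in the array d); file2.txt is processed one line at a time, so its size does not matter.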
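The same lookup-table idea can be sketched in Python for comparison. This is a minimal sketch, assuming file1.csv is small enough to fit in a dict while file2.txt is streamed; replace_first_column is a hypothetical helper name, and unlike the awk one-liner it keeps the original ID when it is missing from file1.csv.

```python
import io
import sys


def replace_first_column(lookup_file, data_file, out):
    """Replace the first field of each data line with the matching lookup line.

    lookup_file, data_file: iterables of lines (e.g. open file objects).
    out: any object with a write() method.
    """
    # Pass 1 (the awk FNR == NR block): map each ID to its whole file1 line.
    table = {}
    for line in lookup_file:
        fields = line.split()
        if fields:
            table[fields[0]] = line.rstrip("\n")

    # Pass 2: stream file2 one line at a time, so it never has to fit in memory.
    for line in data_file:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Unlike the awk one-liner, which blanks $1 when an ID is missing from
        # file1, this keeps the original ID.
        fields[0] = table.get(fields[0], fields[0])
        out.write(" ".join(fields) + "\n")


if __name__ == "__main__":
    file1 = io.StringIO("R923E06\t273911\t2990492\t2970203\tF\tResistant\n")
    file2 = io.StringIO("R923E06 CC GG TT AA\n")
    replace_first_column(file1, file2, sys.stdout)
```

With real files, open both and pass the file objects, e.g. `with open("file1.csv") as f1, open("file2.txt") as f2: replace_first_column(f1, f2, sys.stdout)`.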

Output:

R923E06 273911 2990492 2970203 F Resistant  CC GG TT AA ...
R923F06 273910 2990492 2970203 F Resistant  GG TT AA CC ...
R923H02 273894 2970600 2990171 M Resistant  TT GG CC AA ...

If you want everything tab-separated in the result, add gsub(/[[:space:]]+/,"\t") to replace each run of spaces or tabs with a single tab (the + keeps the trailing space in file1.csv from producing an empty column):

awk 'FNR == NR { d[$1] = $0; next } { $1 = d[$1]; gsub(/[[:space:]]+/,"\t") } 1' file1.csv file2.txt
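Because the first two lines of file1.csv end with a space, the merged line can contain two whitespace characters in a row, and replacing each one separately with a tab would create an empty column. A minimal Python sketch of collapsing each run into a single tab, the equivalent of gsub(/[[:space:]]+/,"\t"); to_tab_separated is a hypothetical helper name, and the character-class equivalence assumes ASCII input.

```python
import re

# A run of whitespace; for ASCII text this matches the same characters as
# awk's [[:space:]] (space, tab, newline, vertical tab, form feed, carriage return).
WHITESPACE_RUN = re.compile(r"\s+")


def to_tab_separated(line):
    """Turn each run of whitespace into a single tab, like gsub(/[[:space:]]+/, "\t")."""
    return WHITESPACE_RUN.sub("\t", line.rstrip("\n"))


if __name__ == "__main__":
    # The trailing space after "Resistant" plus the output separator gives two spaces here.
    merged = "R923E06\t273911\t2990492\t2970203\tF\tResistant  CC GG TT AA"
    print(to_tab_separated(merged))
    # Replacing each whitespace character on its own leaves an empty column
    # between "Resistant" and "CC".
    print(re.sub(r"\s", "\t", merged))
```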

Answer 3

# import pandas
import pandas as pd

# read file1.csv (tab-delimited, no header row)
# set index_col=False if the file has delimiters at the end of each line
file1 = pd.read_csv('file1.csv', sep='\t', header=None, index_col=False)

# read file2.txt (space-delimited, no header row); read_csv can read .txt files as well
# header=None avoids hard-coding the number of columns
file2 = pd.read_csv('file2.txt', sep=' ', header=None, index_col=False)

# drop the first (ID) column of file2
file2.drop(0, axis=1, inplace=True)

# concat both frames side by side
# concat(axis=1) pairs rows by position, so both files must list the IDs in the same order
final = pd.concat([file1, file2], axis=1)

# you might end up with mixed column names; you can change them with
# final.columns = ['col1', 'col2', ....]

# save as a tab-separated file without pandas' index or header
final.to_csv('out.csv', sep='\t', header=False, index=False)
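pd.concat(axis=1) pairs rows by position, so if file2.txt lists the IDs in a different order, or contains IDs that are not in file1.csv, rows end up silently misaligned. A sketch that matches rows by ID with DataFrame.merge instead; the in-memory sample data and the id/f1_/f2_ column names are made up for illustration, and with real files you would pass 'file1.csv' and 'file2.txt' to read_csv instead of the StringIO objects.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for file1.csv (tab-delimited) and file2.txt
# (space-delimited); note that file2 lists the IDs in a different order.
FILE1 = "R923E06\t273911\t2990492\t2970203\tF\tResistant\nR923H02\t273894\t2970600\t2990171\tM\tResistant\n"
FILE2 = "R923H02 TT GG CC AA\nR923E06 CC GG TT AA\n"

# dtype=str keeps every field exactly as written (no numeric conversion).
file1 = pd.read_csv(io.StringIO(FILE1), sep="\t", header=None, dtype=str)
file2 = pd.read_csv(io.StringIO(FILE2), sep=" ", header=None, dtype=str)

# Give the ID column a shared name and the other columns distinct, made-up names.
file1.columns = ["id"] + [f"f1_{i}" for i in range(1, file1.shape[1])]
file2.columns = ["id"] + [f"f2_{i}" for i in range(1, file2.shape[1])]

# Match rows by ID instead of by position; how="inner" keeps only IDs present
# in both files, in the order they appear in file1.
final = file1.merge(file2, on="id", how="inner")

print(final.to_csv(sep="\t", header=False, index=False), end="")
```

Use how="left" instead if IDs missing from file2.txt should still appear in the output (their genotype columns become NaN).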

3 answers

solution1 2 2018-07-19 05:09:40
solution2 0 2018-07-19 05:02:23
solution3 0 2018-07-19 05:43:44
