Copying columns from different files into a single file using awk

I have more than 500 files, each with two columns: "Gene short name" and "FPKM". Every file has the same number of rows, and the "Gene short name" column is identical across all files. I want to build a matrix whose first column is the gene short name (taken from any one of the files) and whose remaining columns hold the FPKM values, one column per file.

The following command works well for a handful of files, but how can I scale it to 500 files?

 paste -d' ' <(awk -F'\t' '{print $1}' 69_genes.fpkm.txt) \
            <(awk -F'\t' '{print $2}' 69_genes.fpkm.txt) \
            <(awk -F'\t' '{print $2}' 72_genes.fpkm.txt) \
            <(awk -F'\t' '{print $2}' 75_genes.fpkm.txt) \
            <(awk -F'\t' '{print $2}' 78_genes.fpkm.txt) > col.txt

sample data (files are tab separated):

head 69_genes.fpkm.txt 
gene_short_name FPKM
        DDX11L1 0.196141
        MIR1302-2HG 0.532631
        MIR1302-2   0
        WASH7P  4.51437

Expected outcome

gene_short_name FPKM FPKM FPKM FPKM
DDX11L1 0.196141 0.206591 0.0201256 0.363618
MIR1302-2HG 0.532631 0.0930007 0.0775838 0
MIR1302-2 0 0 0 0
WASH7P 4.51437 3.31073 3.23326 1.05673
MIR6859-1 0 0 0 0
FAM138A 0.505155 0.121703 0.105235 0
OR4G4P 0.0536387 0 0 0
OR4G11P 0 0 0 0
OR4F5 0.0390888 0.0586067 0 0

I would also like to rename each "FPKM" column header to "filename_FPKM", so that every column identifies the file it came from.

Given the input

$ cat a.txt
a       1
b       2
c       3
$ cat b.txt
a       I
b       II
c       III
$ cat c.txt
a       one
b       two
c       three

you can loop over the files, pasting one more column onto the result on each pass:

cut -f1 a.txt > result.txt
for f in a.txt b.txt c.txt
do
  cut -f2 "$f" | paste result.txt - > tmp.txt
  mv tmp.txt result.txt
done
$ cat result.txt
a       1       I       one
b       2       II      two
c       3       III     three
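
The same loop can be pointed at all 500 files by letting a glob build the file list. The following is only a sketch under stated assumptions: the files match `*_genes.fpkm.txt` as in the question, each has one header line, and the `fpkm_demo` directory with its two tiny input files is made up here purely so the example runs on its own. It also renames each header to `<prefix>_FPKM`, using the part of the filename before `_genes.fpkm.txt` as the prefix.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo inputs (tab-separated, one header line each).
mkdir -p fpkm_demo
printf 'gene_short_name\tFPKM\nDDX11L1\t0.1\nWASH7P\t4.5\n' > fpkm_demo/69_genes.fpkm.txt
printf 'gene_short_name\tFPKM\nDDX11L1\t0.2\nWASH7P\t3.3\n' > fpkm_demo/72_genes.fpkm.txt

files=( fpkm_demo/*_genes.fpkm.txt )           # the glob replaces the hand-written file list
cut -f1 "${files[0]}" > fpkm_demo/result.txt   # gene names, taken from the first file

for f in "${files[@]}"; do
  base=${f##*/}                  # strip the directory part
  name=${base%_genes.fpkm.txt}   # e.g. "69"
  # Print "<name>_FPKM" in place of the header, then column 2 of every other row.
  awk -F'\t' -v h="${name}_FPKM" 'NR==1 { print h; next } { print $2 }' "$f" |
    paste fpkm_demo/result.txt - > fpkm_demo/tmp.txt
  mv fpkm_demo/tmp.txt fpkm_demo/result.txt
done

cat fpkm_demo/result.txt
```

With the two demo files, the header line of `fpkm_demo/result.txt` reads `gene_short_name`, `69_FPKM`, `72_FPKM` (tab-separated), followed by one row per gene. The loop rewrites the result file once per input, which is simple but does repeated I/O; for a few hundred small files that is usually acceptable.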

In awk, using @Micha's data for clarity:

$ awk '  
BEGIN { FS=OFS="\t" }    # set the field separators
FNR==1 {
    $2=FILENAME "_" $2   # on first record of each file rename $2
}
NR==FNR {                # process the first file
    a[FNR]=$0            # hash whole record to a
    next
}
{                        # process other files
    a[FNR]=a[FNR] OFS $2 # add $2 to the end of the record
}
END {                    # in the end
    for(i=1;i<=FNR;i++)  # print all records
        print a[i]
}' a.txt b.txt c.txt

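Output (the sample files have no header line, so the filename prefix lands on the first data row; with the real FPKM files the header would become e.g. 69_genes.fpkm.txt_FPKM):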

a       a.txt_1 b.txt_I c.txt_one
b       2       II      two
c       3       III     three
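
For the 500-file case, the awk approach only needs a glob in place of the explicit file names. Below is a hedged variant, not the answerer's exact script: it assumes the `*_genes.fpkm.txt` naming from the question and one header line per file, shortens the header prefix to the part before `_genes.fpkm.txt`, and records the row count in a variable `n` instead of reading `FNR` in the `END` block. The `awk_demo` directory and its two input files are invented so the example is self-contained.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo inputs (tab-separated, one header line each).
mkdir -p awk_demo
printf 'gene_short_name\tFPKM\nDDX11L1\t0.1\nWASH7P\t4.5\n' > awk_demo/69_genes.fpkm.txt
printf 'gene_short_name\tFPKM\nDDX11L1\t0.2\nWASH7P\t3.3\n' > awk_demo/72_genes.fpkm.txt

awk '
BEGIN { FS = OFS = "\t" }
FNR == 1 {                                   # header line of each file
    label = FILENAME
    sub(/.*\//, "", label)                   # drop the directory part
    sub(/_genes\.fpkm\.txt$/, "", label)     # drop the common suffix
    $2 = label "_" $2                        # e.g. FPKM -> 69_FPKM
}
NR == FNR { row[FNR] = $0; n = FNR; next }   # first file: keep the whole line
{ row[FNR] = row[FNR] OFS $2 }               # later files: append column 2
END { for (i = 1; i <= n; i++) print row[i] }
' awk_demo/*_genes.fpkm.txt > awk_demo/matrix.txt

cat awk_demo/matrix.txt
```

This reads every file once and holds the growing rows in memory, which should be fine for gene tables of typical size. Like the original, it pairs rows by line number only, so it relies on all files listing the genes in the same order.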
