How to intersect multiple files by several columns

I have spent a lot of time on this, so any help would be appreciated. I have two files, shown below. For every item in f1_col1 and f1_col2, I want to search f2_col3 separately. If the item exists there, I want to keep it and add the related value from its matching f2 row (f2_col1 in the desired output) as a new column in a new df.

f1 (two columns):

f1_col1,f1_col2
kctd,Malat1
Gas5,Snhg6

f2 (three columns):

f2_col1,f2_col2,f2_col3
chr7,snRNA,Gas5
chr1,protein_coding,Malat1
chr2,TEC,Snhg6
chr1,TEC,kctd

Based on the two files above, the desired output is:

new_df:

f1_col1,f1_col2,f2_col1,f2_col1
kctd,Malat1,chr1,chr1
Gas5,Snhg6,chr7,chr2

Note: f2_col2 is not important.

I do not have a strong programming background and have found this very difficult. I have checked multiple sources but have not been able to develop a solution, so any help is appreciated. Thanks.

Based on one possible interpretation of your requirements, and on the single sunny-day example you provided in which every key field matches on every line, this MAY be what you're trying to do:

$ cat tst.awk
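BEGIN { FS=OFS="," }
# First file on the command line (f2): map each f2_col3 value to its f2_col1 value.
NR==FNR {
    if ( FNR == 1 ) {
        hdr = $1    # remember the f2_col1 header name
    }
    map[$3] = $1
    next
}
# Second file (f1): append the mapped value for each of its two columns.
{ print $0, ( FNR>1 ? map[$1] OFS map[$2] : hdr OFS hdr ) }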

$ awk -f tst.awk f2 f1
f1_col1,f1_col2,f2_col1,f2_col1
kctd,Malat1,chr1,chr1
Gas5,Snhg6,chr7,chr2
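
The script above relies on every f1 value being present in f2_col3. If that does not hold for your real data, a missing key would silently produce an empty field. The following is only a sketch of one way to make that case visible, under assumptions that are not in the question: the placeholder string "NA" is my choice, the row "Xist,Gas5" is a made-up extra line added to f1 to show a miss, and the files are recreated in a temporary directory so nothing of yours is overwritten.

```shell
# Recreate the sample files in a scratch directory (assumption: any writable temp dir will do).
workdir=$(mktemp -d)
# f1 with one hypothetical extra row, "Xist,Gas5", whose first key is absent from f2.
printf '%s\n' 'f1_col1,f1_col2' 'kctd,Malat1' 'Gas5,Snhg6' 'Xist,Gas5' > "$workdir/f1"
printf '%s\n' 'f2_col1,f2_col2,f2_col3' 'chr7,snRNA,Gas5' \
    'chr1,protein_coding,Malat1' 'chr2,TEC,Snhg6' 'chr1,TEC,kctd' > "$workdir/f2"

out=$(awk '
BEGIN { FS=OFS="," }
# First file (f2): remember the header name, then map f2_col3 -> f2_col1 for data rows.
NR==FNR {
    if ( FNR == 1 ) { hdr = $1 } else { map[$3] = $1 }
    next
}
# Second file (f1), header line: append the f2_col1 header twice.
FNR == 1 { print $0, hdr, hdr; next }
# Second file (f1), data lines: look up each key, using "NA" (an assumed placeholder) on a miss.
{
    a = ( ($1 in map) ? map[$1] : "NA" )
    b = ( ($2 in map) ? map[$2] : "NA" )
    print $0, a, b
}
' "$workdir/f2" "$workdir/f1")

printf '%s\n' "$out"
```

With the extra row included, this prints:

f1_col1,f1_col2,f2_col1,f2_col1
kctd,Malat1,chr1,chr1
Gas5,Snhg6,chr7,chr2
Xist,Gas5,NA,chr7

Testing with `in` rather than reading `map[$1]` directly also avoids creating empty entries in the array for keys that were never seen in f2.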
