How to intersect multiple files by several columns
I have spent a lot of time on this; any help would be appreciated. I have two files, shown below. What I want to do is search for every item of f1_col1 and f1_col2 separately inside f2_col3. If an item exists, save it and add the value from its related row (the row matched in f2_col3) to a new column in the new df.
f1 (two columns):
f1_col1,f1_col2
kctd,Malat1
Gas5,Snhg6
f2 (three columns):
f2_col1,f2_col2,f2_col3
chr7,snRNA,Gas5
chr1,protein_coding,Malat1
chr2,TEC,Snhg6
chr1,TEC,kctd
So, based on the two files above, the desired output should be:
new_df:
f1_col1,f1_col2,f2_col1,f2_col1
kctd,Malat1,chr1,chr1
Gas5,Snhg6,chr7,chr2
Note: f2_col2 is not important.
I do not have a strong programming background and found this very difficult. Even though I have checked multiple sources, I have not been able to develop a solution. Any help is appreciated. Thanks.
Based on one possible interpretation of your requirements and the one sunny-day example you provided, where every key field always matches on every line, this MAY be what you're trying to do:
$ cat tst.awk
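BEGIN { FS=OFS="," }
# First file (f2): map each f2_col3 key to its f2_col1 value.
NR==FNR {
    if ( FNR == 1 ) {
        hdr = $1    # remember the f2_col1 header name
    }
    map[$3] = $1
    next
}
# Second file (f1): append the looked-up value for each of the two keys.
{ print $0, ( FNR>1 ? map[$1] OFS map[$2] : hdr OFS hdr ) }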
$ awk -f tst.awk f2 f1
f1_col1,f1_col2,f2_col1,f2_col1
kctd,Malat1,chr1,chr1
Gas5,Snhg6,chr7,chr2
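The script above prints an empty field whenever a key from f1 has no match in f2_col3, because awk returns an empty string for an array element that does not exist. If a visible placeholder is preferable, a minimal sketch of a rainy-day variant could look like the following. The "NA" placeholder, the scratch directory, and the extra f1 row with the key Xist are assumptions added purely for illustration; they are not part of the question.

```shell
# Recreate the sample files from the question in a scratch directory.
# The last f1 row (Xist,Gas5) is hypothetical: it adds a key that is absent from f2.
tmpdir=$(mktemp -d)
cat > "$tmpdir/f1" <<'EOF'
f1_col1,f1_col2
kctd,Malat1
Gas5,Snhg6
Xist,Gas5
EOF
cat > "$tmpdir/f2" <<'EOF'
f2_col1,f2_col2,f2_col3
chr7,snRNA,Gas5
chr1,protein_coding,Malat1
chr2,TEC,Snhg6
chr1,TEC,kctd
EOF

result=$(awk '
BEGIN { FS = OFS = "," }
# First file (f2): remember the f2_col1 header, then map f2_col3 -> f2_col1.
NR == FNR {
    if (FNR == 1) { hdr = $1; next }
    map[$3] = $1
    next
}
# Second file (f1): header line gets the f2_col1 header appended twice.
FNR == 1 { print $0, hdr, hdr; next }
# Data lines: look up each key, falling back to "NA" when it is missing.
{
    a = ($1 in map) ? map[$1] : "NA"
    b = ($2 in map) ? map[$2] : "NA"
    print $0, a, b
}
' "$tmpdir/f2" "$tmpdir/f1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Testing with `($1 in map)` rather than reading `map[$1]` directly also avoids creating empty array entries as a side effect of the lookup. If a key appears more than once in f2_col3, this sketch, like the original script, keeps only the last f2_col1 value seen.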