How to merge two files based on matching columns?
I have two files, file1 and file2 (shown below in that order), from which I want to create a third that contains all the information, with each column separated by a tab.
67 rule_ref: _avc ,output_tag: 'hello'
2 rule_ref: _cdf ,output_tag: 'hi'
334 rule_ref: _xyz ,output_tag: 'bye'
1 rule_ref: _abc ,output_tag: 'go'
rule_ref: _avc ,output_tag: 'hello' 1
rule_ref: _cdf ,output_tag: 'hi' 4
rule_ref: _xyz ,output_tag: 'bye' 5
I would like a file3 that looks like this:
67 1 rule_ref: _avc ,output_tag: 'hello'
2 4 rule_ref: _cdf ,output_tag: 'hi'
334 5 rule_ref: _xyz ,output_tag: 'bye'
1 0 rule_ref: _abc ,output_tag: 'go'
The 2nd column of file1 is matched against the 1st column of file2, and file3 contains the 1st column from file1, the 2nd column from file2 and the 3rd column from file1.
I searched on Google but did not find anything that solves this. Please help.
First, I assume you are using a pandas DataFrame; in that case you just need to use merge.
Try this:
# Use on='key' instead if the key column has the same name in both frames.
file1.merge(file2, left_on='lkey', right_on='rkey', how='left')
Doc: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
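To make the merge suggestion concrete, here is a minimal sketch using the sample data from the question. It assumes the columns really are tab-separated and that the files have no header row; the column names count1, count2 and rule are made up for illustration, and the in-memory strings stand in for the real file1 and file2.

```python
import io

import pandas as pd

# Sample data from the question, assuming tab-separated columns.
FILE1 = (
    "67\trule_ref: _avc ,output_tag: 'hello'\n"
    "2\trule_ref: _cdf ,output_tag: 'hi'\n"
    "334\trule_ref: _xyz ,output_tag: 'bye'\n"
    "1\trule_ref: _abc ,output_tag: 'go'\n"
)
FILE2 = (
    "rule_ref: _avc ,output_tag: 'hello'\t1\n"
    "rule_ref: _cdf ,output_tag: 'hi'\t4\n"
    "rule_ref: _xyz ,output_tag: 'bye'\t5\n"
)


def merge_files(file1, file2):
    """Left-join file1 with file2 on the rule text (hypothetical column names)."""
    left = pd.read_csv(file1, sep="\t", header=None, names=["count1", "rule"])
    right = pd.read_csv(file2, sep="\t", header=None, names=["rule", "count2"])
    # how="left" keeps every row of file1, even when file2 has no match.
    merged = left.merge(right, on="rule", how="left")
    # Rows without a match get NaN; the question wants 0 there.
    merged["count2"] = merged["count2"].fillna(0).astype(int)
    return merged[["count1", "count2", "rule"]]


merged = merge_files(io.StringIO(FILE1), io.StringIO(FILE2))
# Pass a path instead of relying on the return value to write file3 to disk.
file3_text = merged.to_csv(sep="\t", header=False, index=False)
print(file3_text, end="")
```

With real files, the io.StringIO objects would be replaced by the paths to file1 and file2. Note that merge accepts either on or the left_on/right_on pair, not both at once.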
Given the way that SO eats tabs, it's really hard to tell what your columns are. Using another character as the separator would make it a lot easier.
Based on your description, though, I think that in the first file the numbers are one column and rule_ref: _avc ,output_tag: 'hello' etc. the second? And similarly for the second file? But you mention the third column of the first file, which doesn't exist under that scheme. Did you mean the second?
If so...
$ join -t $'\t' -1 2 -2 1 -a 1 -e 0 -o '1.1 2.2 1.2' <(sort -t $'\t' -k 2 file1.txt) <(sort -t $'\t' -k 1 file2.txt)
1 0 rule_ref: _abc ,output_tag: 'go'
67 1 rule_ref: _avc ,output_tag: 'hello'
2 4 rule_ref: _cdf ,output_tag: 'hi'
334 5 rule_ref: _xyz ,output_tag: 'bye'
(join requires that the files it joins are sorted on the appropriate field, which your examples aren't, hence the sorting. It also requires a shell like bash that understands $'\t'.)
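For comparison, the same left outer join can be sketched in plain Python with a dictionary lookup, which avoids the sorting step and keeps file1's original row order. This is only an illustration under the same assumption as above (two tab-separated columns per file); the function name left_join and the missing parameter are hypothetical.

```python
def left_join(file1_lines, file2_lines, missing="0"):
    """Join file1 (number, key) with file2 (key, value) on the key column."""
    # Map each key in file2 (column 1) to its value (column 2).
    lookup = {}
    for line in file2_lines:
        key, value = line.rstrip("\n").rsplit("\t", 1)
        lookup[key] = value

    joined = []
    for line in file1_lines:
        number, key = line.rstrip("\n").split("\t", 1)
        # Like join's -a 1 -e 0: keep unmatched file1 rows, filling in "0".
        joined.append("\t".join([number, lookup.get(key, missing), key]))
    return joined


file1_lines = [
    "67\trule_ref: _avc ,output_tag: 'hello'\n",
    "2\trule_ref: _cdf ,output_tag: 'hi'\n",
    "334\trule_ref: _xyz ,output_tag: 'bye'\n",
    "1\trule_ref: _abc ,output_tag: 'go'\n",
]
file2_lines = [
    "rule_ref: _avc ,output_tag: 'hello'\t1\n",
    "rule_ref: _cdf ,output_tag: 'hi'\t4\n",
    "rule_ref: _xyz ,output_tag: 'bye'\t5\n",
]

result = left_join(file1_lines, file2_lines)
print("\n".join(result))
```

With real files, the two lists could be replaced by open file objects, since the function only iterates over lines.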