
How to merge two files based on the matching of columns?

I have two files from which I want to create a third that contains all the information, with the columns separated by tabs.

file 1:

67      rule_ref: _avc ,output_tag: 'hello'
2       rule_ref: _cdf ,output_tag: 'hi'
334     rule_ref: _xyz ,output_tag: 'bye'
1       rule_ref: _abc ,output_tag: 'go'

file 2:

rule_ref: _avc ,output_tag: 'hello'     1
rule_ref: _cdf ,output_tag: 'hi'        4
rule_ref: _xyz ,output_tag: 'bye'    5

I would like a file3 such that:

67    1    rule_ref: _avc ,output_tag: 'hello'
2     4    rule_ref: _cdf ,output_tag: 'hi'
334   5    rule_ref: _xyz ,output_tag: 'bye'
1     0    rule_ref: _abc ,output_tag: 'go'

The 2nd column of file1 matches the 1st column of file2, and file3 should contain the 1st column from file1, the 2nd column from file2 and the 3rd column from file1.

I searched on Google but didn't find anything that solves this. Please help.

First, I assume you are using pandas DataFrames; in that case you just need to use merge.

Try this:

# pass either on=... (same column name in both frames) or left_on/right_on, not both
file1.merge(file2, left_on='lkey', right_on='rkey')

Doc: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
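To make the merge suggestion concrete, here is a minimal sketch applied to the sample data from the question. It assumes the files are tab-separated with no header row; the column names `count1`, `count2` and `rule` are made up for illustration, and the inline strings stand in for the real files. A left merge is used so that the `_abc` row, which has no match in file 2, is kept and filled with 0.

```python
import io

import pandas as pd

# Inline stand-ins for the two tab-separated files from the question.
file1_text = (
    "67\trule_ref: _avc ,output_tag: 'hello'\n"
    "2\trule_ref: _cdf ,output_tag: 'hi'\n"
    "334\trule_ref: _xyz ,output_tag: 'bye'\n"
    "1\trule_ref: _abc ,output_tag: 'go'\n"
)
file2_text = (
    "rule_ref: _avc ,output_tag: 'hello'\t1\n"
    "rule_ref: _cdf ,output_tag: 'hi'\t4\n"
    "rule_ref: _xyz ,output_tag: 'bye'\t5\n"
)

# Column names are hypothetical; the files themselves have no header.
file1 = pd.read_csv(io.StringIO(file1_text), sep="\t", header=None,
                    names=["count1", "rule"])
file2 = pd.read_csv(io.StringIO(file2_text), sep="\t", header=None,
                    names=["rule", "count2"])

# Left merge keeps every row of file1, in file1's order.
merged = file1.merge(file2, on="rule", how="left")

# Rows without a match get NaN; replace with 0 and restore the integer type.
merged["count2"] = merged["count2"].fillna(0).astype(int)

# Reorder the columns as requested and print as tab-separated text.
file3 = merged[["count1", "count2", "rule"]]
print(file3.to_csv(sep="\t", header=False, index=False), end="")
```

With real files, the `io.StringIO(...)` arguments would be replaced by the file paths, and `to_csv` could be given an output path instead of being printed.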

It's really hard to tell what your columns are, given the way that SO eats tabs. Another separator character would make it a lot easier.

Based on your description, though, I think that in the first file the numbers are one column and rule_ref: _avc ,output_tag: 'hello' etc. are the second, and similarly for the second file? But you mention the third column of the first file, which doesn't exist under that scheme. Did you mean the second?

If so...

$ join -t $'\t' -1 2 -2 1 -a 1 -e 0 -o '1.1 2.2 1.2' <(sort -t $'\t' -k 2 file1.txt) <(sort -t $'\t' -k 1 file2.txt)
1   0   rule_ref: _abc ,output_tag: 'go'
67  1   rule_ref: _avc ,output_tag: 'hello'
2   4   rule_ref: _cdf ,output_tag: 'hi'
334 5   rule_ref: _xyz ,output_tag: 'bye'

(join requires that the files it joins are sorted on the join field, which your examples aren't, hence the sorting. It also requires a shell like bash that understands $'\t' and process substitution, <(...).)
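If the output should keep file1's original order instead of being sorted by the key, the same left join with a default of 0 can be sketched in plain Python with a dictionary lookup. This is only an illustration under a few assumptions: every line has exactly two tab-separated fields, each key appears at most once in file 2, and the function name `left_join` and the inline sample lists are hypothetical.

```python
def left_join(file1_lines, file2_lines, default="0"):
    """Return (file1 col 1, file2 col 2, shared key) for every file1 line."""
    # Map each key (file2 column 1) to its value (file2 column 2).
    lookup = {}
    for line in file2_lines:
        key, value = line.rstrip("\n").split("\t")
        lookup[key] = value

    rows = []
    for line in file1_lines:
        number, key = line.rstrip("\n").split("\t")
        # Keys missing from file2 fall back to the default, like join -a 1 -e 0.
        rows.append((number, lookup.get(key, default), key))
    return rows


# Inline stand-ins for the lines of file1.txt and file2.txt.
file1_lines = [
    "67\trule_ref: _avc ,output_tag: 'hello'\n",
    "2\trule_ref: _cdf ,output_tag: 'hi'\n",
    "334\trule_ref: _xyz ,output_tag: 'bye'\n",
    "1\trule_ref: _abc ,output_tag: 'go'\n",
]
file2_lines = [
    "rule_ref: _avc ,output_tag: 'hello'\t1\n",
    "rule_ref: _cdf ,output_tag: 'hi'\t4\n",
    "rule_ref: _xyz ,output_tag: 'bye'\t5\n",
]

for row in left_join(file1_lines, file2_lines):
    print("\t".join(row))
```

Because the lookup is a dictionary, neither input needs to be sorted first; open file objects can be passed in place of the lists.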

