Merging two csv files and extracting useful information using Python

I have two .csv files that look like the following:

file_1:

id  a b c
10  1 2 3
11  2 3 4

file_2:

id   d e 
10   2 3
11   2 3
12   2 3

My expected output is:

id  a b c d e
10  1 2 3 2 3
11  2 3 4 2 3

I wish to merge these two files by comparing the id number. If an id appears in both files, the corresponding rows should be merged into one row and kept. If an id has no match, its row should be ignored. My code looks like this:

import pandas as pd
s1=pd.read_csv("file_1.csv")
s2=pd.read_csv("file_2.csv")
if s1['id']==s2['id']:
    merged=s1.merge(s2, on="id", how="outer")
else:
    pass
merged
merged.to_csv("output.csv")

After running this code, I cannot get my expected output. Can anyone help me? Thanks.

Try using pd.DataFrame.merge:

print(file_1.merge(file_2, on='id'))

Output:

   a  b  c  id  d  e
0  1  2  3  10  2  3
1  2  3  4  11  2  3

If you care about the order of the columns, do:

print(file_1.merge(file_2, on='id')[['id', 'a', 'b', 'c', 'd', 'e']])

Output:

   id  a  b  c  d  e
0  10  1  2  3  2  3
1  11  2  3  4  2  3
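
For anyone who wants to reproduce this without the files on disk, here is a minimal sketch. It assumes the two files are comma-separated with a header row; the strings `FILE_1_TEXT` and `FILE_2_TEXT` are made-up stand-ins for `file_1.csv` and `file_2.csv`. Passing `index=False` to `to_csv` keeps the 0, 1 row labels out of the written output.

```python
import io

import pandas as pd

# Hypothetical stand-ins for file_1.csv and file_2.csv (assumed comma-separated).
FILE_1_TEXT = "id,a,b,c\n10,1,2,3\n11,2,3,4\n"
FILE_2_TEXT = "id,d,e\n10,2,3\n11,2,3\n12,2,3\n"

file_1 = pd.read_csv(io.StringIO(FILE_1_TEXT))
file_2 = pd.read_csv(io.StringIO(FILE_2_TEXT))

# The default how="inner" keeps only the ids present in both frames.
merged = file_1.merge(file_2, on="id")

# With no path given, to_csv returns the CSV text; index=False omits row labels.
csv_text = merged.to_csv(index=False)
print(csv_text)
```

To write a real file instead, pass a path such as `"output.csv"` as the first argument to `to_csv`.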

You are not merging the dataframes properly. Try this:

import pandas as pd
s1=pd.read_csv("file_1.csv")
s2=pd.read_csv("file_2.csv")

merged=s1.merge(s2, on="id")
# Set the index back to id
merged = merged.set_index("id")
merged.to_csv("output.csv")
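
To spell out why the original attempt fails: `s1['id']==s2['id']` compares two Series of different lengths, which pandas refuses to do, and an `if` on a whole Series would be ambiguous anyway. The check is also unnecessary, because `merge` does the id matching itself. The other issue is `how="outer"`, which keeps unmatched ids and fills the gaps with NaN. A small sketch, with the sample data built inline rather than read from the files:

```python
import pandas as pd

s1 = pd.DataFrame({"id": [10, 11], "a": [1, 2], "b": [2, 3], "c": [3, 4]})
s2 = pd.DataFrame({"id": [10, 11, 12], "d": [2, 2, 2], "e": [3, 3, 3]})

# Comparing Series whose indexes differ raises instead of returning booleans.
try:
    s1["id"] == s2["id"]
    comparison_failed = False
except ValueError:
    comparison_failed = True

# how="outer" keeps id 12 and fills a, b, c with NaN for that row.
outer = s1.merge(s2, on="id", how="outer")

# how="inner" (the default) drops id 12 because it is missing from s1.
inner = s1.merge(s2, on="id", how="inner")

print(comparison_failed)
print(outer)
print(inner)
```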

DataFrame.join or DataFrame.merge should work here:

import pandas as pd
s1=pd.DataFrame({'id':[10, 11], 'a':[1,2], 'b':[2,3], 'c':[3,4]})
s2=pd.DataFrame({'id':[10, 11, 12], 'd':[2,2, 2], 'e':[3,3, 3]})
merged = s1.merge(s2, on='id', how='inner')
# join works as well
# merged = s1.join(s2.set_index('id'), on='id', how='inner')
merged

The output:

    id  a   b   c   d   e
0   10  1   2   3   2   3
1   11  2   3   4   2   3
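
If you also want to see which ids were left out, `merge` accepts `indicator=True`, which adds a `_merge` column marking each row as `both`, `left_only`, or `right_only`. This goes a little beyond what was asked, so treat it as an optional sketch; the names `check` and `unmatched` are just illustrative.

```python
import pandas as pd

s1 = pd.DataFrame({"id": [10, 11], "a": [1, 2], "b": [2, 3], "c": [3, 4]})
s2 = pd.DataFrame({"id": [10, 11, 12], "d": [2, 2, 2], "e": [3, 3, 3]})

# Outer merge with an indicator column showing where each id came from.
check = s1.merge(s2, on="id", how="outer", indicator=True)

# Ids that appear in only one of the two frames.
unmatched = check.loc[check["_merge"] != "both", "id"].tolist()
print(unmatched)
```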

Since you didn't mention whether your id is an index name or a column name, I set it as the index of the result.

import pandas as pd
s1=pd.DataFrame({'id':[10, 11], 'a':[1,2], 'b':[2,3], 'c':[3,4]})
s2=pd.DataFrame({'id':[10, 11, 12], 'd':[2,2, 2], 'e':[3,3, 3]})
merg = pd.merge(left=s1,right=s2,on='id').set_index('id')
print(merg)

Here is your output:

    a  b  c  d  e
id               
10  1  2  3  2  3
11  2  3  4  2  3
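
If id is already the index of both frames rather than a regular column, a sketch along these lines should give the same result. It assumes both frames are indexed by id, which the question does not confirm. `join` aligns on the index, so no `on` argument is needed.

```python
import pandas as pd

# Assumption: id is the index of both frames rather than a regular column.
s1 = pd.DataFrame({"id": [10, 11], "a": [1, 2], "b": [2, 3], "c": [3, 4]}).set_index("id")
s2 = pd.DataFrame({"id": [10, 11, 12], "d": [2, 2, 2], "e": [3, 3, 3]}).set_index("id")

# join matches rows by index; how="inner" keeps only ids found in both frames.
merg = s1.join(s2, how="inner")
print(merg)
```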
