Python Pandas - KeyError on pd.merge even when same columns exist across dataframes
pd.merge : trying to merge Dataframes with same columns names
I know this is a simple question, but I have been stuck on it for a while. I have two DataFrames with thousands of rows each, but here is a sample:
df1 =
Name Value Date
x 0.04 2014-01-02
x 0.03 2014-01-03
x 0.02 2014-01-05
x 0.02 2014-01-07
(...) (...) (...)
y 0.002 2014-01-01
y 0.001 2014-01-02
y 0.003 2014-01-03
y 0.004 2014-01-07
(...) (...) (...)
z 0.003 2014-01-02
z 0.003 2014-01-05
z 0.004 2014-01-07
(...) (...) (...)
And another DataFrame:
df2 =
Name Value Date
x 0.04 2015-01-02
x 0.03 2015-01-03
x 0.02 2015-01-05
x 0.02 2015-01-07
(...) (...) (...)
y 0.002 2015-01-01
y 0.001 2015-01-02
y 0.003 2015-01-03
y 0.004 2015-01-07
(...) (...) (...)
z 0.003 2015-01-02
z 0.003 2015-01-05
z 0.004 2015-01-07
(...) (...) (...)
What I want is:
df3=
Name Value Date
x 0.04 2014-01-02
x 0.03 2014-01-03
x 0.02 2014-01-05
x 0.02 2014-01-07
x 0.04 2015-01-02
x 0.03 2015-01-03
x 0.02 2015-01-05
x 0.02 2015-01-07
(...) (...) (...)
y 0.002 2014-01-01
y 0.001 2014-01-02
y 0.003 2014-01-03
y 0.004 2014-01-07
y 0.002 2015-01-01
y 0.001 2015-01-02
y 0.003 2015-01-03
y 0.004 2015-01-07
(...) (...) (...)
z 0.003 2014-01-02
z 0.003 2014-01-05
z 0.004 2014-01-07
z 0.003 2015-01-02
z 0.003 2015-01-05
z 0.004 2015-01-07
(...) (...) (...)
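1) When I merge, if a "Name" does not exist in the 2014 data, I do not want it in my df3, and the same goes for my 2015 data.
In other words, I only want the "Name" values that have data in both of my DataFrames.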
What I tried:
a= df1.merge(df2,how="inner")
and
frames= [df1,df2]
df3= pd.concat([frames],axis=1)
But the output I get is:
df3 =
Value_x Date_x Name Value_y Date_y
0.03 2014-01-02 x 0.04 2015-01-02
0.02 2014-01-05 x 0.03 2015-01-03
0.03 2014-01-06 x 0.02 2015-01-05
0.03 2014-01-07 x 0.02 2015-01-07
(...) (...) (...) (...) (...)
0.02 2014-01-03 y 0.002 2015-01-01
0.01 2014-01-07 y 0.001 2015-01-02
0.02 2014-01-06 y 0.003 2015-01-03
0.02 2014-01-07 y 0.004 2015-01-07
(...) (...) (...) (...) (...)
0.03 2014-01-02 z 0.003 2015-01-02
0.01 2014-01-04 z 0.003 2015-01-05
0.03 2014-01-05 z 0.004 2015-01-07
(...) (...) (...) (...) (...)
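Using DataFrame.append (a DataFrame method; there is no pd.append function), you can do:
# ...
df = df1.append(df2, ignore_index=True)
# or with a list of DataFrames
df = df1.append([df2, df3], ignore_index=True)
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, pd.concat([df1, df2], ignore_index=True) gives the same result.
For more information, see the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html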
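Stacking the rows alone does not drop names that appear in only one year, which the question also asks for. A minimal sketch of one way to add that filter with pd.concat follows; the small frames, the extra names `w` and `v`, and the variable `common` are illustrative assumptions, not data from the question.

```python
import pandas as pd

# Hypothetical miniature versions of df1 (2014) and df2 (2015).
# 'w' exists only in df1 and 'v' only in df2, so both should be dropped.
df1 = pd.DataFrame({
    "Name": ["x", "x", "y", "w"],
    "Value": [0.04, 0.03, 0.002, 0.5],
    "Date": ["2014-01-02", "2014-01-03", "2014-01-01", "2014-01-02"],
})
df2 = pd.DataFrame({
    "Name": ["x", "y", "y", "v"],
    "Value": [0.04, 0.002, 0.001, 0.7],
    "Date": ["2015-01-02", "2015-01-01", "2015-01-02", "2015-01-02"],
})

# Names that occur in both frames
common = set(df1["Name"]) & set(df2["Name"])

# Stack the rows, keep only the common names, then order by Name and Date
df3 = (
    pd.concat([df1, df2], ignore_index=True)
      .loc[lambda d: d["Name"].isin(common)]
      .sort_values(["Name", "Date"])
      .reset_index(drop=True)
)
print(df3)
```

Because the dates are ISO-formatted strings, sorting them as text also sorts them chronologically, so each name's 2014 rows come before its 2015 rows.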
Can you try this?
df3 = pd.merge(df1, df2, left_on='Value', right_on='Value')
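It may help to compare what a merge and a concatenation return. A merge joins the two frames side by side and suffixes the shared non-key columns, while a concatenation stacks rows and keeps the original columns. A small sketch, assuming hypothetical two-row frames (`wide` and `tall` are illustrative names):

```python
import pandas as pd

# Hypothetical two-row frames with the same columns as in the question
df1 = pd.DataFrame({"Name": ["x", "x"], "Value": [0.04, 0.03],
                    "Date": ["2014-01-02", "2014-01-03"]})
df2 = pd.DataFrame({"Name": ["x", "x"], "Value": [0.04, 0.03],
                    "Date": ["2015-01-02", "2015-01-03"]})

# Joining on 'Value' keeps a single 'Value' column and adds _x/_y suffixes
# to the other shared columns (Name and Date)
wide = pd.merge(df1, df2, left_on="Value", right_on="Value")
print(list(wide.columns))

# Stacking the rows keeps the original three columns
tall = pd.concat([df1, df2], ignore_index=True)
print(list(tall.columns))
```

The layout asked for in the question (one Name, Value, Date column set with 2014 and 2015 rows together) corresponds to the stacked form.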
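If I understand correctly, you want to match the days between 2014 and 2015: if a day is missing for a name in either year, it should not appear in the resulting frame.
(Note that in this example I added the date 2014-01-08 for name z to df1. It will not be in the final DataFrame, because df2 has no 2015-01-08 for this name):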
import pandas as pd
name_1 = ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'z', 'z', 'z', 'z']
value_1 = [0.04, 0.03, 0.02, 0.02, 0.002, 0.001, 0.003, 0.004, 0.003, 0.003, 0.004, 0.009]
date_1 = ['2014-01-02', '2014-01-03', '2014-01-05', '2014-01-07', '2014-01-01', '2014-01-02', '2014-01-03', '2014-01-07', '2014-01-02', '2014-01-05', '2014-01-07', '2014-01-08']
name_2 = ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'z', 'z', 'z']
value_2 = [0.04, 0.03, 0.02, 0.02, 0.002, 0.001, 0.003, 0.004, 0.003, 0.003, 0.004]
date_2 = ['2015-01-02', '2015-01-03', '2015-01-05', '2015-01-07', '2015-01-01', '2015-01-02', '2015-01-03', '2015-01-07', '2015-01-02', '2015-01-05', '2015-01-07']
df1 = pd.DataFrame({'Name': name_1, 'Value': value_1, 'Date': date_1})
df2 = pd.DataFrame({'Name': name_2, 'Value': value_2, 'Date': date_2})
# Month-day part of each date (e.g. '01-02'), used to match days across years
df1['days'] = df1['Date'].str.extract(r'\d{4}-(\d+-\d+)', expand=False)
df2['days'] = df2['Date'].str.extract(r'\d{4}-(\d+-\d+)', expand=False)
# The inner merge keeps only the (Name, day) pairs present in both years
matched = pd.merge(df1, df2, on=['Name', 'days'])
# Flag the rows of each frame whose (Name, Date) pair survived the merge
keep_1 = df1.set_index(['Name', 'Date']).index.isin(matched.set_index(['Name', 'Date_x']).index)
keep_2 = df2.set_index(['Name', 'Date']).index.isin(matched.set_index(['Name', 'Date_y']).index)
# pd.concat is used here because DataFrame.append was removed in pandas 2.0
df = pd.concat([df1[keep_1], df2[keep_2]]).sort_values(['Name', 'Date']).reset_index(drop=True)
del df['days']
print(df)
Prints:
Name Value Date
0 x 0.040 2014-01-02
1 x 0.030 2014-01-03
2 x 0.020 2014-01-05
3 x 0.020 2014-01-07
4 x 0.040 2015-01-02
5 x 0.030 2015-01-03
6 x 0.020 2015-01-05
7 x 0.020 2015-01-07
8 y 0.002 2014-01-01
9 y 0.001 2014-01-02
10 y 0.003 2014-01-03
11 y 0.004 2014-01-07
12 y 0.002 2015-01-01
13 y 0.001 2015-01-02
14 y 0.003 2015-01-03
15 y 0.004 2015-01-07
16 z 0.003 2014-01-02
17 z 0.003 2014-01-05
18 z 0.004 2014-01-07
19 z 0.003 2015-01-02
20 z 0.003 2015-01-05
21 z 0.004 2015-01-07
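The same day matching can probably be written without the intermediate merge, by comparing (Name, month-day) keys directly. The following is only a sketch under that assumption: the helper `day_keys` and the reduced three-row and two-row frames are hypothetical, and the dates are parsed with pd.to_datetime instead of a regular expression.

```python
import pandas as pd

# Hypothetical reduced frames: only name 'z', with the extra 2014-01-08 row
df1 = pd.DataFrame({
    "Name": ["z", "z", "z"],
    "Value": [0.003, 0.004, 0.009],
    "Date": ["2014-01-05", "2014-01-07", "2014-01-08"],
})
df2 = pd.DataFrame({
    "Name": ["z", "z"],
    "Value": [0.003, 0.004],
    "Date": ["2015-01-05", "2015-01-07"],
})

def day_keys(df):
    """Return a MultiIndex of (Name, 'MM-DD') pairs, one per row of df."""
    month_day = pd.to_datetime(df["Date"]).dt.strftime("%m-%d")
    return pd.MultiIndex.from_arrays([df["Name"], month_day])

keys_1, keys_2 = day_keys(df1), day_keys(df2)

# Keep a row only if the other year has the same (Name, month-day) pair
result = (
    pd.concat([df1[keys_1.isin(keys_2)], df2[keys_2.isin(keys_1)]])
      .sort_values(["Name", "Date"])
      .reset_index(drop=True)
)
print(result)
```

This avoids adding and then deleting a temporary `days` column, at the cost of building two MultiIndex objects.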