pandas: problems with merging dataframes
I am trying to merge the following two dataframes on SICcode:
df.head(5)
   SICcode  Catcode                                Category      SICname  MultSIC
0      111    A1500    Wheat, corn, soybeans and cash grain        Wheat        X
1      112    A1600  Other commodities (incl rice, peanuts)         Rice        X
2      115    A1500    Wheat, corn, soybeans and cash grain         Corn        X
3      116    A1500    Wheat, corn, soybeans and cash grain     Soybeans        X
4      119    A1500    Wheat, corn, soybeans and cash grain  Cash grains        X
df.columns.tolist()
['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']
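The first name in that list begins with U+FEFF, the Unicode byte order mark (BOM), which is invisible in the head() output above. A small plain-Python check (the variable name is illustrative, not from the question) shows that it is a different string from SICcode:

```python
# The first column name exactly as df.columns.tolist() reports it
bom_name = "\ufeffSICcode"

print(bom_name == "SICcode")          # False: one hidden character differs
print(len(bom_name), len("SICcode"))  # 8 7
print(ascii(bom_name))                # '\ufeffSICcode' (escaping makes the BOM visible)
print("\ufeff".encode("utf-8"))       # b'\xef\xbb\xbf', the BOM as UTF-8 bytes
```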
merged.head()
   2012 NAICS Code  2002to2007 NAICS  SICcode
0           111110            111110      116
1           111120            111120      119
2           111130            111130      119
3           111140            111140      111
4           111150            111150      115
merged.columns.tolist()
['2012 NAICS Code', '2002to2007 NAICS', 'SICcode']
When I try to merge them with the following code:
merged=pd.merge(merged,df, how='left', on='SICcode')
I get a KeyError: 'SICcode'
I tried setting the dtype on both dfs, but when I do that I also get a KeyError.
If anyone has any ideas about this or needs more information, please let me know.
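Because the right-hand frame has no column literally named SICcode, pd.merge cannot find the join key there and raises KeyError. The sketch below reproduces this with two tiny hypothetical frames built from a few rows of the head() output, and checks each frame for the key before merging:

```python
import pandas as pd

# Hypothetical stand-ins built from a few rows of the head() output;
# df's first column name carries the invisible BOM, as in the question.
merged = pd.DataFrame({
    "2012 NAICS Code": [111110, 111120, 111140],
    "SICcode": [116, 119, 111],
})
df = pd.DataFrame({
    "\ufeffSICcode": [111, 116, 119],
    "Catcode": ["A1500", "A1500", "A1500"],
})

# Check that the join key exists in both frames before merging
for name, frame in [("merged", merged), ("df", df)]:
    print(name, "has SICcode:", "SICcode" in frame.columns)
# merged has SICcode: True
# df has SICcode: False

caught = None
try:
    pd.merge(merged, df, how="left", on="SICcode")
except KeyError as exc:
    caught = exc  # the missing right-hand key surfaces as a KeyError
print(repr(caught))
```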
Note the first column name:
In [27]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0)
In [28]: df.columns.tolist()
Out[28]: ['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']
In [29]: df['SICcode']
...
KeyError: 'SICcode'
In [30]: df['\ufeffSICcode'].head()
Out[30]:
0 111
1 112
2 115
3 116
4 119
Name: SICcode, dtype: int64
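Once the hidden U+FEFF character is identified, one option (an addition, not part of the original answer) is to strip it from the column names of the frame already in memory instead of re-reading the file. A minimal sketch, using a small hypothetical stand-in for the frame read from df.test and assuming all column names are strings:

```python
import pandas as pd

# Hypothetical stand-in for the frame read from df.test
df = pd.DataFrame({
    "\ufeffSICcode": [111, 112, 115],
    "Catcode": ["A1500", "A1600", "A1500"],
})

# Drop any leading U+FEFF from each column name (assumes string column names)
df = df.rename(columns=lambda c: c.lstrip("\ufeff"))

print(df.columns.tolist())     # ['SICcode', 'Catcode']
print(df["SICcode"].tolist())  # [111, 112, 115]
```

Note that str.strip() alone would not help here: Python does not treat U+FEFF as whitespace, so the character has to be named explicitly.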
As @unutbu said in his comment, adding encoding='utf-8_sig' to the pd.read_csv() call may help you solve this problem:
In [31]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0, encoding='utf-8_sig')
In [32]: df.columns.tolist()
Out[32]: ['SICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']
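A self-contained sketch of the fix, assuming an in-memory byte string in place of the GitHub file. The stand-in has no index column, so index_col is omitted. The sketch first shows how the two codecs decode the BOM, then reads the data with encoding='utf-8-sig' (the usual spelling of the codec written as utf-8_sig above) and runs the left merge from the question against a small hypothetical stand-in for merged:

```python
import io

import pandas as pd

# Hypothetical stand-in for df.test: UTF-8 bytes that begin with a BOM
raw = (
    b"\xef\xbb\xbfSICcode,Catcode\n"
    b"111,A1500\n"
    b"116,A1500\n"
    b"119,A1500\n"
)

# Plain UTF-8 keeps the BOM as a character; utf-8-sig strips it while decoding
print(repr(raw.decode("utf-8").splitlines()[0]))      # '\ufeffSICcode,Catcode'
print(repr(raw.decode("utf-8-sig").splitlines()[0]))  # 'SICcode,Catcode'

df = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")
print(df.columns.tolist())  # ['SICcode', 'Catcode']

# Small stand-in for the question's "merged" frame
merged = pd.DataFrame({
    "2012 NAICS Code": [111110, 111120, 111140],
    "SICcode": [116, 119, 111],
})

# The join key now exists in both frames, so the left merge succeeds
result = pd.merge(merged, df, how="left", on="SICcode")
print(result)
```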