pandas: problems with merging dataframes

I'm trying to merge the following two dataframes on the SICcode column:

df.head(5)

    SICcode     Catcode     Category                            SICname     MultSIC
0   111         A1500   Wheat, corn, soybeans and cash grain    Wheat        X
1   112         A1600   Other commodities (incl rice, peanuts)  Rice         X
2   115         A1500   Wheat, corn, soybeans and cash grain    Corn         X
3   116         A1500   Wheat, corn, soybeans and cash grain    Soybeans     X
4   119         A1500   Wheat, corn, soybeans and cash grain    Cash grains  X

df.columns.tolist()

['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']  

merged.head()


   2012 NAICS Code  2002to2007 NAICS  SICcode
0           111110            111110      116
1           111120            111120      119
2           111130            111130      119
3           111140            111140      111
4           111150            111150      115

merged.columns.tolist()
['2012 NAICS Code', '2002to2007 NAICS', 'SICcode']

When I try to merge them with the following code:

merged = pd.merge(merged, df, how='left', on='SICcode')

I get a KeyError: 'SICcode'. I tried to set the dtype of one of the dataframes, but when I do, I receive a key error as well.

If anyone has an idea about this or needs more information, please let me know.

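Pay attention to the name of the first column. It starts with '\ufeff', a byte order mark (BOM) left over from the file's encoding, so the column is not actually called 'SICcode':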

In [27]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0)

In [28]: df.columns.tolist()
Out[28]: ['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']

In [29]: df['SICcode']

...

KeyError: 'SICcode'

In [30]: df['\ufeffSICcode'].head()
Out[30]:
0    111
1    112
2    115
3    116
4    119
Name: SICcode, dtype: int64
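
The mismatch can also be seen without pandas. The following is a minimal sketch using only the standard library; the bytes in `raw` are a hypothetical stand-in for the start of the CSV file, not its real contents. It shows that the BOM survives plain UTF-8 decoding as an invisible character at the front of the first header name, which is why the name prints like SICcode but does not compare equal to it.

```python
import csv
import io

# Hypothetical stand-in for the start of the file:
# a UTF-8 BOM (EF BB BF) followed by the header row and one data row.
raw = b"\xef\xbb\xbfSICcode,Catcode\r\n111,A1500\r\n"

# Decoding as plain UTF-8 keeps the BOM as the character U+FEFF.
text = raw.decode("utf-8")
header = next(csv.reader(io.StringIO(text)))
first = header[0]

print(repr(first))                 # repr() escapes the hidden character
print(first == "SICcode")          # the two names are not equal
print(len(first), len("SICcode"))  # the BOM adds one character
```

Printing `first` directly would hide the extra character on most terminals, which matches the `Name: SICcode` line in the output above.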

As @unutbu said in his comment, adding encoding='utf-8_sig' to the pd.read_csv() call should fix this problem:

In [31]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0, encoding='utf-8_sig')

In [32]: df.columns.tolist()
Out[32]: ['SICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']
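
If re-reading the file is not convenient, a possible workaround (not part of the original answer) is to strip the BOM from the column names of the dataframe that is already loaded and then merge. The sketch below uses small hypothetical stand-ins for the two dataframes in the question; only the column names are taken from the post.

```python
import pandas as pd

# Small hypothetical stand-ins for the two dataframes in the question.
df = pd.DataFrame({
    "\ufeffSICcode": [111, 112, 115, 116, 119],  # name polluted by the BOM
    "Catcode": ["A1500", "A1600", "A1500", "A1500", "A1500"],
})
merged = pd.DataFrame({
    "2012 NAICS Code": [111110, 111120, 111130],
    "SICcode": [116, 119, 119],
})

# Strip a leading BOM (U+FEFF) from every column name.
df.columns = df.columns.str.lstrip("\ufeff")

# With the key column named identically on both sides, the merge works.
result = pd.merge(merged, df, how="left", on="SICcode")
print(result.columns.tolist())
```

Fixing the encoding at read time is usually the cleaner option, since it addresses the cause rather than the symptom; the rename is only useful when the dataframe is already in memory.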
