
Python 2 vs Python 3 pd.merge command

The following command works fine when I run it in Python 2:

df5b = pd.merge(df5a, df5bb, how='outer')

However, when I run the same command with the same DataFrames in Python 3, I get the following error:

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

My DataFrames are very large, so I hope someone can help without my posting examples of them. The command works in Python 2, so I assume the problem is not the DataFrames but perhaps a change to this command in Python 3?

The problem is that some columns with the same name are integers in one DataFrame and strings in the other.

The simplest solution is to cast all columns to strings:

df5b = pd.merge(df5a.astype(str), df5bb.astype(str), how='outer')
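Casting whole frames to str also turns numeric non-key columns into strings. A minimal sketch of casting only the merge keys, assuming the keys are the columns the two frames share; the frames `left` and `right` are made up for illustration and are not the asker's data:

```python
import pandas as pd

# Hypothetical frames for illustration only.
left = pd.DataFrame({'B': [4, 5, 4, 5], 'C': [7, 8, 9, 4], 'F': list('aaab')})
right = pd.DataFrame({'B': ['4', '5', '5'], 'F': list('aab')})

# pd.merge joins on the columns common to both frames when `on` is omitted,
# so collect those columns explicitly here.
keys = [c for c in left.columns if c in right.columns]

# Cast only the key columns to str; every other column keeps its dtype.
merged = pd.merge(
    left.astype({k: str for k in keys}),
    right.astype({k: str for k in keys}),
    how='outer',
    on=keys,
)
```

With this approach column C stays numeric, while B becomes a string column in the result.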

Another option is to check the dtypes:

print(df5a.dtypes)
print(df5bb.dtypes)

Then convert the columns so that the dtypes match, e.g. convert the string columns named in a list to integers:

cols = ['col1','col12','col3']
df5a[cols] = df5a[cols].astype(int)
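With very large frames, comparing the two dtype listings by eye can be tedious. A rough sketch of finding the shared columns whose dtypes differ and aligning them; the helper name `mismatched_dtypes` and the small frames are hypothetical, not part of pandas or of the question:

```python
import pandas as pd

def mismatched_dtypes(left, right):
    """Return {column: (left dtype, right dtype)} for shared columns whose dtypes differ."""
    shared = left.columns.intersection(right.columns)
    return {
        col: (left[col].dtype, right[col].dtype)
        for col in shared
        if left[col].dtype != right[col].dtype
    }

# Hypothetical frames for illustration only.
left = pd.DataFrame({'B': [4, 5], 'F': ['a', 'b']})
right = pd.DataFrame({'B': ['4', '5'], 'F': ['a', 'b']})

bad = mismatched_dtypes(left, right)
for col in bad:
    # Convert the right-hand column to the left-hand dtype;
    # this raises if a value cannot be converted.
    right[col] = right[col].astype(left[col].dtype)
```

Here the left-hand dtype is treated as the target, which is only an assumption; pick whichever side holds the type you actually want.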

Sample:

import pandas as pd

# Left frame: B holds integers
df5a = pd.DataFrame({
         'B':[4,5,4,5],
         'C':[7,8,9,4],
         'F':list('aaab')
})

df5bb = pd.DataFrame({
         'B':['4','5','5'],
         'F':list('aab')
})

df5b = pd.merge(df5a.astype(str), df5bb.astype(str), how='outer')
print(df5b)

   B  C  F
0  4  7  a
1  4  9  a
2  5  8  a
3  5  4  b

print(df5a.dtypes)
B     int64
C     int64
F    object
dtype: object

print(df5bb.dtypes)
B    object
F    object
dtype: object

cols = ['B']
df5bb[cols] = df5bb[cols].astype(int)

df5b = pd.merge(df5a, df5bb, how='outer')
print(df5b)

   B  C  F
0  4  7  a
1  4  9  a
2  5  8  a
3  5  4  b
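If a string key column in a large frame might contain values that are not numbers, astype(int) raises an error on the first such value. A small sketch, assuming you want to locate those values before merging; the Series `s` is made up for illustration:

```python
import pandas as pd

# Hypothetical column with one value that is not a number.
s = pd.Series(['4', '5', 'n/a'])

# errors='coerce' turns unparseable values into NaN instead of raising.
as_num = pd.to_numeric(s, errors='coerce')

# Original values that failed to convert, for inspection before merging.
failed = s[as_num.isna()]
```

Note that the NaN values make the converted column float rather than int, so the offending rows need to be fixed or dropped before casting to int.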

As I stated in the comments, the merge does not coerce mixed key types (which may be int, str, or float), so you can either consider concat or convert the columns to str and then merge, as jezrael mentioned.

To see which dtypes result when the frames are combined, you can check with concat:

>>> pd.concat([df5a, df5bb]).dtypes
B     object
C    float64
F     object
dtype: object

>>> pd.concat([df5a, df5bb])
   B    C  F
0  4  7.0  a
1  5  8.0  a
2  4  9.0  a
3  5  4.0  b
0  4  NaN  a
1  5  NaN  a
2  5  NaN  b


 