Not able to compare two dataframes

I have two DataFrames, df1 and df2, with the same structure. I want to find the common rows between them using df1.merge(df2), but there is one row I am having trouble with:

>>> df2
  reference_period analyzed_domain account is_misc total_estimated_visits  total_estimated_monthly_unique_visitors  total_estimated_visit_duration  total_estimated_pageviews  estimated_deduplicated_audience
0       2017-11-01          abc     xyz       0                   1000                                    278.0                          5788.0                    80159.0                              0.0
>>> df1=df1.head(1)
>>> df1
  reference_period analyzed_domain account is_misc total_estimated_visits  total_estimated_monthly_unique_visitors  total_estimated_visit_duration  total_estimated_pageviews  estimated_deduplicated_audience
0       2017-11-01          abc     xyz       0                   1000                                    278.0                          5788.0                    80159.0                              0.0
>>> df1==df2
   reference_period  analyzed_domain  account  is_misc  total_estimated_visits  total_estimated_monthly_unique_visitors  total_estimated_visit_duration  total_estimated_pageviews  estimated_deduplicated_audience
0              True             True     True    False                    True                                     True                            True                       True                             True
>>> df1.dtypes
reference_period                           datetime64[ns]
analyzed_domain                                    object
account                                            object
is_misc                                            object
total_estimated_visits                             object
total_estimated_monthly_unique_visitors           float64
total_estimated_visit_duration                    float64
total_estimated_pageviews                         float64
estimated_deduplicated_audience                   float64
dtype: object
>>> df2.dtypes
reference_period                           datetime64[ns]
analyzed_domain                                    object
account                                            object
is_misc                                            object
total_estimated_visits                             object
total_estimated_monthly_unique_visitors           float64
total_estimated_visit_duration                    float64
total_estimated_pageviews                         float64
estimated_deduplicated_audience                   float64
dtype: object

I am not sure why Python is not able to match the is_misc column. Could someone please help? Thanks.

A pandas column with dtype object holds either strings or a mix of types, so it can contain text or a mixture of numeric and non-numeric values. In either df1 or df2, the 0 in the is_misc column is most likely a string, while in the other frame it is a number. The two print identically, which is why the rows look the same. Convert both columns to the same type (string or int) and run the comparison again; it should then evaluate to True. Try this:

df1['is_misc'] = df1['is_misc'].astype(str).astype(int)
df2['is_misc'] = df2['is_misc'].astype(str).astype(int)

And then compare again:

print(df1 == df2)
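To check whether this really is the cause before converting anything, it can help to look at the Python type of each value rather than the column dtype. The following is a minimal sketch with made-up data (a two-column frame standing in for the real one; the values "xyz" and 0 are taken from the question, everything else is an assumption). It reproduces the False result, inspects the element types, applies the conversion above, and then runs the merge the question was after.

```python
import pandas as pd

# Hypothetical stand-ins for the real frames: the same row, but is_misc is
# the string "0" in df1 and the integer 0 in df2. Both columns are object dtype.
df1 = pd.DataFrame({"account": ["xyz"], "is_misc": ["0"]}, dtype=object)
df2 = pd.DataFrame({"account": ["xyz"], "is_misc": [0]}, dtype=object)

# Element-wise comparison: "0" == 0 is False even though both print as 0.
before = df1 == df2

# Inspect the Python type of each element instead of the column dtype.
types1 = df1["is_misc"].map(type)
types2 = df2["is_misc"].map(type)
print(types1.iloc[0], types2.iloc[0])

# Normalize both columns to the same type, as in the answer above.
df1["is_misc"] = df1["is_misc"].astype(str).astype(int)
df2["is_misc"] = df2["is_misc"].astype(str).astype(int)

after = df1 == df2

# With matching types, merge on all shared columns finds the common row.
common = df1.merge(df2)
print(after)
print(common)
```

If the two `map(type)` results differ, the column holds different Python types in the two frames, and that alone is enough to make both `==` and `merge` treat the rows as different.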

Gustav Rasmussen's answer will work.

I had the same problem, but with a string containing decimals (e.g. '5.0') in the first DataFrame and an integer (e.g. 5) in the second.

I solved it the following way:

df1['column'] = df1['column'].astype(float).astype(int)

df2['column'] = df2['column'].astype(float).astype(int)

and then compare:

df1==df2
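The detour through float matters here because a string such as '5.0' cannot be converted to int directly, so `.astype(str).astype(int)` would raise an error on it. As an alternative sketch (not part of the answers above), `pd.to_numeric` can do the string-to-number step; the frame names `left` and `right` and the sample values are hypothetical. Note that the final `.astype(int)` truncates any fractional part, so this only makes sense when the values are whole numbers.

```python
import pandas as pd

# Hypothetical data: decimal strings on one side, plain integers on the other.
left = pd.DataFrame({"column": ["5.0", "7.0"]})
right = pd.DataFrame({"column": [5, 7]})

# to_numeric parses the strings to floats; astype(int) then gives integers.
left["column"] = pd.to_numeric(left["column"]).astype(int)
right["column"] = pd.to_numeric(right["column"]).astype(int)

result = left == right
print(result)
```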
