
Python Pandas: merge returns NaN

I have two dataframes, df1 and df2. df1 is:

df1

       date     time
0   2015-04-01  00:00:00
1   2015-04-01  00:30:00
2   2015-04-01  01:00:00
3   2015-04-01  01:30:00
4   2015-04-01  02:00:00

The dtypes of df1 are:

date             object
time    timedelta64[ns]
dtype: object

df2 is:

     INCIDENT_TIME  INTERRUPTION_TIME      MINUTES
0   2015-01-08         03:00:00             1056.0
1   2015-01-10         23:30:00             3234.0
2   2015-04-01         01:00:00             3712.0
3   2015-04-01         01:30:00             3045.0
4   2015-04-01         02:00:00             525.0

The dtypes of df2 are:
INCIDENT_TIME                 object
INTERRUPTION_TIME    timedelta64[ns]
MINUTES               float64
dtype: object

I want to do a left merge, so my code is:

final_df= pd.merge(df1,df2,left_on=['date','time'],right_on=['INCIDENT_TIME','INTERRUPTION_TIME'],how='left')

However, it does not produce the desired output. The output was:

       date      time     INCIDENT_TIME   INTERRUPTION_TIME  CONSUM_MINUTES
0   2015-04-01  00:00:00    NaN               NaT                NaN
1   2015-04-01  00:30:00    NaN               NaT                NaN
2   2015-04-01  01:00:00    NaN               NaT                NaN
3   2015-04-01  01:30:00    NaN               NaT                NaN
4   2015-04-01  02:00:00    NaN               NaT                NaN 

For diagnostic purposes, I tried an inner join, and the output was empty. Initially I thought the difference in datatypes might be causing the issue, so I changed the datatype of time in df1 and INTERRUPTION_TIME in df2 to str. Now the dtypes of both dataframes are:

df1
date    object
time    object
dtype: object

df2
INCIDENT_TIME         object
INTERRUPTION_TIME     object
MINUTES               float64
dtype: object

When I ran the program again, it returned the same output. I am not sure where I am making the mistake. Could anyone help me fix the issue, please?

I think you need to convert the date columns to datetime:

import pandas as pd

# Convert the date key on each side so both hold real datetime values
df1['date'] = pd.to_datetime(df1['date'])
print(df1.dtypes)

df2['INCIDENT_TIME'] = pd.to_datetime(df2['INCIDENT_TIME'])
print(df2.dtypes)

final_df = pd.merge(df1, df2, left_on=['date', 'time'], right_on=['INCIDENT_TIME', 'INTERRUPTION_TIME'], how='left')
print(final_df)

Which gives the result:

        date      time INCIDENT_TIME INTERRUPTION_TIME  MINUTES
0 2015-04-01  00:00:00           NaT               NaN      NaN
1 2015-04-01  00:30:00           NaT               NaN      NaN
2 2015-04-01  01:00:00    2015-04-01          01:00:00   3712.0
3 2015-04-01  01:30:00    2015-04-01          01:30:00   3045.0
4 2015-04-01  02:00:00    2015-04-01          02:00:00    525.0
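
To illustrate why the conversion can matter, here is a minimal sketch. It assumes that one key column holds datetime.date objects and the other holds strings; the question does not say what the object columns actually contain, and both cases report dtype object. The frames `left` and `right` are made up for illustration.

```python
import datetime as dt
import pandas as pd

# Hypothetical data: both date keys report dtype "object",
# but one holds datetime.date values and the other holds strings.
left = pd.DataFrame({
    "date": pd.Series([dt.date(2015, 4, 1), dt.date(2015, 4, 1)], dtype=object),
    "time": pd.to_timedelta(["00:30:00", "01:00:00"]),
})
right = pd.DataFrame({
    "INCIDENT_TIME": pd.Series(["2015-04-01"], dtype=object),
    "INTERRUPTION_TIME": pd.to_timedelta(["01:00:00"]),
    "MINUTES": [3712.0],
})

# Inspect the Python type of the values, not just the column dtype.
print(left["date"].map(type).value_counts())
print(right["INCIDENT_TIME"].map(type).value_counts())

keys = dict(left_on=["date", "time"],
            right_on=["INCIDENT_TIME", "INTERRUPTION_TIME"],
            how="left")

# A date object never compares equal to a string, so no row matches.
before = pd.merge(left, right, indicator=True, **keys)

# Once both date keys are datetime64, equal dates compare equal.
left["date"] = pd.to_datetime(left["date"])
right["INCIDENT_TIME"] = pd.to_datetime(right["INCIDENT_TIME"])
after = pd.merge(left, right, indicator=True, **keys)

print(before["_merge"].tolist())
print(after["_merge"].tolist())
```

The `indicator=True` argument adds a `_merge` column that marks each row as `left_only` or `both`, which makes it easy to see how many rows actually matched.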

Ideally, I would create a full datetime column so that the match is guaranteed to cover both date and time. That would look like this:

import pandas as pd

# Assumes the date and time columns hold plain strings such as '2015-04-01' and '01:00:00'
df1['datetime'] = pd.to_datetime(df1['date'] + ' ' + df1['time'], format='%Y-%m-%d %H:%M:%S')
print(df1)

df2['incident_datetime'] = pd.to_datetime(df2['INCIDENT_TIME'] + ' ' + df2['INTERRUPTION_TIME'], format='%Y-%m-%d %H:%M:%S')
final_df = pd.merge(df1, df2, left_on=['datetime'], right_on=['incident_datetime'], how='left')

# Drop the rows that found no match (any row containing NaN/NaT)
final_df = final_df.dropna()

print(final_df)

Which gives the following results:


         date      time  ... MINUTES   incident_datetime
2  2015-04-01  01:00:00  ...  3712.0 2015-04-01 01:00:00
3  2015-04-01  01:30:00  ...  3045.0 2015-04-01 01:30:00
4  2015-04-01  02:00:00  ...   525.0 2015-04-01 02:00:00
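
The string concatenation above assumes both columns are strings. The question reports the time columns as timedelta64[ns], where joining with ' ' is unlikely to work, so here is a hedged alternative sketch that adds a timedelta to a parsed date instead. The helper `combine_date_time` and the frames `left` and `right` are made up for illustration.

```python
import pandas as pd

def combine_date_time(date_col, time_col):
    """Hypothetical helper: build one datetime column from a date column
    and a time-of-day column (strings or timedelta64 values)."""
    return pd.to_datetime(date_col) + pd.to_timedelta(time_col)

left = pd.DataFrame({
    "date": ["2015-04-01", "2015-04-01"],
    "time": pd.to_timedelta(["00:30:00", "01:00:00"]),  # timedelta64 column
})
right = pd.DataFrame({
    "INCIDENT_TIME": ["2015-04-01"],
    "INTERRUPTION_TIME": ["01:00:00"],                   # 'HH:MM:SS' strings work too
    "MINUTES": [3712.0],
})

left["datetime"] = combine_date_time(left["date"], left["time"])
right["incident_datetime"] = combine_date_time(right["INCIDENT_TIME"],
                                               right["INTERRUPTION_TIME"])

merged = left.merge(right, left_on="datetime",
                    right_on="incident_datetime", how="left")
print(merged[["datetime", "MINUTES"]])
```

Keeping `how="left"` without a following `dropna()` preserves the unmatched rows of the left frame, which is what a left merge is usually meant to do.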

A great reference for datetime conversion with pandas (parsing a string is strptime, not strftime): https://www.journaldev.com/23365/python-string-to-datetime-strptime

The data may contain whitespace or other stray characters. Try stripping every cell before merging; this might solve the issue.
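
A minimal sketch of that idea, assuming the key columns hold strings with stray leading or trailing spaces; the helper `strip_strings` and the frame `left` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose string cells carry stray whitespace.
left = pd.DataFrame({
    "date": ["2015-04-01 ", " 2015-04-01"],
    "time": ["01:00:00", "01:30:00 "],
    "MINUTES": [3712.0, 3045.0],
})

def strip_strings(df):
    """Hypothetical helper: strip whitespace from string cells only,
    leaving numbers and other values untouched."""
    return df.apply(
        lambda col: col.map(lambda v: v.strip() if isinstance(v, str) else v)
    )

clean = strip_strings(left)
print(clean["date"].map(repr).tolist())
```

Printing with `repr` makes hidden spaces visible, which is a quick way to confirm whether whitespace is the reason the keys fail to match.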

Use datetime64[ns] for date columns for better results.
