Python Pandas: merge returns NaN
I have two dataframes, df1 and df2. df1 is:
df1
date time
0 2015-04-01 00:00:00
1 2015-04-01 00:30:00
2 2015-04-01 01:00:00
3 2015-04-01 01:30:00
4 2015-04-01 02:00:00
The dtypes of df1 are:
date object
time timedelta64[ns]
dtype: object
df2 is:
INCIDENT_TIME INTERRUPTION_TIME MINUTES
0 2015-01-08 03:00:00 1056.0
1 2015-01-10 23:30:00 3234.0
2 2015-04-01 01:00:00 3712.0
3 2015-04-01 01:30:00 3045.0
4 2015-04-01 02:00:00 525.0
The dtypes of df2 are:
INCIDENT_TIME object
INTERRUPTION_TIME timedelta64[ns]
MINUTES float64
dtype: object
I want to do a left merge, so my code is:
final_df= pd.merge(df1,df2,left_on=['date','time'],right_on=['INCIDENT_TIME','INTERRUPTION_TIME'],how='left')
However, it does not produce the desired output. The output was:
date time INCIDENT_TIME INTERRUPTION_TIME CONSUM_MINUTES
0 2015-04-01 00:00:00 NaN NaT NaN
1 2015-04-01 00:30:00 NaN NaT NaN
2 2015-04-01 01:00:00 NaN NaT NaN
3 2015-04-01 01:30:00 NaN NaT NaN
4 2015-04-01 02:00:00 NaN NaT NaN
For diagnostic purposes, I tried an inner join, and the output was empty. Initially I thought the difference in datatypes might be causing the issue, so I changed the datatype of time in df1 and INTERRUPTION_TIME in df2 to str. Now the dtypes of both dataframes are:
df1
date object
time object
dtype: object
df2
INCIDENT_TIME object
INTERRUPTION_TIME object
MINUTES float64
dtype: object
When I ran the program again, it returned the same output. I am not sure where I am making the mistake. Could anyone help me fix the issue?
I think you need to convert the date columns to datetime:
import pandas as _pd
df1['date'] = _pd.to_datetime(df1['date'])
print(df1.dtypes)
df2['INCIDENT_TIME'] = _pd.to_datetime(df2['INCIDENT_TIME'])
print(df2.dtypes)
final_df= _pd.merge(df1,df2,left_on=['date','time'],right_on=['INCIDENT_TIME','INTERRUPTION_TIME'],how='left')
print(final_df)
Which gives the result:
date time INCIDENT_TIME INTERRUPTION_TIME MINUTES
0 2015-04-01 00:00:00 NaT NaN NaN
1 2015-04-01 00:30:00 NaT NaN NaN
2 2015-04-01 01:00:00 2015-04-01 01:00:00 3712.0
3 2015-04-01 01:30:00 2015-04-01 01:30:00 3045.0
4 2015-04-01 02:00:00 2015-04-01 02:00:00 525.0
Ideally, I would create a full datetime column so that the match is guaranteed to cover both date and time, which would look like this:
import pandas as _pd
df1['datetime'] = _pd.to_datetime(df1['date']+ ' ' + df1['time'], format='%Y-%m-%d %H:%M:%S')
print(df1)
df2['incident_datetime'] = _pd.to_datetime(df2['INCIDENT_TIME']+ ' ' + df2['INTERRUPTION_TIME'], format='%Y-%m-%d %H:%M:%S')
final_df = _pd.merge(df1,df2,left_on=['datetime'],right_on=['incident_datetime'],how='left')
# drop rows that found no match in df2 (any row containing NaN)
final_df = final_df.dropna()
print(final_df)
Which gives the following result:
date time ... MINUTES incident_datetime
2 2015-04-01 01:00:00 ... 3712.0 2015-04-01 01:00:00
3 2015-04-01 01:30:00 ... 3045.0 2015-04-01 01:30:00
4 2015-04-01 02:00:00 ... 525.0 2015-04-01 02:00:00
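If the time columns are still timedelta64[ns], as in the question, concatenating them with ' ' is unlikely to work as written. A minimal sketch of an alternative, assuming the date columns parse cleanly with to_datetime: add the timedelta to the parsed date to build a single datetime key on each side. The sample rows are copied from the question, and the column names datetime and incident_datetime follow the code above.

```python
import pandas as pd

# Sample data copied from the question; time columns are timedelta64.
df1 = pd.DataFrame({
    "date": ["2015-04-01"] * 5,
    "time": pd.to_timedelta(["00:00:00", "00:30:00", "01:00:00", "01:30:00", "02:00:00"]),
})
df2 = pd.DataFrame({
    "INCIDENT_TIME": ["2015-01-08", "2015-01-10", "2015-04-01", "2015-04-01", "2015-04-01"],
    "INTERRUPTION_TIME": pd.to_timedelta(["03:00:00", "23:30:00", "01:00:00", "01:30:00", "02:00:00"]),
    "MINUTES": [1056.0, 3234.0, 3712.0, 3045.0, 525.0],
})

# Parsed date + timedelta gives one datetime key per row.
df1["datetime"] = pd.to_datetime(df1["date"]) + df1["time"]
df2["incident_datetime"] = pd.to_datetime(df2["INCIDENT_TIME"]) + df2["INTERRUPTION_TIME"]

final_df = pd.merge(df1, df2, left_on="datetime", right_on="incident_datetime", how="left")
print(final_df[["datetime", "MINUTES"]])
```

Because this is a left merge without dropna, the two df1 rows with no counterpart in df2 stay in the result with MINUTES set to NaN.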
A great reference for datetime conversion with pandas (which is strptime, not strftime): https://www.journaldev.com/23365/python-string-to-datetime-strptime
The data may contain whitespace or other stray characters. Try calling the strip function on every cell; this might solve the issue.
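A minimal sketch of this whitespace check, assuming the key columns hold strings. The sample values and the stray spaces are made up for illustration; .str.strip() is the pandas string accessor method.

```python
import pandas as pd

# Hypothetical data: one key on each side carries a stray space.
left = pd.DataFrame({
    "date": ["2015-04-01 ", "2015-04-01"],
    "time": ["01:00:00", "01:30:00"],
})
right = pd.DataFrame({
    "INCIDENT_TIME": ["2015-04-01", "2015-04-01"],
    "INTERRUPTION_TIME": ["01:00:00", " 01:30:00"],
    "MINUTES": [3712.0, 3045.0],
})

keys_left = ["date", "time"]
keys_right = ["INCIDENT_TIME", "INTERRUPTION_TIME"]

# Merge on the raw strings first, to compare against the cleaned merge.
before = pd.merge(left, right, left_on=keys_left, right_on=keys_right, how="inner")

# Strip leading and trailing whitespace from every key column.
for col in keys_left:
    left[col] = left[col].str.strip()
for col in keys_right:
    right[col] = right[col].str.strip()

after = pd.merge(left, right, left_on=keys_left, right_on=keys_right, how="inner")
print(len(before), len(after))
```

Comparing the row counts of the inner join before and after stripping is a quick way to tell whether whitespace was the cause.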
Use datetime64[ns] for date columns for better results.
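One way this can matter: two columns may both report dtype object while holding different Python types, in which case no keys compare equal. The sketch below assumes, purely for illustration, that one side stores datetime.date objects and the other stores strings; the question does not say what its object columns actually contain. The indicator=True argument of merge adds a _merge column showing which rows found a match.

```python
import datetime
import pandas as pd

# Assumption for illustration: both date columns are dtype object,
# but one holds datetime.date values and the other holds strings.
left = pd.DataFrame({
    "date": pd.Series([datetime.date(2015, 4, 1)], dtype=object),
    "time": pd.to_timedelta(["01:00:00"]),
})
right = pd.DataFrame({
    "INCIDENT_TIME": pd.Series(["2015-04-01"], dtype=object),
    "INTERRUPTION_TIME": pd.to_timedelta(["01:00:00"]),
    "MINUTES": [3712.0],
})

merge_args = dict(
    left_on=["date", "time"],
    right_on=["INCIDENT_TIME", "INTERRUPTION_TIME"],
    how="left",
    indicator=True,  # adds a _merge column: left_only / right_only / both
)

# A date object never equals a string, so the row finds no match.
raw = pd.merge(left, right, **merge_args)

# Convert both date columns to a real datetime dtype, then merge again.
left["date"] = pd.to_datetime(left["date"])
right["INCIDENT_TIME"] = pd.to_datetime(right["INCIDENT_TIME"])
fixed = pd.merge(left, right, **merge_args)

print(raw["_merge"].tolist(), fixed["_merge"].tolist())
```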