How to combine two dataframes by index (pandas)
I have two dataframes with the same date field but different other fields. I need to add the column pneumonia_ARVI from the dataframe pneumonia_ARVI to the dataframe Result_data.
They initially differ in the number of dates: the Result_data dataframe has significantly more dates than pneumonia_ARVI.
I need to combine them by matching dates. If pneumonia_ARVI has more records than Result_data, the dates in Result_data should take precedence, and any data missing from pneumonia_ARVI should be replaced with empty values.
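Put differently, the requirement describes a left join keyed on date: keep every date of Result_data and fill the gaps with missing values. The following is a minimal sketch on made-up frames. The dates and numbers are hypothetical, and only the names date, infected_city and pneumonia_ARVI come from the question. It uses DataFrame.merge directly on the date column:

```python
import pandas as pd

# Hypothetical sample data; the real frames have more columns and dates.
Result_data = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
    "infected_city": [10, 12, 15, 20],
})
pneumonia_ARVI = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-02", "2020-01-04", "2020-01-05"]),
    "pneumonia_ARVI": [3, 5, 7],
})

# how="left" keeps every row (date) of Result_data in its original order.
# Dates that exist only in pneumonia_ARVI (2020-01-05) are dropped, and
# dates missing from pneumonia_ARVI get NaN in the new column.
End = Result_data.merge(pneumonia_ARVI, on="date", how="left")
print(End)
```

Merging on the column means no set_index is needed. The result simply gets a fresh integer index.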
I have tried:
Result_data = Result_data.set_index('date')
pneumonia_ARVI = pneumonia_ARVI.set_index('date')
End = pd.merge(Result_data, pneumonia_ARVI, left_index=True, right_index=True)
But this trimmed the two datasets to each other. Only dates present in both were kept, so the infected_city field did not keep all of its original values by date.
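The attempt above loses dates because pd.merge defaults to how="inner", which keeps only index values present in both frames. The sketch below uses hypothetical dates to show how the row counts differ between the default and how="left":

```python
import pandas as pd

# Hypothetical frames already indexed by date, as after set_index('date').
Result_data = pd.DataFrame(
    {"infected_city": [10, 12, 15, 20]},
    index=pd.DatetimeIndex(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"], name="date"),
)
pneumonia_ARVI = pd.DataFrame(
    {"pneumonia_ARVI": [3, 5, 7]},
    index=pd.DatetimeIndex(["2020-01-02", "2020-01-04", "2020-01-05"], name="date"),
)

# Default how="inner": only dates present in BOTH frames survive.
inner = pd.merge(Result_data, pneumonia_ARVI, left_index=True, right_index=True)

# how="left": all dates of Result_data are kept.
left = pd.merge(Result_data, pneumonia_ARVI, left_index=True, right_index=True, how="left")

print(len(inner), len(left))  # 2 4
```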
How can I combine this data correctly without reducing the total number of dates?
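Use a left join on the date index, which keeps every date from Result_data:
import pandas as pd

# convert the date columns to datetime if needed, so both indexes match
Result_data["date"] = pd.to_datetime(Result_data["date"])
pneumonia_ARVI["date"] = pd.to_datetime(pneumonia_ARVI["date"])
# set the date as the index, as you have done
Result_data = Result_data.set_index('date')
pneumonia_ARVI = pneumonia_ARVI.set_index('date')
# perform a left join: every date in Result_data is kept, and dates
# missing from pneumonia_ARVI get NaN in the new column
End = Result_data.join(pneumonia_ARVI, how="left")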
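After the join, dates missing from pneumonia_ARVI hold NaN. The following sketch also uses hypothetical data. It checks that no dates were lost and, in case literally empty cells are wanted, replaces NaN with empty strings in the new column. The name End_display is introduced here only for illustration:

```python
import pandas as pd

# Hypothetical data indexed by date, as after set_index('date').
Result_data = pd.DataFrame(
    {"infected_city": [10, 12, 15, 20]},
    index=pd.DatetimeIndex(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"], name="date"),
)
pneumonia_ARVI = pd.DataFrame(
    {"pneumonia_ARVI": [3, 5, 7]},
    index=pd.DatetimeIndex(["2020-01-02", "2020-01-04", "2020-01-05"], name="date"),
)

End = Result_data.join(pneumonia_ARVI, how="left")

# Sanity check: the left join kept exactly the dates of Result_data.
assert End.index.equals(Result_data.index)

# Hypothetical display copy: replace NaN with "" in the new column.
# This turns that column into dtype object, so do it only for output.
End_display = End.copy()
End_display["pneumonia_ARVI"] = End["pneumonia_ARVI"].fillna("")
print(End_display)
```

If the goal is only a CSV file, DataFrame.to_csv already writes NaN as an empty field by default (na_rep=''). In that case the numeric NaN column can be kept as-is.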