How to merge two datasets with different time ranges?
I have two datasets that look like this:
`df1`:

| Date | City | State | Quantity |
|---|---|---|---|
| 2019-01 | Chicago | IL | 35 |
| 2019-01 | Orlando | FL | 322 |
| ... | ... | ... | ... |
| 2021-07 | Chicago | IL | 334 |
| 2021-07 | Orlando | FL | 4332 |

`df2`:

| Date | City | State | Sales |
|---|---|---|---|
| 2020-03 | Chicago | IL | 30 |
| 2020-03 | Orlando | FL | 319 |
| ... | ... | ... | ... |
| 2021-07 | Chicago | IL | 331 |
| 2021-07 | Orlando | FL | 4000 |

My `Date` column has dtype `period[M]` in both datasets. I have tried `df1.join(df2, how='outer')` and `df2.join(df1, how='outer')`, but the rows don't line up correctly: essentially, the 2019-01 rows end up with the sales for 2020-03.

I have not been able to use `merge()` because I would have to merge on a combination of `City`, `State` and `Date`.

How can I join these two datasets so that my output is as follows:

| Date | City | State | Quantity | Sales |
|---|---|---|---|---|
| 2019-01 | Chicago | IL | 35 | NaN |
| 2019-01 | Orlando | FL | 322 | NaN |
| ... | ... | ... | ... | ... |
| 2021-07 | Chicago | IL | 334 | 331 |
| 2021-07 | Orlando | FL | 4332 | 4000 |

You can do an outer merge. If you don't specify the columns to merge on, `merge` uses the intersection of the columns in both DataFrames (in this case, `Date`, `City` and `State`).

    out = df1.merge(df2, how='outer').sort_values(by='Date')

Output:

          Date     City State  Quantity   Sales
    0  2019-01  Chicago    IL      35.0     NaN
    1  2019-01  Orlando    FL     322.0     NaN
    4  2020-03  Chicago    IL       NaN    30.0
    5  2020-03  Orlando    FL       NaN   319.0
    2  2021-07  Chicago    IL     334.0   331.0
    3  2021-07  Orlando    FL    4332.0  4000.0
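
Regarding the concern about merging on a combination of columns: `merge` accepts a list of key columns through `on`, so the three keys can also be named explicitly. The following is only a minimal sketch. The sample frames contain just the rows shown in the question, building `Date` with `pd.PeriodIndex(..., freq="M")` is an assumption about how the `period[M]` column was created, and the `keys` list is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical helper name: the columns both frames are matched on.
keys = ["Date", "City", "State"]

# Sample data limited to the rows visible in the question.
# Assumption: Date is a period[M] column built from a PeriodIndex.
df1 = pd.DataFrame({
    "Date": pd.PeriodIndex(["2019-01", "2019-01", "2021-07", "2021-07"], freq="M"),
    "City": ["Chicago", "Orlando", "Chicago", "Orlando"],
    "State": ["IL", "FL", "IL", "FL"],
    "Quantity": [35, 322, 334, 4332],
})
df2 = pd.DataFrame({
    "Date": pd.PeriodIndex(["2020-03", "2020-03", "2021-07", "2021-07"], freq="M"),
    "City": ["Chicago", "Orlando", "Chicago", "Orlando"],
    "State": ["IL", "FL", "IL", "FL"],
    "Sales": [30, 319, 331, 4000],
})

# Outer merge on all three keys: rows present in only one frame are kept,
# and the column coming from the other frame is filled with NaN.
out = (
    df1.merge(df2, on=keys, how="outer")
       .sort_values(by=keys)
       .reset_index(drop=True)
)
print(out)
```

Naming the keys explicitly gives the same matching as the default here, but it keeps working if the two frames later gain another column with the same name that should not be treated as a key.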