
PYTHON/PANDAS - Reindexing on multiple indexes

I have a dataframe similar to the following:

test = {"id": ["A", "A", "A", "B", "B", "B"],
        "date":    ["09-02-2013", "09-03-2013", "09-05-2013", "09-15-2013", "09-17-2013", "09-18-2013"],
        "country": ["Poland", "Poland", "France", "Scotland", "Scotland", "Canada"]}

and I want a table that looks like this:

id  date        country
A   09-02-2013  Poland
A   09-03-2013  Poland
A   09-04-2013  Poland
A   09-05-2013  France
B   09-15-2013  Scotland
B   09-16-2013  Scotland
B   09-17-2013  Scotland
B   09-18-2013  Canada

i.e. a table that fills in any date I am missing, but only between the min and max date of each id, carrying the last known country forward into the new rows.

I have looked around Stack Overflow, but usually this problem involves only one index, or the person wants to drop an index anyway. This is what I have so far:

import pandas as pd

test_df = pd.DataFrame(test)

# get min date per id
dates = test_df.groupby("id")["date"].min().to_frame(name="min")

# get max date
dates["max"] = test_df.groupby("id")["date"].max().to_frame(name="max")

midx = pd.MultiIndex.from_frame(dates.apply(lambda x: pd.date_range(x["min"], x["max"], freq="D"), axis=1).explode().reset_index(name="date")[["date", "id"]])

test_df = test_df.set_index(["date", "id"])

test_df = test_df.reindex(midx).fillna(method="ffill")

test_df

This gets me really close but not quite there: the dates are all present, but the country column is empty:

id  date        country
A   09-02-2013  NaN
A   09-03-2013  NaN
A   09-04-2013  NaN
A   09-05-2013  NaN
B   09-15-2013  NaN
B   09-16-2013  NaN
B   09-17-2013  NaN
B   09-18-2013  NaN

Any ideas on how to fix it?
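A likely cause of the all-NaN result: the "date" column holds strings, so after set_index the index labels are strings, while the labels built with pd.date_range are Timestamps. No label matches during reindex, so every row comes back empty and ffill has nothing to propagate. The sketch below keeps the reindex approach but parses the dates first; the names bounds and fixed are illustrative, and the group-wise ffill is an added assumption so that values never carry over from one id to the next.

```python
import pandas as pd

test = {"id": ["A", "A", "A", "B", "B", "B"],
        "date": ["09-02-2013", "09-03-2013", "09-05-2013",
                 "09-15-2013", "09-17-2013", "09-18-2013"],
        "country": ["Poland", "Poland", "France", "Scotland", "Scotland", "Canada"]}
test_df = pd.DataFrame(test)

# Parse the strings so the index labels are Timestamps, like those from date_range.
test_df["date"] = pd.to_datetime(test_df["date"], format="%m-%d-%Y")

# First and last date per id.
bounds = test_df.groupby("id")["date"].agg(["min", "max"])

# One (id, date) pair for every calendar day between each id's min and max.
midx = pd.MultiIndex.from_tuples(
    [(key, day)
     for key, row in bounds.iterrows()
     for day in pd.date_range(row["min"], row["max"], freq="D")],
    names=["id", "date"],
)

fixed = (test_df
         .set_index(["id", "date"])
         .reindex(midx)               # inserts the missing days as NaN rows
         .groupby(level="id").ffill() # fill forward within each id only
         .reset_index())

# Format the dates back to the original MM-DD-YYYY strings.
fixed["date"] = fixed["date"].dt.strftime("%m-%d-%Y")
print(fixed)
```

Parsing the dates also makes min and max compare chronologically rather than as text, which matters once the data spans more than one year.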

IIUC, you could generate a date_range per group, explode it, then merge it back onto the original frame and ffill the values per group:

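out = (test_df
       .merge(pd
             # parse the date strings so min/max compare chronologically
             .to_datetime(test_df['date'], dayfirst=False)
             .groupby(test_df['id'])
             # one daily range per id, from its first to its last date
             .apply(lambda g: pd.date_range(g.min(), g.max(), freq='D'))
             # one row per day, formatted back to the original string format
             .explode().dt.strftime('%m-%d-%Y')
             .reset_index(name='date'),
             how='right'  # keep every generated (id, date) pair
            )
       # forward-fill country within each id only
       .assign(country=lambda d: d.groupby('id')['country'].ffill())
      )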

output:

  id        date   country
0  A  09-02-2013    Poland
1  A  09-03-2013    Poland
2  A  09-04-2013    Poland
3  A  09-05-2013    France
4  B  09-15-2013  Scotland
5  B  09-16-2013  Scotland
6  B  09-17-2013  Scotland
7  B  09-18-2013    Canada
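A possible variant, not part of the answer above: with the dates parsed and set as the index, asfreq("D") can insert the missing days for each id directly, which avoids the merge. This is a minimal sketch that assumes each id has unique dates; the name filled is illustrative, and group_keys=True is passed explicitly so that id is kept as an index level.

```python
import pandas as pd

test_df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B"],
    "date": ["09-02-2013", "09-03-2013", "09-05-2013",
             "09-15-2013", "09-17-2013", "09-18-2013"],
    "country": ["Poland", "Poland", "France", "Scotland", "Scotland", "Canada"],
})

filled = (test_df
          # parse the MM-DD-YYYY strings into real datetimes
          .assign(date=pd.to_datetime(test_df["date"], format="%m-%d-%Y"))
          .set_index("date")
          .groupby("id", group_keys=True)["country"]
          # per id: add a row for every missing day, then fill forward
          .apply(lambda s: s.asfreq("D").ffill())
          .reset_index())

# Format the dates back to the original MM-DD-YYYY strings.
filled["date"] = filled["date"].dt.strftime("%m-%d-%Y")
print(filled)
```

Because asfreq runs inside each group, the fill range is bounded by that id's own first and last date, which is the per-id min/max behaviour the question asks for.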
