PYTHON/PANDAS - Reindex on multiple indexes

Question

I have a dataframe similar to the following:

test = {"id": ["A", "A", "A", "B", "B", "B"],
        "date":    ["09-02-2013", "09-03-2013", "09-05-2013", "09-15-2013", "09-17-2013", "09-18-2013"],
        "country": ["Poland", "Poland", "France", "Scotland", "Scotland", "Canada"]}

and I want a table that looks like this:

id  date        country
A   09-02-2013  Poland
A   09-03-2013  Poland
A   09-04-2013  Poland
A   09-05-2013  France
B   09-15-2013  Scotland
B   09-16-2013  Scotland
B   09-17-2013  Scotland
B   09-18-2013  Canada

i.e. a table that fills in any missing dates, but only within the min/max date range of each id.
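To make the requirement concrete, here is a minimal sketch (assuming the `test` dict above and month-first `MM-DD-YYYY` strings) that finds each id's first and last date and the daily calendar between them. The names `df`, `spans` and `ranges` are illustrative only.

```python
import pandas as pd

test = {"id": ["A", "A", "A", "B", "B", "B"],
        "date": ["09-02-2013", "09-03-2013", "09-05-2013",
                 "09-15-2013", "09-17-2013", "09-18-2013"],
        "country": ["Poland", "Poland", "France",
                    "Scotland", "Scotland", "Canada"]}
df = pd.DataFrame(test)

# Parse the month-first strings so min/max compare chronologically, not as text
dates = pd.to_datetime(df["date"], format="%m-%d-%Y")

# First and last observed date for each id (columns "min" and "max", index "id")
spans = dates.groupby(df["id"]).agg(["min", "max"])

# Daily calendar between each id's own min and max, inclusive
ranges = {
    id_: pd.date_range(row["min"], row["max"], freq="D")
    for id_, row in spans.iterrows()
}
for id_, days in ranges.items():
    print(id_, len(days), days[0].date(), days[-1].date())
```

Each id gets its own span, so the gap between 09-05 (end of A) and 09-15 (start of B) is never filled.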

I have looked around Stack Overflow, but usually this problem involves only one index, or the person wants to drop an index anyway. This is what I have so far:

import pandas as pd

test_df = pd.DataFrame(test)

# get min date per id
dates = test_df.groupby("id")["date"].min().to_frame(name="min")

# get max date
dates["max"] = test_df.groupby("id")["date"].max().to_frame(name="max")

midx = pd.MultiIndex.from_frame(dates.apply(lambda x: pd.date_range(x["min"], x["max"], freq="D"), axis=1).explode().reset_index(name="date")[["date", "id"]])

test_df = test_df.set_index(["date", "id"])

test_df = test_df.reindex(midx).ffill()

test_df

This gets me really close, but not quite there: all the dates are present, but the country is missing:

id  date        country
A   09-02-2013  NaN
A   09-03-2013  NaN
A   09-04-2013  NaN
A   09-05-2013  NaN
B   09-15-2013  NaN
B   09-16-2013  NaN
B   09-17-2013  NaN
B   09-18-2013  NaN

Any ideas on how to fix it?
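One likely reason for the all-NaN result, offered as a hypothesis rather than a confirmed diagnosis: `test_df["date"]` still holds strings such as `"09-02-2013"`, while `pd.date_range` produces `Timestamp` labels, so no label in `midx` matches the existing index and `reindex` returns NaN for every row. A minimal sketch of the same reindex approach under that assumption parses the dates first and forward-fills within each id (the names `spans` and `midx` are illustrative):

```python
import pandas as pd

test = {"id": ["A", "A", "A", "B", "B", "B"],
        "date": ["09-02-2013", "09-03-2013", "09-05-2013",
                 "09-15-2013", "09-17-2013", "09-18-2013"],
        "country": ["Poland", "Poland", "France",
                    "Scotland", "Scotland", "Canada"]}
test_df = pd.DataFrame(test)

# Parse month-first strings so the index labels are Timestamps,
# the same type as the labels pd.date_range produces below
test_df["date"] = pd.to_datetime(test_df["date"], format="%m-%d-%Y")

# First and last date per id
spans = test_df.groupby("id")["date"].agg(["min", "max"])

# Target (date, id) MultiIndex: one entry per day within each id's own span
midx = pd.MultiIndex.from_tuples(
    [(day, id_)
     for id_, row in spans.iterrows()
     for day in pd.date_range(row["min"], row["max"], freq="D")],
    names=["date", "id"],
)

out = (test_df.set_index(["date", "id"])
              .reindex(midx)          # new days appear as NaN rows
              .groupby(level="id")    # fill only within each id
              .ffill()
              .reset_index())
print(out)
```

Grouping the forward-fill by `id` keeps one id's country from spilling into the next; with this data each id starts on an observed date, so a plain `ffill()` would happen to give the same result. The dates come out as `Timestamp` values; `out["date"].dt.strftime("%m-%d-%Y")` converts them back to the original string format if needed.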

Answer 1

IIUC, you could generate a date_range per group, explode it, then merge and ffill the values per group:

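import pandas as pd

# test_df here is the original frame, pd.DataFrame(test), before any set_index
out = (test_df
       .merge(pd
             .to_datetime(test_df['date'], dayfirst=False)  # parse the month-first date strings
             .groupby(test_df['id'])                         # one group per id
             .apply(lambda g: pd.date_range(g.min(), g.max(), freq='D'))  # full daily range per id
             .explode().dt.strftime('%m-%d-%Y')              # one row per day, back in the original string format
             .reset_index(name='date'),                      # columns: id, date
             how='right'                                     # keep every generated (id, date) pair
            )
       .assign(country=lambda d: d.groupby('id')['country'].ffill())  # fill gaps within each id only
      )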

output:

  id        date   country
0  A  09-02-2013    Poland
1  A  09-03-2013    Poland
2  A  09-04-2013    Poland
3  A  09-05-2013    France
4  B  09-15-2013  Scotland
5  B  09-16-2013  Scotland
6  B  09-17-2013  Scotland
7  B  09-18-2013    Canada
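As an alternative to the explode-and-merge pipeline (a different technique, not part of the answer above), here is a sketch using per-group `Series.asfreq("D")`, which reindexes each id's series to a daily frequency between its own first and last date. It assumes the dates within each id are unique, since `asfreq` reindexes under the hood.

```python
import pandas as pd

test = {"id": ["A", "A", "A", "B", "B", "B"],
        "date": ["09-02-2013", "09-03-2013", "09-05-2013",
                 "09-15-2013", "09-17-2013", "09-18-2013"],
        "country": ["Poland", "Poland", "France",
                    "Scotland", "Scotland", "Canada"]}
df = pd.DataFrame(test)
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%Y")

out = (df.set_index("date")
         .groupby("id", group_keys=True)["country"]
         # asfreq("D") inserts the missing days (as NaN) between this id's
         # min and max date; ffill then carries the last known country forward
         .apply(lambda s: s.asfreq("D").ffill())
         .reset_index())   # columns: id, date, country
print(out)
```

Because `asfreq` works on each group separately, the range never extends past an id's own min/max, which matches the requirement in the question.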

Answer 1: 2 votes, accepted 2022-06-20 15:22:55
