Forward-fill dates within MultiIndexed DataFrame
I have a pandas DataFrame with a MultiIndex, where one of the levels represents a year:
import pandas as pd
df = pd.DataFrame(dict(A = ['foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
                       B = ['white', 'black', 'white', 'white', 'black', 'black'],
                       year = [1990, 1992, 1990, 1992, 1991, 1992],
                       value = [3.14, 1.20, 4.56, 6.79, 0.01, 0.02]))
df = df.set_index(['A', 'B', 'year'])
I would like to forward-fill values, but only for the intervening years in each group (defined by the combination of A and B). Here is the input:
                 value
A   B     year
foo white 1990    3.14
    black 1992    1.20
bar white 1990    4.56
          1992    6.79
    black 1991    0.01
          1992    0.02
And here is the desired output, with one additional row:
                 value
A   B     year
foo white 1990    3.14
    black 1992    1.20
bar white 1990    4.56
          1991    4.56   <-- new forward-filled value
          1992    6.79
    black 1991    0.01
          1992    0.02
How can I accomplish this concisely and efficiently? I've tried using combinations of groupby and apply, but I'm new to pandas and keep hitting exceptions.
Here's an example of how I'm naively approaching the problem:
def ffill_years(df):
    df.reset_index(['A', 'B'])  # drop all but 'year'
    year_range = range(df['year'].min(), df['year'].max())
    df.reindex(pd.Series(years)).fillna("ffill")
    return df

df.groupby(level=['A', 'B']).apply(ffill_years)
Of course this doesn't work. Any and all tips appreciated!
You were pretty close - a couple of small changes:
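- reset_index doesn't operate in place, so its result has to be assigned back.
- After that reset, year is the index rather than a column, so it has to be read through .index.
- reindex is also not in-place.
- 'ffill' has to be passed to fillna through the method parameter, not as the fill value.
- The range needs max() + 1 so that the last year is included, and the reindex call has to use year_range (years was never defined).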
See below:
def ffill_years(df):
    df = df.reset_index(['A','B'])  # drop all but 'year'
    year_range = range(df.index.min(), df.index.max() + 1)
    df = df.reindex(pd.Series(year_range)).fillna(method='ffill')
    return df
This results in:
In [209]: df.groupby(level=['A','B']).apply(ffill_years)
Out[209]:
                   A      B  value
A   B     year
bar black 1991   bar  black   0.01
          1992   bar  black   0.02
    white 1990   bar  white   4.56
          1991   bar  white   4.56
          1992   bar  white   6.79
foo black 1992   foo  black   1.20
    white 1990   foo  white   3.14
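The output above carries A and B twice: once in the index (added by groupby) and once as columns (left behind by reset_index). The following is a minimal sketch of a variant that avoids the duplicate columns, not part of the original answer. It assumes a reasonably recent pandas, where fillna(method='ffill') is deprecated in favour of .ffill(). It swaps reset_index for droplevel and names the result `result`; both are choices made here for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(
    A=['foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
    B=['white', 'black', 'white', 'white', 'black', 'black'],
    year=[1990, 1992, 1990, 1992, 1991, 1992],
    value=[3.14, 1.20, 4.56, 6.79, 0.01, 0.02],
)).set_index(['A', 'B', 'year'])


def ffill_years(group):
    # Drop the A and B levels so only 'year' remains in the index.
    # droplevel returns a new object, so the result is assigned back.
    group = group.droplevel(['A', 'B'])
    # Every year from the group's first to its last, inclusive.
    years = range(group.index.min(), group.index.max() + 1)
    # reindex inserts the missing years as NaN rows; ffill fills them
    # from the previous year. rename_axis makes sure the level is
    # still called 'year' afterwards.
    return group.reindex(years).ffill().rename_axis('year')


# groupby prepends the A and B keys again, giving an (A, B, year) index.
result = df.groupby(level=['A', 'B']).apply(ffill_years)
print(result)
```

With the sample data, only the (bar, white) group has a gap, so the result gains a single row for 1991 and keeps value as its only column.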