
Forward-fill dates within MultiIndexed DataFrame

I have a pandas DataFrame with a MultiIndex, where one of the levels represents a year:

import pandas as pd
df = pd.DataFrame(dict(A = ['foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
                       B = ['white', 'black', 'white', 'white', 'black', 'black'],
                       year = [1990, 1992, 1990, 1992, 1991, 1992],
                       value = [3.14, 1.20, 4.56, 6.79, 0.01, 0.02]))

df = df.set_index(['A', 'B', 'year'])

I would like to forward-fill values, but only for the intervening years in each group (defined by the interaction of A and B). Here is the input:

                value
A   B     year       
foo white 1990   3.14
    black 1992   1.20
bar white 1990   4.56
          1992   6.79
    black 1991   0.01
          1992   0.02

And here is the desired output, with one additional row:

                value
A   B     year       
foo white 1990   3.14
    black 1992   1.20
bar white 1990   4.56
          1991   4.56  <-- new forward-filled value
          1992   6.79
    black 1991   0.01
          1992   0.02

How can I accomplish this concisely and efficiently? I've tried using combinations of groupby and apply, but I'm new to pandas and keep getting exceptions.

Here's an example of how I'm naively approaching the problem:

def ffill_years(df):
    df.reset_index(['A', 'B'])  # drop all but 'year'
    year_range = range(df['year'].min(), df['year'].max())
    df.reindex(pd.Series(years)).fillna("ffill")
    return df

df.groupby(level=['A', 'B']).apply(ffill_years)

Of course this doesn't work. Any and all tips appreciated!

You were pretty close; only a few small changes are needed:

  1. reset_index doesn't operate in place, so assign its result.
  2. You can't reference an index by name as if it were a column (df['year']); use .index instead.
  3. You need a +1 on your range to include the endpoint.
  4. reindex is also not in-place.
  5. The first parameter to fillna is a fill value; use the keyword method (method='ffill'), or call .ffill() directly.
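
Points 3 and 4 are easy to check on a single group. The following is a minimal sketch, assuming a made-up frame named toy that mirrors the bar/white group (the names toy, unchanged, years and filled are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical one-group frame indexed by year, with a gap at 1991.
toy = pd.DataFrame({"value": [4.56, 6.79]},
                   index=pd.Index([1990, 1992], name="year"))

# reindex returns a new object; without reassignment the original is untouched.
toy.reindex(range(1990, 1993))
unchanged = list(toy.index)  # still only 1990 and 1992

# range() excludes its stop value, hence the +1 on the maximum year.
years = range(toy.index.min(), toy.index.max() + 1)

# Reassigning the result and forward-filling closes the 1991 gap.
filled = toy.reindex(years).ffill()
print(filled)
```
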

See below:

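def ffill_years(df):
    df = df.reset_index(['A','B'])  # move A and B to columns; only 'year' stays in the index
    # +1 because range() excludes its stop value
    year_range = range(df.index.min(), df.index.max() + 1)

    # reindex returns a new frame, so reassign. .ffill() is equivalent to
    # fillna(method='ffill'), which recent pandas versions deprecate.
    df = df.reindex(pd.Index(year_range, name='year')).ffill()
    return df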

This results in:

In [209]: df.groupby(level=['A','B']).apply(ffill_years)
Out[209]: 
                  A      B  value
A   B     year                   
bar black 1991  bar  black   0.01
          1992  bar  black   0.02
    white 1990  bar  white   4.56
          1991  bar  white   4.56
          1992  bar  white   6.79
foo black 1992  foo  black   1.20
    white 1990  foo  white   3.14
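
The output above carries A and B twice, once in the index and once as columns. If only the value column is wanted, one possible variation is to drop the A and B levels inside each group instead of resetting them. This is a sketch under that assumption, not part of the original answer; the names ffill_years_only, group, g, years and result are illustrative:

```python
import pandas as pd

df = pd.DataFrame(dict(A=['foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
                       B=['white', 'black', 'white', 'white', 'black', 'black'],
                       year=[1990, 1992, 1990, 1992, 1991, 1992],
                       value=[3.14, 1.20, 4.56, 6.79, 0.01, 0.02]))
df = df.set_index(['A', 'B', 'year'])


def ffill_years_only(group):
    # Drop the A and B levels so that only 'year' indexes this group.
    g = group.droplevel(['A', 'B'])
    # Every year from the group's first to its last, endpoint included.
    years = pd.Index(range(g.index.min(), g.index.max() + 1), name='year')
    # Insert the missing years as NaN rows, then forward-fill them.
    return g.reindex(years).ffill()


# groupby puts the A and B keys back in front of each group's year index.
result = df.groupby(level=['A', 'B']).apply(ffill_years_only)
print(result)
```

As in the answer's output, groupby sorts the group keys, so the rows come back ordered by A and B rather than in the original input order.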
