Python Pandas Groupby: resetting values based on the index

Question

I have a DataFrame that contains some wrong information that I want to fix:

import pandas as pd
tuples_index = [(1,1990), (2,1999), (2,2002), (3,1992), (3,1994), (3,1996)]
index = pd.MultiIndex.from_tuples(tuples_index, names=['id', 'FirstYear'])
df = pd.DataFrame([2007, 2006, 2006, 2000, 2000, 2000], index=index, columns=['LastYear'] )


df
Out[4]: 
              LastYear
id FirstYear          
1  1990           2007
2  1999           2006
   2002           2006
3  1992           2000
   1994           2000
   1996           2000

id refers to a business, and this DataFrame is a small example slice of a much larger one that shows how a business moves. Each record is a unique location, and I want to capture the first and last year the business was there. The current 'LastYear' is accurate for businesses with only one record, and for the latest record of businesses with more than one record. For every earlier record, 'LastYear' should instead be the 'FirstYear' of the next record for the same id. What the df should look like at the end is this:

              LastYear
id FirstYear          
1  1990           2007
2  1999           2002
   2002           2006
3  1992           1994
   1994           1996
   1996           2000

And what I did to get it there was super clunky:

multirecord = df.groupby(level=0).filter(lambda x: len(x) > 1)
multirecord_grouped = multirecord.groupby(level=0)

ls = []
for _, group in multirecord_grouped:
    levels = group.index.get_level_values(level=1).tolist() + [group['LastYear'].iloc[-1]]
    ls += levels[1:]

multirecord['LastYear'] = pd.Series(ls, index=multirecord.index.copy())
final_joined = pd.concat([df.groupby(level=0).filter(lambda x: len(x) == 1),multirecord]).sort_index()

Is there a better way?

Answer 1

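# For one id group, take the FirstYear index level and shift it up by one,
# so each record gets the FirstYear of the next record (NaN for the last one).
shift_year = lambda g: g.index.get_level_values('FirstYear').to_series().shift(-1)
# Fill the NaN left on each group's last record with the original LastYear.
df.groupby(level=0).apply(shift_year) \
    .combine_first(df.LastYear).astype(int) \
    .rename('LastYear').to_frame()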
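
The same rule can also be written without apply. The following is only a sketch of an alternative, not the code from the answer above: it assumes the rows are already sorted by FirstYear within each id, as in the example, and the names flat, next_first and result are made up for illustration.

```python
import pandas as pd

tuples_index = [(1, 1990), (2, 1999), (2, 2002), (3, 1992), (3, 1994), (3, 1996)]
index = pd.MultiIndex.from_tuples(tuples_index, names=['id', 'FirstYear'])
df = pd.DataFrame([2007, 2006, 2006, 2000, 2000, 2000], index=index, columns=['LastYear'])

# Move the index levels into columns so FirstYear can be shifted like any other column.
flat = df.reset_index()

# Within each id, look up the FirstYear of the following record
# (NaN for the last record of each id). Assumes rows are sorted by FirstYear.
next_first = flat.groupby('id')['FirstYear'].shift(-1)

# Use the next record's FirstYear where there is one,
# otherwise keep the original LastYear.
flat['LastYear'] = next_first.fillna(flat['LastYear']).astype(int)

# Restore the original (id, FirstYear) MultiIndex.
result = flat.set_index(['id', 'FirstYear'])
print(result)
```

On the example data, result holds the LastYear values 2007, 2002, 2006, 1994, 1996, 2000, which matches the desired frame in the question. If the real data is not sorted, a sort_index() on df before reset_index() would be needed first.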

Answer 1 details: 6 votes, accepted, 2016-08-23 23:34:27
