Pandas reset inner level of MultiIndex
I have a DataFrame in the following format:
               col1  col2
ID Date
1  1993-12-31     4     6
   1994-12-31     8     5
   1995-12-31     4     7
   1996-12-31     3     3
2  2000-12-31     7     8
   2001-12-31     5     9
   2002-12-31     8     4
And I want to reset the 'Date' index, giving the following:
         col1  col2
ID Date
1  0        4     6
   1        8     5
   2        4     7
   3        3     3
2  0        7     8
   1        5     9
   2        8     4
I thought simply df.reset_index(level='Date', inplace=True, drop=True) would do it, but it does not.
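To see why that call falls short, here is a minimal sketch. The code that builds the frame is an assumption, since the question shows only the printed frame. With drop=True, reset_index discards the 'Date' level instead of renumbering it, so only 'ID' is left in the index.

```python
import pandas as pd

# Assumed reconstruction of the example frame from the question
dates = pd.to_datetime([
    "1993-12-31", "1994-12-31", "1995-12-31", "1996-12-31",
    "2000-12-31", "2001-12-31", "2002-12-31",
])
index = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 1, 2, 2, 2], dates], names=["ID", "Date"]
)
df = pd.DataFrame(
    {"col1": [4, 8, 4, 3, 7, 5, 8], "col2": [6, 5, 7, 3, 8, 9, 4]},
    index=index,
)

# drop=True removes the level entirely; nothing replaces it
dropped = df.reset_index(level="Date", drop=True)
print(list(dropped.index.names))  # ['ID']
```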
Using set_index and cumcount:
tmp = df.reset_index('Date', drop=True)
tmp.set_index(df.groupby(level=0).cumcount().rename('Date'), append=True)
         col1  col2
ID Date
1  0        4     6
   1        8     5
   2        4     7
   3        3     3
2  0        7     8
   1        5     9
   2        8     4
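For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. The frame construction is an assumption based on the printed frame in the question. groupby(level=...).cumcount() numbers the rows within each ID starting from 0, and that counter becomes the new 'Date' level.

```python
import pandas as pd

# Assumed reconstruction of the example frame from the question
dates = pd.to_datetime([
    "1993-12-31", "1994-12-31", "1995-12-31", "1996-12-31",
    "2000-12-31", "2001-12-31", "2002-12-31",
])
index = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 1, 2, 2, 2], dates], names=["ID", "Date"]
)
df = pd.DataFrame(
    {"col1": [4, 8, 4, 3, 7, 5, 8], "col2": [6, 5, 7, 3, 8, 9, 4]},
    index=index,
)

# Row position within each ID group: 0, 1, 2, 3, 0, 1, 2
counter = df.groupby(level="ID").cumcount().rename("Date")

# Drop the old 'Date' level, then append the counter as the new inner level
result = df.reset_index("Date", drop=True).set_index(counter, append=True)
print(result)
```

Note that set_index uses the counter's values positionally here, so it matters that the counter was computed from the same frame in the same row order.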
You can groupby ID, then reset the index on each group using apply:
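# Group by the ID level, then give each group a fresh 0..n-1 index
new_df = (df.groupby(df.index.get_level_values('ID'))
            .apply(lambda x: x.reset_index())
            .drop(columns=['ID', 'Date']))  # keyword form; a positional axis (", 1") is rejected by recent pandas
new_df.index = new_df.index.rename(['ID', 'Date'])
>>> new_df
         col1  col2
ID Date
1  0        4     6
   1        8     5
   2        4     7
   3        3     3
2  0        7     8
   1        5     9
   2        8     4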
Using pd.MultiIndex.from_arrays and groupby + cumcount.
df.index = pd.MultiIndex.from_arrays(
[df.index.get_level_values(0), df.groupby(level=0).cumcount()],
names=['ID', 'Date'])
df
         col1  col2
ID Date
1  0        4     6
   1        8     5
   2        4     7
   3        3     3
2  0        7     8
   1        5     9
   2        8     4
This won't generalise to N levels, but there should be a df.index.set_levels equivalent I'm forgetting...
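One possible way to generalise the from_arrays approach to N levels is sketched below. The helper name renumber_inner_level and the three-level sample frame are hypothetical, made up for illustration. It groups by every outer level and replaces only the innermost one with a per-group counter.

```python
import pandas as pd


def renumber_inner_level(frame):
    """Hypothetical helper: replace the innermost index level with a
    0-based counter that restarts for each combination of outer levels."""
    idx = frame.index
    outer = list(range(idx.nlevels - 1))            # positions of the outer levels
    counter = frame.groupby(level=outer).cumcount()  # position within each group
    arrays = [idx.get_level_values(i) for i in outer] + [counter.to_numpy()]
    out = frame.copy()
    out.index = pd.MultiIndex.from_arrays(arrays, names=list(idx.names))
    return out


# Hypothetical three-level frame for illustration
idx3 = pd.MultiIndex.from_tuples(
    [("a", 1, "x"), ("a", 1, "y"), ("a", 2, "x"), ("b", 1, "x"), ("b", 1, "z")],
    names=["grp", "ID", "Date"],
)
df3 = pd.DataFrame({"col1": range(5)}, index=idx3)

out3 = renumber_inner_level(df3)
print(out3.index.get_level_values("Date").tolist())  # [0, 1, 0, 0, 1]
```

The helper returns a copy rather than assigning to the caller's index, so the original frame keeps its labels.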
Not as cool as the old answer but I'd rather be accurate than cool.
from collections import defaultdict
from itertools import count

d = defaultdict(count)  # one independent counter per ID
lbl = []
for a, *_ in df.index.values:
    lbl.append(next(d[a]))  # position of this row within its ID group
lvl = pd.RangeIndex(max(lbl) + 1)
# set_labels was renamed to set_codes in pandas 0.24
df.set_index(df.index.set_codes(lbl, level=1).set_levels(lvl, level=1))
         col1  col2
ID Date
1  0        4     6
   1        8     5
   2        4     7
   3        3     3
2  0        7     8
   1        5     9
   2        8     4
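The counting trick in the loop above can be tried on its own, without pandas. In this small sketch the list ids stands in for the 'ID' level of the example frame (an assumption for illustration). defaultdict(count) creates a fresh itertools.count() the first time each key is seen, so every ID gets its own counter starting at 0.

```python
from collections import defaultdict
from itertools import count

# A missing key triggers count(), i.e. a new counter starting at 0
counters = defaultdict(count)

ids = [1, 1, 1, 1, 2, 2, 2]  # stand-in for the ID level of the example frame

# next() advances only the counter that belongs to this ID
labels = [next(counters[i]) for i in ids]
print(labels)  # [0, 1, 2, 3, 0, 1, 2]
```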
I misread the question. I didn't see that the new index needed to reset for every group.
Hopefully useful to someone.
You can use pandas.MultiIndex.set_levels:
n = 1
lvl = df.index.levels[n]
new_lvl = pd.RangeIndex(len(lvl))
new_idx = df.index.set_levels(new_lvl, level=n)  # level must be passed by keyword in pandas 2.x
df.set_index(new_idx)
         col1  col2
ID Date
1  0        4     6
   1        8     5
   2        4     7
   3        3     3
2  4        7     8
   5        5     9
   6        8     4
Yay! \o/
df.set_index(df.index.set_levels(pd.RangeIndex(len(df.index.levels[1])), level=1))
         col1  col2
ID Date
1  0        4     6
   1        8     5
   2        4     7
   3        3     3
2  4        7     8
   5        5     9
   6        8     4
# inplace= was removed from set_levels in pandas 2.0; assign the result instead
df.index = df.index.set_levels(pd.RangeIndex(len(df.index.levels[1])), level=1)
df
         col1  col2
ID Date
1  0        4     6
   1        8     5
   2        4     7
   3        3     3
2  4        7     8
   5        5     9
   6        8     4
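The reason set_levels numbers the rows 0 to 6 rather than restarting per ID can be seen by looking at how a MultiIndex stores a level. The sketch below rebuilds the example frame, which is an assumption based on the printed frame in the question. Each level holds its unique values once, and per-row integer codes point into them. set_levels swaps the unique values but leaves the codes untouched.

```python
import pandas as pd

# Assumed reconstruction of the example frame from the question
dates = pd.to_datetime([
    "1993-12-31", "1994-12-31", "1995-12-31", "1996-12-31",
    "2000-12-31", "2001-12-31", "2002-12-31",
])
index = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 1, 2, 2, 2], dates], names=["ID", "Date"]
)
df = pd.DataFrame(
    {"col1": [4, 8, 4, 3, 7, 5, 8], "col2": [6, 5, 7, 3, 8, 9, 4]},
    index=index,
)

n_unique = len(df.index.levels[1])  # seven distinct dates across both IDs
codes = list(df.index.codes[1])     # per-row positions into the level values
print(n_unique, codes)

# Replacing the level values keeps the codes, so the numbering is global
relabelled = df.index.set_levels(pd.RangeIndex(n_unique), level=1)
print(relabelled.get_level_values("Date").tolist())
```

A per-group restart therefore needs new codes as well (for example from cumcount), not just new level values.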
Try this:
df.groupby(level=0).apply(lambda _group:_group.reset_index())
*** Version warning: ***
The following behavior was tested on pandas version "1.1.2".
According to the pandas release notes, version 1.3.0 appears to include a fix that could affect this method; see Bug-Fix.
Example:
Let's create a MultiIndex df by concatenating a dictionary of 2 DataFrames, so that each dictionary key is added to the index as the outer level.
import pandas as pd
import numpy as np
raw_df = pd.concat({'First':pd.DataFrame(np.random.rand(4,4),index=range(4)),
'Second':pd.DataFrame(np.random.rand(4,4),index=range(41,45))})
Result:
result_df = raw_df.groupby(level=0).apply(lambda _group:_group.reset_index(drop=True))
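Because the example above uses random data, its output is not reproducible. Here is a small deterministic sketch of the same technique. The column name v and the sample values are assumptions for illustration, and group_keys=True is passed explicitly so the outer key is kept regardless of the default in a given pandas version.

```python
import pandas as pd

# Two small frames with different inner indexes (0..2 and 41..42)
raw = pd.concat({
    "First": pd.DataFrame({"v": [10, 11, 12]}, index=range(3)),
    "Second": pd.DataFrame({"v": [20, 21]}, index=range(41, 43)),
})

# Each group gets a fresh 0..n-1 index; the group key becomes the outer level
result = raw.groupby(level=0, group_keys=True).apply(
    lambda group: group.reset_index(drop=True)
)
print(result)
```

The values are unchanged; only the inner level is renumbered per group.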