Re-index pandas dataframe by union of two columns
Probably a duplicate, but I'm not even sure what to search for.
If I have a pandas DataFrame like so:
index  RH   LH   Data1  Data2  ...
1      A1   A2   A      B
2      B1   NaN  C      D
3      NaN  C2   E      F
And I want to re-index it like so:
index  Data1  Data2
A1     A      B
A2     A      B
B1     C      D
C2     E      F
Is there a simple-ish way to do this, or should I just use a pair of for loops?
You can pass DataFrame.set_index a list of every column except RH and LH, then reshape with DataFrame.stack (which, with the default dropna=True, also discards the NaN entries). Next, remove the last index level with reset_index and drop=True, convert the remaining levels back to columns with a second reset_index, and finally build the new index with DataFrame.set_index:
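# Every column except RH and LH is carried along as data
cols = df.columns.difference(['RH', 'LH']).tolist()
df = (df.set_index(cols)
        .stack()                            # long format: one row per RH/LH value
        .reset_index(len(cols), drop=True)  # drop the level holding the RH/LH labels
        .reset_index(name='idx')            # data columns back to regular columns
        .set_index('idx'))
print(df)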
    Data1 Data2
idx
A1      A     B
A2      A     B
B1      C     D
C2      E     F
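For anyone who wants to run the stack-based approach end to end, here is a minimal, self-contained sketch. The sample frame is reconstructed from the question and assumes that "index" is the row index rather than a column. The extra dropna() call is an addition, not part of the answer: it should be a no-op where stack already drops missing values, and it guards against pandas versions whose stack keeps them. The names stacked and out are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: "index" is the row index).
df = pd.DataFrame(
    {
        "RH": ["A1", "B1", np.nan],
        "LH": ["A2", np.nan, "C2"],
        "Data1": ["A", "C", "E"],
        "Data2": ["B", "D", "F"],
    },
    index=[1, 2, 3],
)

# Every column except RH and LH is carried along as data.
cols = df.columns.difference(["RH", "LH"]).tolist()

# Move the data columns into the index, then stack RH/LH into one long Series.
stacked = df.set_index(cols).stack()

# Defensive addition (not in the answer): drop missing values explicitly,
# in case this pandas version's stack() does not drop them itself.
stacked = stacked.dropna()

out = (
    stacked.reset_index(len(cols), drop=True)  # discard the level holding "RH"/"LH"
    .reset_index(name="idx")                   # Data1/Data2 back to columns, values named idx
    .set_index("idx")                          # RH/LH values become the new index
)

print(out)
```

Because the result is assigned to out rather than back to df, the original frame stays available for trying the alternative approach afterwards.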
Or, starting again from the original DataFrame, use DataFrame.melt followed by DataFrame.dropna, remove the variable column, and finally set the idx column as the index. Note that the rows then come out grouped by source column (RH values first, then LH) rather than by original row:
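# cols is the list of data columns defined above
df = (df.melt(cols, value_name='idx')
        .dropna(subset=['idx'])     # discard the missing RH/LH entries
        .drop('variable', axis=1)   # column recording whether the value came from RH or LH
        .set_index('idx'))
print(df)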
    Data1 Data2
idx
A1      A     B
B1      C     D
A2      A     B
C2      E     F
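The melt output above lists all RH values before the LH values, which differs from the order requested in the question. If that order matters, one possible variation (an assumption on my part, not something the answer proposes) is to carry the original row label through the melt and sort on it with a stable sort. The sample frame is again reconstructed from the question, and the "index" column name comes from calling reset_index on an unnamed index.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: "index" is the row index).
df = pd.DataFrame(
    {
        "RH": ["A1", "B1", np.nan],
        "LH": ["A2", np.nan, "C2"],
        "Data1": ["A", "C", "E"],
        "Data2": ["B", "D", "F"],
    },
    index=[1, 2, 3],
)

out = (
    df.reset_index()                        # original row label becomes a column named "index"
    .melt(id_vars=["index", "Data1", "Data2"], value_name="idx")
    .dropna(subset=["idx"])                 # discard the missing RH/LH entries
    .sort_values("index", kind="mergesort") # stable sort: RH stays ahead of LH within a row
    .drop(columns=["index", "variable"])    # helper columns are no longer needed
    .set_index("idx")
)

print(out)
```

A stable sort is used so that, within each original row, the RH value still precedes the LH value, as it does in the melted frame.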