
Re-index pandas dataframe by union of two columns

Probably a duplicate, but I'm not even sure what to search for.

If I have a pandas dataframe like so:

index RH  LH  Data1  Data2 . . . 
1     A1  A2  A      B
2     B1  NaN C      D
3     NaN C2  E      F

And I want to re-index it like so:

index Data1  Data2
A1    A      B
A2    A      B
B1    C      D
C2    E      F

Is there a simple-ish way to do this? Or should I just do a pair of for loops?

You can pass every column except RH and LH to DataFrame.set_index and reshape with DataFrame.stack, which moves the RH and LH values into a single Series (with its classic default, stack also drops the NaN entries). Then remove the last index level with DataFrame.reset_index and drop=True, convert the remaining levels back to columns, and build the new index with DataFrame.set_index:

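# every column except RH and LH is kept as data
cols = df.columns.difference(['RH','LH']).tolist()
df = (df.set_index(cols)
        .stack()                             # RH/LH values become one long Series
        .reset_index(len(cols), drop=True)   # drop the level holding 'RH'/'LH'
        .reset_index(name='idx')             # data levels back to columns
        .set_index('idx'))                   # stacked values become the index
print(df)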
    Data1 Data2
idx            
A1      A     B
A2      A     B
B1      C     D
C2      E     F
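For anyone who wants to run the stack approach end to end, here is a minimal sketch. The sample frame is rebuilt from the table in the question (only the two data columns shown there), and the names `stacked` and the explicit `.dropna()` are my additions, not part of the answer above. The extra `.dropna()` is there on the assumption that the installed pandas version may not drop missing values inside `stack` by itself; where `stack` already drops them, it is a no-op.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the real data may have more columns.
df = pd.DataFrame(
    {
        "RH": ["A1", "B1", np.nan],
        "LH": ["A2", np.nan, "C2"],
        "Data1": ["A", "C", "E"],
        "Data2": ["B", "D", "F"],
    },
    index=[1, 2, 3],
)

# Every column except RH and LH is treated as data.
cols = df.columns.difference(["RH", "LH"]).tolist()

stacked = (
    df.set_index(cols)
      .stack()                            # RH/LH values become one long Series
      .dropna()                           # assumption: guard for versions that keep NaN
      .reset_index(len(cols), drop=True)  # drop the level that held 'RH'/'LH'
      .reset_index(name="idx")            # data levels back to columns
      .set_index("idx")                   # stacked values become the index
)
print(stacked)
```

Keeping the result in a separate variable leaves the original `df` intact, so the same frame can be reused to try the alternative approach.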

Or use DataFrame.melt with DataFrame.dropna, remove the variable column, and finally set the idx column as the index. Note that the rows come out grouped by source column (RH first, then LH) rather than by original row:

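# cols is the list of data columns defined above; df is the original frame
df = (df.melt(cols, value_name='idx')    # RH/LH values go into column 'idx'
       .dropna(subset=['idx'])           # discard rows where RH or LH was NaN
       .drop('variable', axis=1)         # drop the column naming the source (RH/LH)
       .set_index('idx'))
print(df)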
    Data1 Data2
idx            
A1      A     B
B1      C     D
A2      A     B
C2      E     F
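If the row order from the question (A1, A2, B1, C2) matters, one possible variation on the melt approach is sketched below. It assumes a pandas version where `melt` accepts `ignore_index=False` (added in pandas 1.1), so the original row labels survive the reshape and a stable sort on them restores row order. The sample frame and the name `melted` are illustrative, not from the answer above.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the real data may have more columns.
df = pd.DataFrame(
    {
        "RH": ["A1", "B1", np.nan],
        "LH": ["A2", np.nan, "C2"],
        "Data1": ["A", "C", "E"],
        "Data2": ["B", "D", "F"],
    },
    index=[1, 2, 3],
)

cols = df.columns.difference(["RH", "LH"]).tolist()

melted = (
    df.melt(id_vars=cols, value_name="idx", ignore_index=False)  # keep row labels
      .dropna(subset=["idx"])        # discard rows where RH or LH was NaN
      .drop(columns="variable")      # drop the column naming the source (RH/LH)
      .sort_index(kind="mergesort")  # stable sort: original rows, RH before LH
      .set_index("idx")
)
print(melted)
```

A plain `.sort_index()` after `set_index('idx')` would also give A1, A2, B1, C2 for this sample, but it sorts by the new labels rather than by original row, so the two only agree when the labels happen to sort that way.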
