How to keep and order the common index of two DataFrames

Question

I have two DataFrames:

import io
import pandas as pd


ctrl=u"""probegenes,sample1,sample2,sample3
1415777_at Pnliprp1,20,0.00,11
1415884_at Cela3b,47,0.00,100
1415805_at Clps,17,0.00,55
1115805_at Ckkk,77,10.00,5.5
"""

df_ctrl = pd.read_csv(io.StringIO(ctrl),index_col='probegenes')

test=u"""probegenes,sample1,sample2,sample3
1415777_at Pnliprp1,20.1,10.00,22.3
1415805_at Clps,7,3.00,1.5
1415884_at Cela3b,47,2.01,30"""

df_test = pd.read_csv(io.StringIO(test),index_col='probegenes')

They look like this:

In [35]: df_ctrl
Out[35]:
                     sample1  sample2  sample3
probegenes
1415777_at Pnliprp1       20        0     11.0
1415884_at Cela3b         47        0    100.0
1415805_at Clps           17        0     55.0
1115805_at Ckkk           77       10      5.5

In [36]: df_test
Out[36]:
                     sample1  sample2  sample3
probegenes
1415777_at Pnliprp1     20.1    10.00     22.3
1415805_at Clps          7.0     3.00      1.5
1415884_at Cela3b       47.0     2.01     30.0

I want to:

Get the common index of the two DataFrames.
Reorder both DataFrames in the same way.

So that in the end I get two new DataFrames:

new_df_ctrl 

                     sample1  sample2  sample3
probegenes
1415884_at Cela3b         47        0    100.0
1415805_at Clps           17        0     55.0
1415777_at Pnliprp1       20        0     11.0


new_df_test

                     sample1  sample2  sample3
probegenes
1415884_at Cela3b       47.0     2.01     30.0
1415805_at Clps          7.0     3.00      1.5
1415777_at Pnliprp1     20.1    10.00     22.3

Answer 1

You can use join with the parameter how='inner' to get the common index, then reindex each DataFrame with that common index.

idx = df_ctrl.join(df_test, rsuffix='_', how='inner').index

>>> df_ctrl.reindex(idx)
                     sample1  sample2  sample3
probegenes                                    
1415777_at Pnliprp1       20        0       11
1415805_at Clps           17        0       55
1415884_at Cela3b         47        0      100

>>> df_test.reindex(idx)
                     sample1  sample2  sample3
probegenes                                    
1415777_at Pnliprp1     20.1    10.00     22.3
1415805_at Clps          7.0     3.00      1.5
1415884_at Cela3b       47.0     2.01     30.0
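A minimal self-contained sketch of the join-based approach, assuming a recent pandas version. The rows above come out in ascending order, not the descending order the question asks for, and the row order of an inner join may depend on the pandas version, so the sketch sorts the common index explicitly. It also shows DataFrame.align(join="inner", axis=0), a different pandas method (not part of this answer) that performs the same inner join on the row index and returns both aligned frames in one call; the names a_ctrl and a_test are illustrative.

```python
import io

import pandas as pd

ctrl = """probegenes,sample1,sample2,sample3
1415777_at Pnliprp1,20,0.00,11
1415884_at Cela3b,47,0.00,100
1415805_at Clps,17,0.00,55
1115805_at Ckkk,77,10.00,5.5
"""
test = """probegenes,sample1,sample2,sample3
1415777_at Pnliprp1,20.1,10.00,22.3
1415805_at Clps,7,3.00,1.5
1415884_at Cela3b,47,2.01,30"""

df_ctrl = pd.read_csv(io.StringIO(ctrl), index_col="probegenes")
df_test = pd.read_csv(io.StringIO(test), index_col="probegenes")

# Both frames share column names, so join() needs a suffix (rsuffix here),
# otherwise it raises a "columns overlap" ValueError.
# how="inner" keeps only the index labels present in both frames.
idx = df_ctrl.join(df_test, rsuffix="_", how="inner").index

# Do not rely on the join's row order: sort the common index explicitly.
# Descending order reproduces the order requested in the question.
idx = idx.sort_values(ascending=False)

new_df_ctrl = df_ctrl.reindex(idx)
new_df_test = df_test.reindex(idx)

# Alternative (not from the answer): align() does the inner join on the
# row index (axis=0) and returns both trimmed frames at once.
a_ctrl, a_test = df_ctrl.align(df_test, join="inner", axis=0)
a_ctrl = a_ctrl.sort_index(ascending=False)
a_test = a_test.sort_index(ascending=False)

print(new_df_ctrl)
print(new_df_test)
```

With either variant, the row 1115805_at Ckkk (present only in df_ctrl) is dropped and both frames end up with identical, identically ordered indexes.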

Answer 2

You can use pd.Index.intersection() and then select the rows with either .loc[] or .reindex(). Call .sort_values() on the resulting index to get the desired order:

idx = df_ctrl.index.intersection(df_test.index).sort_values(ascending=False)

df_ctrl.loc[idx]

                     sample1  sample2  sample3
probegenes                                    
1415884_at Cela3b         47      0.0    100.0
1415805_at Clps           17      0.0     55.0
1415777_at Pnliprp1       20      0.0     11.0

df_test.loc[idx]

                     sample1  sample2  sample3
probegenes                                    
1415884_at Cela3b       47.0     2.01     30.0
1415805_at Clps          7.0     3.00      1.5
1415777_at Pnliprp1     20.1    10.00     22.3
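The answer mentions .reindex() as an alternative to .loc[] but only shows .loc[]. Below is a minimal self-contained sketch of the .reindex() variant using the question's data, assuming a recent pandas version. Because idx comes from the intersection, every label exists in both frames, so .loc[idx] and .reindex(idx) select the same rows; they would differ only for a missing label (.loc raises KeyError, .reindex inserts a row of NaN). Note that sort_values(ascending=False) gives the requested order here only because that order happens to be reverse lexicographic for these labels.

```python
import io

import pandas as pd

ctrl = """probegenes,sample1,sample2,sample3
1415777_at Pnliprp1,20,0.00,11
1415884_at Cela3b,47,0.00,100
1415805_at Clps,17,0.00,55
1115805_at Ckkk,77,10.00,5.5
"""
test = """probegenes,sample1,sample2,sample3
1415777_at Pnliprp1,20.1,10.00,22.3
1415805_at Clps,7,3.00,1.5
1415884_at Cela3b,47,2.01,30"""

df_ctrl = pd.read_csv(io.StringIO(ctrl), index_col="probegenes")
df_test = pd.read_csv(io.StringIO(test), index_col="probegenes")

# Labels present in both indexes, sorted descending (string comparison).
idx = df_ctrl.index.intersection(df_test.index).sort_values(ascending=False)

# Every label in idx exists in both frames, so reindex() adds no NaN rows.
new_df_ctrl = df_ctrl.reindex(idx)
new_df_test = df_test.reindex(idx)

# Both results now share exactly the same index, in the same order.
print(new_df_ctrl.index.equals(new_df_test.index))
```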

2 answers

Answer 1
Score 3, accepted, 2016-05-18 02:56:01

Answer 2
Score 1, 2016-05-18 03:03:23