格式化数据框减少列和增加行

Question

I have a pandas data frame like this我有一个这样的 pandas 数据框

data = [['NGS_34',0.47,0.55,0.51,0.53,0.54,0.62], ['NGS_38',0.52,0.52,0.49,0.51,0.52,0.45]]
 
df = pd.DataFrame(data, columns = ['probes','01_PLAGL1', '02_PLAGL1','H19','H19', 'H19','GNAS_A/B'])
df = df.set_index('probes')
df

             01_PLAGL1  02_PLAGL1   H19   H19   H19  GNAS_A/B
probes                                                  
NGS_34       0.47       0.55  0.51  0.53  0.54      0.62
NGS_38       0.52       0.52  0.49  0.51  0.52      0.45

This is actually a minimal reproducible example.这实际上是一个最小的可重现示例。 The real data frame is formed by many paired columns like the example 01_PLAGL1 02_PLAGL1 , then 2 sets of three columns like the example H19 H19 H19 and 2 unique columns.真实的数据框由许多成对的列组成，例如示例01_PLAGL1 02_PLAGL1 ，然后是 2 组三列，例如示例H19 H19 H19和 2 个唯一列。 With this explanation and the columns of my real dataset below, I think you will understand the input data of my problem.通过这个解释和下面我的真实数据集的列，我想你会理解我的问题的输入数据。

data_no_control.columns.values

array(['PLAGL1', 'PLAGL1', 'GRB10', 'GRB10', 'MEST', 'MEST', 'H19', 'H19',
       'H19', 'KCNQ1OT1', 'KCNQ1OT1', 'MEG3', 'MEG3', 'MEG8', 'MEG8',
       'SNRPN', 'SNRPN', 'PEG3', 'PEG3', 'PEG3', 'NESP55', 'GNAS-AS1',
       'GNASXL', 'GNASXL', 'GNAS_A/B'], dtype=object)

The final output I would like to achieve should be like this最后我想实现的output应该是这样的

            01_PLAGL1     H19      GNAS A/B
probes                                                  
NGS_34       0.47         0.51      0.62
             0.55         0.53
                          0.54
(One empty row)
(Second empty row)
NGS_38       0.52         0.49      0.45
             0.52         0.51
                          0.52
(One empty row)
(Second empty row)
NGS_41 ...

I have tried this我试过这个

df = data_no_control.reset_index(level=0)


empty_rows = 5
df.index = range(0, empty_rows*len(df), empty_rows)
new_df = df.reindex(index=range(empty_rows*len(df)))

new_df = new_df.set_index('index')

new_df

index        01_PLAGL1  02_PLAGL1   H19   H19   H19  GNAS_A/B
                                                  
NGS_34       0.47       0.55  0.51  0.53  0.54      0.62
NaN          NaN         NaN  NaN    NaN. NaN       NaN
NaN          NaN         NaN  NaN    NaN. NaN       NaN
NaN          NaN         NaN  NaN    NaN. NaN       NaN
NaN          NaN         NaN  NaN    NaN. NaN       NaN
NGS_38       0.52       0.52  0.49  0.51  0.52      0.45
NaN          NaN         NaN  NaN    NaN. NaN       NaN
NaN          NaN         NaN  NaN    NaN. NaN       NaN
NaN          NaN         NaN  NaN    NaN. NaN       NaN
NaN          NaN         NaN  NaN    NaN. NaN       NaN

Answer 1

Use:利用：

data = [['NGS_34',0.47,0.55,0.51,0.53,0.54,0.62], ['NGS_38',0.52,0.52,0.49,0.51,0.52,0.45]]

df = pd.DataFrame(data, columns = ['probes','01_PLAGL1', '02_PLAGL1','H19','H19', 'H19','GNAS_A/B'])
df = df.set_index('probes')

#No of new rows
new = 2
#reove values before _ for pairs columns names
s = df.columns.str.split('_').str[-1].to_series()
#create Multiindex by counter
df.columns = [s, s.groupby(s).cumcount()]
#reshape
df = df.stack()
#create MultiIndex for add new rows and original order in columns names
mux = pd.MultiIndex.from_product([df.index.levels[0],
                                  np.arange(df.index.levels[1].max() + new + 1)])
df = df.reindex(index=mux, columns=s.unique())
print (df)
          PLAGL1   H19   A/B
probes                      
NGS_34 0    0.47  0.51  0.62
       1    0.55  0.53   NaN
       2     NaN  0.54   NaN
       3     NaN   NaN   NaN
       4     NaN   NaN   NaN
NGS_38 0    0.52  0.49  0.45
       1    0.52  0.51   NaN
       2     NaN  0.52   NaN
       3     NaN   NaN   NaN
       4     NaN   NaN   NaN

Last if need empty values instead misisng values and no counter level use:最后如果需要空值而不是 misisng 值并且没有计数器级别使用：

df = df.droplevel(1).fillna('')
df.index = df.index.where(~df.index.duplicated(), '')
print (df)
       PLAGL1   H19   A/B
probes                   
NGS_34   0.47  0.51  0.62
         0.55  0.53      
               0.54      
                         
                         
NGS_38   0.52  0.49  0.45
         0.52  0.51      
               0.52

EDIT: In real data are not duplicates, so ouput is different:编辑：在实际数据中不重复，所以输出不同：

d = {'PLAGL1': {'NGS_34': 0.55, 'NGS_38': 0.52}, 'GRB10': {'NGS_34': 0.48, 'NGS_38': 0.49}, 'MEST': {'NGS_34': 0.56, 'NGS_38': 0.5}, 'H19': {'NGS_34': 0.54, 'NGS_38': 0.52}, 'KCNQ1OT1': {'NGS_34': 0.41, 'NGS_38': 0.49}, 'MEG3': {'NGS_34': 0.5, 'NGS_38': 0.55}, 'MEG8': {'NGS_34': 0.46, 'NGS_38': 0.5}, 'SNRPN': {'NGS_34': 0.55, 'NGS_38': 0.46}, 'PEG3': {'NGS_34': 0.51, 'NGS_38': 0.51}, 'NESP55': {'NGS_34': 0.55, 'NGS_38': 0.53}, 'GNAS-AS1': {'NGS_34': 0.52, 'NGS_38': 0.48}, 'GNASXL': {'NGS_34': 0.49, 'NGS_38': 0.44}, 'GNAS A/B': {'NGS_34': 0.62, 'NGS_38': 0.45}}

df = pd.DataFrame(d)
print (df)

        PLAGL1  GRB10  MEST   H19  KCNQ1OT1  MEG3  MEG8  SNRPN  PEG3  NESP55  \
NGS_34    0.55   0.48  0.56  0.54      0.41  0.50  0.46   0.55  0.51    0.55   
NGS_38    0.52   0.49  0.50  0.52      0.49  0.55  0.50   0.46  0.51    0.53   

        GNAS-AS1  GNASXL  GNAS A/B  
NGS_34      0.52    0.49      0.62  
NGS_38      0.48    0.44      0.45  

#No of new rows
new = 2
#reove values before _ for pairs columns names
s = df.columns.str.split('_').str[-1].to_series()
#create Multiindex by counter
df.columns = [s, s.groupby(s).cumcount()]
#reshape
df = df.stack()
#create MultiIndex for add new rows and original order in columns names
mux = pd.MultiIndex.from_product([df.index.levels[0],
                                  np.arange(df.index.levels[1].max() + new + 1)])
df = df.reindex(index=mux, columns=s.unique())

print (df)
          PLAGL1  GRB10  MEST   H19  KCNQ1OT1  MEG3  MEG8  SNRPN  PEG3  \
NGS_34 0    0.55   0.48  0.56  0.54      0.41  0.50  0.46   0.55  0.51   
       1     NaN    NaN   NaN   NaN       NaN   NaN   NaN    NaN   NaN   
       2     NaN    NaN   NaN   NaN       NaN   NaN   NaN    NaN   NaN   
NGS_38 0    0.52   0.49  0.50  0.52      0.49  0.55  0.50   0.46  0.51   
       1     NaN    NaN   NaN   NaN       NaN   NaN   NaN    NaN   NaN   
       2     NaN    NaN   NaN   NaN       NaN   NaN   NaN    NaN   NaN   

          NESP55  GNAS-AS1  GNASXL  GNAS A/B  
NGS_34 0    0.55      0.52    0.49      0.62  
       1     NaN       NaN     NaN       NaN  
       2     NaN       NaN     NaN       NaN  
NGS_38 0    0.53      0.48    0.44      0.45  
       1     NaN       NaN     NaN       NaN  
       2     NaN       NaN     NaN       NaN

格式化数据框减少列和增加行

问题描述

1 个解决方案

解决方案1
1 已采纳 2022-05-05 06:15:10

格式化数据框减少列和增加行

问题描述

1 个解决方案

解决方案1 1 已采纳 2022-05-05 06:15:10

解决方案1
1 已采纳 2022-05-05 06:15:10