
Adding new columns to data frame based on the values of multiple columns

I have a data frame whose head looks like this:

df.head()
Out[660]:
Samples variable    value   Type
0   PE01I   267N12.3_Beta   0.066517    Beta
1   PE01R   R267N12.3_Beta  0.061617    Beta
2   PE02I   267N12.3_Beta   0.071013    Beta
3   PE02R   267N12.3_Beta   0.056623    Beta
4   PE03I   267N12.3_Beta   0.071633    Beta
5   PE01I   267N12.3_FPKM   0.000000    FPKM
6   PE01R   267N12.3_FPKM   0.003430    FPKM
7   PE02I   267N12.3_FPKM   0.272144    FPKM
8   PE02R   267N12.3_FPKM   0.005753    FPKM
9   PE03I   267N12.3_FPKM   0.078708    FPKM

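I want to add new columns named Beta and FPKM, taking the column names from the labels in the "Type" column and filling them with the corresponding values from the "value" column. So far I have tried this one-liner: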

df['Beta'] = df['Type'].map(lambda x: df.value if x == "Beta" else "FPKM")

It gives me the following output:

Samples variable    value   Type                      Beta
0   PE01I   267N12.3_Beta   0.066517    Beta        0 0.066517 1 0.061617 2 0.07...
1   PE01R   267N12.3_Beta   0.061617    Beta    0 0.066517 1 0.061617 2 0.07...
2   PE02I   267N12.3_Beta   0.071013    Beta    0 0.066517 1 0.061617 2 0.07...
3   PE02R   267N12.3_Beta   0.056623    Beta    0 0.066517 1 0.061617 2 0.07...
4   PE03I   267N12.3_Beta   0.071633    Beta    0 0.066517 1 0.061617 2 0.07...

Every row of the Beta column holds the same repeated values instead of a single number. My goal is a data frame that looks like this:

Samples variable    Beta    FPKM
PE01I   267N12.3_Beta   0.066517    0
PE01R   267N12.3_Beta   0.061617    0.00343
PE02I   267N12.3_Beta   0.071013    0.272144
PE02R   267N12.3_Beta   0.056623    0.005753
PE03I   267N12.3_Beta   0.071633    0.078708

Any help would be great. Thank you.

I think you need unstack:

df1 = df.set_index(['Samples','Type']).unstack()
print (df1)
               variable                    value          
Type               Beta           FPKM      Beta      FPKM
Samples                                                   
PE01I     267N12.3_Beta  267N12.3_FPKM  0.066517  0.000000
PE01R    R267N12.3_Beta  267N12.3_FPKM  0.061617  0.003430
PE02I     267N12.3_Beta  267N12.3_FPKM  0.071013  0.272144
PE02R     267N12.3_Beta  267N12.3_FPKM  0.056623  0.005753
PE03I     267N12.3_Beta  267N12.3_FPKM  0.071633  0.078708

#remove Multiindex in columns
df1.columns = ['_'.join(col) for col in df1.columns]
df1.reset_index(inplace=True)
print (df1)
  Samples   variable_Beta  variable_FPKM  value_Beta  value_FPKM
0   PE01I   267N12.3_Beta  267N12.3_FPKM    0.066517    0.000000
1   PE01R  R267N12.3_Beta  267N12.3_FPKM    0.061617    0.003430
2   PE02I   267N12.3_Beta  267N12.3_FPKM    0.071013    0.272144
3   PE02R   267N12.3_Beta  267N12.3_FPKM    0.056623    0.005753
4   PE03I   267N12.3_Beta  267N12.3_FPKM    0.071633    0.078708

#drop a column if it is not needed
print (df1.drop('variable_FPKM', axis=1))
  Samples   variable_Beta  value_Beta  value_FPKM
0   PE01I   267N12.3_Beta    0.066517    0.000000
1   PE01R  R267N12.3_Beta    0.061617    0.003430
2   PE02I   267N12.3_Beta    0.071013    0.272144
3   PE02R   267N12.3_Beta    0.056623    0.005753
4   PE03I   267N12.3_Beta    0.071633    0.078708
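
If only the numeric column is needed, the layout asked for in the question (Samples, Beta, FPKM) can also be reached with DataFrame.pivot. This is a minimal sketch, assuming each (Samples, Type) pair occurs exactly once; the frame below is rebuilt from the df.head() output in the question, and the name wide is only illustrative.

```python
import pandas as pd

# Sample data mirroring the question's df.head() output.
df = pd.DataFrame({
    "Samples": ["PE01I", "PE01R", "PE02I", "PE02R", "PE03I"] * 2,
    "variable": ["267N12.3_Beta", "R267N12.3_Beta", "267N12.3_Beta",
                 "267N12.3_Beta", "267N12.3_Beta"] + ["267N12.3_FPKM"] * 5,
    "value": [0.066517, 0.061617, 0.071013, 0.056623, 0.071633,
              0.000000, 0.003430, 0.272144, 0.005753, 0.078708],
    "Type": ["Beta"] * 5 + ["FPKM"] * 5,
})

# Reshape only the numeric column: one row per sample, one column per Type.
wide = df.pivot(index="Samples", columns="Type", values="value")

# Drop the "Type" label on the column axis and turn Samples back into a column.
wide.columns.name = None
wide = wide.reset_index()
print(wide)
```

Because only value is pivoted, the resulting columns are plain Beta and FPKM and no MultiIndex has to be flattened afterwards.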

Edit from comments:

If you get the error:

ValueError: Index contains duplicate entries, cannot reshape

it means you have duplicate values in the index and aggregation is needed.
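
To see which rows are responsible before choosing an aggregation, the duplicated (Samples, Type) pairs can be listed with DataFrame.duplicated. This is a rough sketch on a hypothetical miniature of the data in which PE01I appears twice for Type "Beta"; the names dupes and reshaped are illustrative.

```python
import pandas as pd

# Hypothetical miniature of the data, with PE01I listed twice for Type "Beta".
df = pd.DataFrame({
    "Samples": ["PE01I", "PE01I", "PE02I", "PE01I", "PE02I"],
    "Type": ["Beta", "Beta", "Beta", "FPKM", "FPKM"],
    "value": [0.066517, 0.061617, 0.071013, 0.000000, 0.272144],
})

# keep=False marks every member of a duplicated (Samples, Type) group.
dupes = df[df.duplicated(subset=["Samples", "Type"], keep=False)]
print(dupes)

# Reshaping without aggregation fails on such data.
try:
    df.set_index(["Samples", "Type"]).unstack()
    reshaped = True
except ValueError as err:
    reshaped = False
    print("unstack failed:", err)
```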

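You need pivot_table. If aggfunc is np.sum or np.mean (which work with numeric data), string columns are omitted, whereas the function ','.join works only with string values and numeric columns are omitted.

So call the function twice with different aggfunc values and then use concat: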

import pandas as pd
import numpy as np

df = pd.DataFrame({'Type': {0: 'Beta', 1: 'Beta', 2: 'Beta', 3: 'Beta', 4: 'Beta', 5: 'FPKM', 6: 'FPKM', 7: 'FPKM', 8: 'FPKM', 9: 'FPKM'}, 'value': {0: 0.066516999999999993, 1: 0.061616999999999998, 2: 0.071012999999999993, 3: 0.056623, 4: 0.071633000000000002, 5: 0.0, 6: 0.0034299999999999999, 7: 0.272144, 8: 0.0057530000000000003, 9: 0.078708}, 'variable': {0: '267N12.3_Beta', 1: 'R267N12.3_Beta', 2: '267N12.3_Beta', 3: '267N12.3_Beta', 4: '267N12.3_Beta', 5: '267N12.3_FPKM', 6: '267N12.3_FPKM', 7: '267N12.3_FPKM', 8: '267N12.3_FPKM', 9: '267N12.3_FPKM'}, 'Samples': {0: 'PE01I', 1: 'PE01I', 2: 'PE02I', 3: 'PE02R', 4: 'PE03I', 5: 'PE01I', 6: 'PE01R', 7: 'PE02I', 8: 'PE02R', 9: 'PE03I'}})

#changed value in second row in column Samples
print (df)
  Samples  Type     value        variable
0   PE01I  Beta  0.066517   267N12.3_Beta
1   PE01I  Beta  0.061617  R267N12.3_Beta
2   PE02I  Beta  0.071013   267N12.3_Beta
3   PE02R  Beta  0.056623   267N12.3_Beta
4   PE03I  Beta  0.071633   267N12.3_Beta
5   PE01I  FPKM  0.000000   267N12.3_FPKM
6   PE01R  FPKM  0.003430   267N12.3_FPKM
7   PE02I  FPKM  0.272144   267N12.3_FPKM
8   PE02R  FPKM  0.005753   267N12.3_FPKM
9   PE03I  FPKM  0.078708   267N12.3_FPKM

df1 = df.pivot_table(index='Samples', columns=['Type'], aggfunc=','.join)
print (df1)
                             variable               
Type                             Beta           FPKM
Samples                                             
PE01I    267N12.3_Beta,R267N12.3_Beta  267N12.3_FPKM
PE01R                            None  267N12.3_FPKM
PE02I                   267N12.3_Beta  267N12.3_FPKM
PE02R                   267N12.3_Beta  267N12.3_FPKM
PE03I                   267N12.3_Beta  267N12.3_FPKM

df2 = df.pivot_table(index='Samples', columns=['Type'], aggfunc=np.mean)
print (df2)
            value          
Type         Beta      FPKM
Samples                    
PE01I    0.064067  0.000000
PE01R         NaN  0.003430
PE02I    0.071013  0.272144
PE02R    0.056623  0.005753
PE03I    0.071633  0.078708

df3 = pd.concat([df1, df2], axis=1)
df3.columns = ['_'.join(col) for col in df3.columns]
df3.reset_index(inplace=True)
print (df3)
  Samples                 variable_Beta  variable_FPKM  value_Beta  value_FPKM
0   PE01I  267N12.3_Beta,R267N12.3_Beta  267N12.3_FPKM    0.064067    0.000000
1   PE01R                          None  267N12.3_FPKM         NaN    0.003430
2   PE02I                 267N12.3_Beta  267N12.3_FPKM    0.071013    0.272144
3   PE02R                 267N12.3_Beta  267N12.3_FPKM    0.056623    0.005753
4   PE03I                 267N12.3_Beta  267N12.3_FPKM    0.071633    0.078708
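
As a possible alternative to calling pivot_table twice, the same table can be sketched in one pass with groupby, a per-column aggregation dict, and unstack. This swaps in a different technique from the one shown above and assumes the same duplicated sample data; the names agg and wide are illustrative.

```python
import pandas as pd

# Same shape as the example above: PE01I appears twice for Type "Beta".
df = pd.DataFrame({
    "Samples": ["PE01I", "PE01I", "PE02I", "PE02R", "PE03I",
                "PE01I", "PE01R", "PE02I", "PE02R", "PE03I"],
    "Type": ["Beta"] * 5 + ["FPKM"] * 5,
    "value": [0.066517, 0.061617, 0.071013, 0.056623, 0.071633,
              0.000000, 0.003430, 0.272144, 0.005753, 0.078708],
    "variable": ["267N12.3_Beta", "R267N12.3_Beta", "267N12.3_Beta",
                 "267N12.3_Beta", "267N12.3_Beta"] + ["267N12.3_FPKM"] * 5,
})

# One aggregation per column: join the strings, average the numbers.
agg = df.groupby(["Samples", "Type"]).agg({"variable": ",".join, "value": "mean"})

# Move Type into the columns, then flatten the resulting MultiIndex.
wide = agg.unstack("Type")
wide.columns = ["_".join(col) for col in wide.columns]
wide = wide.reset_index()
print(wide)
```

Samples that lack one of the two types (PE01R has no Beta row here) end up with missing values in the corresponding columns, as in the concat result.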

You can split the rows into 2 data frames based on their Type column and then use merge:

In [14]: df_1 = df.loc[(df['Type'] == "Beta"), ['Samples', 'variable', 'value']]

In [15]: df_2 = df.loc[(df['Type'] == "FPKM"), ['Samples', 'value']]

In [16]: df_1['Beta'] = df_1['value']

In [17]: df_2['FPKM'] = df_2['value']

In [18]: df_1[['Samples', 'variable', 'Beta']].merge(df_2[['Samples', 'FPKM']], on="Samples")
Out[18]: 
  Samples        variable      Beta      FPKM
0   PE01I   267N12.3_Beta  0.066517  0.000000
1   PE01R  R267N12.3_Beta  0.061617  0.003430
2   PE02I   267N12.3_Beta  0.071013  0.272144
3   PE02R   267N12.3_Beta  0.056623  0.005753
4   PE03I   267N12.3_Beta  0.071633  0.078708
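
The same split-and-merge idea can be written without assigning new columns to the filtered slices, by renaming value while selecting. This is a small sketch, assuming the data from the question; how="outer" is my addition so that a sample present under only one Type would still be kept, and the names beta, fpkm and merged are illustrative.

```python
import pandas as pd

# Sample data mirroring the question's df.head() output.
df = pd.DataFrame({
    "Samples": ["PE01I", "PE01R", "PE02I", "PE02R", "PE03I"] * 2,
    "variable": ["267N12.3_Beta", "R267N12.3_Beta", "267N12.3_Beta",
                 "267N12.3_Beta", "267N12.3_Beta"] + ["267N12.3_FPKM"] * 5,
    "value": [0.066517, 0.061617, 0.071013, 0.056623, 0.071633,
              0.000000, 0.003430, 0.272144, 0.005753, 0.078708],
    "Type": ["Beta"] * 5 + ["FPKM"] * 5,
})

# Select each Type and rename "value" to the target column name in one step.
beta = (df.loc[df["Type"] == "Beta", ["Samples", "variable", "value"]]
          .rename(columns={"value": "Beta"}))
fpkm = (df.loc[df["Type"] == "FPKM", ["Samples", "value"]]
          .rename(columns={"value": "FPKM"}))

# how="outer" keeps samples that have only one of the two types (filled with NaN).
merged = beta.merge(fpkm, on="Samples", how="outer")
print(merged)
```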

