
Reformat Pandas DataFrame Using Substring Information

I have a Pandas DataFrame like the one below, which contains information that I want to split out of the column headers and the data rows:

    Type1_100_A Type2_200_B Seq
0   34.0        NaN         2_CTGT
1   4573.0      NaN         7_GATG
2   16.0        NaN         4_ACTT
3   NaN         17.0        5_GTCA
4   NaN         25.0        1_TCGA

Code:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Type1_3_A': {0: np.nan, 1: np.nan, 2: np.nan, 3: 17.0, 4: 25.0},
    'Type2_3_B': {0: 34.0, 1: 4573.0, 2: 16.0, 3: np.nan, 4: np.nan},
    'Seq': {
        0: '2_CTGT',
        1: '7_GATG',
        2: '4_ACTT',
        3: '5_GTCA',
        4: '1_TCGA'
    }
})
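To make the "substring information" concrete, here is a small illustrative sketch of the two splits involved. It assumes the headers shown in the first table (`Type1_100_A`, `Type2_200_B`) rather than the `Type1_3_A`/`Type2_3_B` names used in the constructor, and `header_parts`/`seq_parts` are hypothetical names chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical input: same data layout, but with the headers from the first table
df = pd.DataFrame({
    'Type1_100_A': [34.0, 4573.0, 16.0, np.nan, np.nan],
    'Type2_200_B': [np.nan, np.nan, np.nan, 17.0, 25.0],
    'Seq': ['2_CTGT', '7_GATG', '4_ACTT', '5_GTCA', '1_TCGA'],
})

# Each value-column header encodes Type, Label and Replicate, separated by '_';
# Index.str.split(expand=True) turns them into a MultiIndex of those parts
header_parts = df.columns.drop('Seq').str.split('_', expand=True)
header_parts = header_parts.set_names(['Type', 'Label', 'Replicate'])

# Each Seq value encodes Count and the sequence; n=1 splits on the first '_' only
seq_parts = df['Seq'].str.split('_', n=1, expand=True)
seq_parts.columns = ['Count', 'Seq']

print(header_parts.tolist())  # [('Type1', '100', 'A'), ('Type2', '200', 'B')]
print(seq_parts)
```

Note that every piece comes out as a string; `Label` and `Count` still need a cast if integers are wanted.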

I want to rearrange the DataFrame, using substrings of the data and of the column headers, so that it looks like this:

    Type    Label   Replicate   Count   Seq     Number
0   Type1   100     A           2       CTGT    34.0
1   Type1   100     A           7       GATG    4573.0
2   Type1   100     A           4       ACTT    16.0
3   Type2   200     B           5       GTCA    17.0
4   Type2   200     B           1       TCGA    25.0

Code:

import pandas as pd

expected = pd.DataFrame({
    'Type': {0: 'Type1', 1: 'Type1', 2: 'Type1', 3: 'Type2', 4: 'Type2'},
    'Label': {0: 100, 1: 100, 2: 100, 3: 200, 4: 200},
    'Replicate': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'},
    'Count': {0: 2, 1: 7, 2: 4, 3: 5, 4: 1},
    'Seq': {0: 'CTGT', 1: 'GATG', 2: 'ACTT', 3: 'GTCA', 4: 'TCGA'},
    'Number': {0: 34.0, 1: 4573.0, 2: 16.0, 3: 17.0, 4: 25.0},
})

Use:

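# move Seq into the index so only the value columns remain
df1 = df.set_index('Seq')
# row sums; this equals the value only because each row has exactly one non-NaN
v = df1.sum(axis=1)

# for each row, get the name of the column holding the non-NaN value:
# True * 'name' == 'name' and False * 'name' == '', so the dot product
# concatenates to that single column name
df1 = df1.notna().dot(df1.columns).reset_index(name='Type')
# split Seq and the column name on '_'
df1[['Count','Seq']] = df1['Seq'].str.split('_', expand=True)
df1[['Type','Label','Replicate']] = df1['Type'].str.split('_', expand=True)
# add the values as a new column
df1['Number'] = v.to_numpy()

# reorder the columns
df1 = df1[['Type', 'Label', 'Replicate', 'Count', 'Seq', 'Number']]
print(df1)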
    Type Label Replicate Count   Seq  Number
0  Type2     3         B     2  CTGT    34.0
1  Type2     3         B     7  GATG  4573.0
2  Type2     3         B     4  ACTT    16.0
3  Type1     3         A     5  GTCA    17.0
4  Type1     3         A     1  TCGA    25.0
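The `sum(axis=1)` step above is only valid because each row holds exactly one non-NaN value. A minimal sketch of a different route, swapping in `DataFrame.melt` plus `dropna` (not the answerer's method), reshapes wide to long first, so a row with several non-NaN values would simply produce several output rows. It assumes the headers from the first table (`Type1_100_A`, `Type2_200_B`), so the result should line up with the desired output table, and it casts `Label` and `Count` to integers as in the expected frame.

```python
import numpy as np
import pandas as pd

# Input uses the headers from the first table (Type1_100_A, Type2_200_B)
df = pd.DataFrame({
    'Type1_100_A': [34.0, 4573.0, 16.0, np.nan, np.nan],
    'Type2_200_B': [np.nan, np.nan, np.nan, 17.0, 25.0],
    'Seq': ['2_CTGT', '7_GATG', '4_ACTT', '5_GTCA', '1_TCGA'],
})

# Wide -> long: one row per (Seq, header) pair, then drop the empty cells.
# Unlike a row sum, this keeps every non-NaN value if a row has more than one.
out = (df.melt(id_vars='Seq', var_name='Type', value_name='Number')
         .dropna(subset=['Number'])
         .reset_index(drop=True))

# Split the header into Type/Label/Replicate and Seq into Count/Seq
out[['Type', 'Label', 'Replicate']] = out['Type'].str.split('_', expand=True)
out[['Count', 'Seq']] = out['Seq'].str.split('_', n=1, expand=True)

# Cast the numeric pieces and put the columns in the requested order
out = out.astype({'Label': int, 'Count': int})
out = out[['Type', 'Label', 'Replicate', 'Count', 'Seq', 'Number']]
print(out)
```

Because `melt` stacks the value columns one after another, the rows come out grouped by header (all `Type1_100_A` rows, then all `Type2_200_B` rows); for this data that is also the requested order.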

