Pandas: Create new column based on text value of another column
This may be a very simple question, but here is my dataframe:
      id                   text        position      labels
0  39088          skin melanoma    [58.0, 71.0]  indication
1  39088             proteinase  [137.0, 147.0]     protein
2  39088  plasminogen activator  [170.0, 191.0]     protein
3  39088                    NaN      [nan, nan]         NaN
4  39088                    NaN      [nan, nan]         NaN
5  39088  proteinase substrates    [36.0, 57.0]     protein
6  39088                 tumors    [67.0, 73.0]  indication
7  39088                    NaN      [nan, nan]         NaN
8  39088               Melanoma      [0.0, 8.0]  indication
9  39088                   EDTA  [172.0, 176.0]     protein
{'pmid': [39088,
39088,
39088,
39088,
39088,
39088,
39088,
39088,
39088,
39088],
'text': ['skin melanoma',
'proteinase',
'plasminogen activator',
nan,
nan,
'proteinase substrates',
'tumors',
nan,
'Melanoma',
'EDTA'],
'position': ['[58.0, 71.0]',
'[137.0, 147.0]',
'[170.0, 191.0]',
'[nan, nan]',
'[nan, nan]',
'[36.0, 57.0]',
'[67.0, 73.0]',
'[nan, nan]',
'[0.0, 8.0]',
'[172.0, 176.0]'],
'labels': ['indication',
'protein',
'protein',
nan,
nan,
'protein',
'indication',
nan,
'indication',
'protein']}
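This is the WANTED OUTPUT. Based on the values in the labels column, I want to create 2 new columns (indication and protein), each carrying the corresponding text and position as values depending on whether the label is indication or protein; the rest should be NaN.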
      id     indication  indication.position                protein  protein.position
0  39088  skin melanoma         [58.0, 71.0]                    NaN        [nan, nan]
1  39088            NaN           [nan, nan]             proteinase    [137.0, 147.0]
2  39088            NaN           [nan, nan]  plasminogen activator    [170.0, 191.0]
3  39088            NaN           [nan, nan]                    NaN        [nan, nan]
4  39088            NaN           [nan, nan]                    NaN        [nan, nan]
5  39088            NaN           [nan, nan]  proteinase substrates      [36.0, 57.0]
6  39088         tumors         [67.0, 73.0]                    NaN        [nan, nan]
7  39088            NaN           [nan, nan]                    NaN        [nan, nan]
8  39088            NaN           [nan, nan]                    NaN        [nan, nan]
9  39088            NaN           [nan, nan]                   EDTA    [172.0, 176.0]
What is the best way to do this? Can anyone help?
You can use the following, written here with the column names from the sample data (pmid, text, position, labels):
out = (df
   .drop(columns=['position', 'labels'])
   .join(
       # keep only labelled rows, remembering the original index
       df.reset_index().dropna(subset=['labels'])
         # one column per label, holding that row's position
         .pivot(index='index', columns='labels', values='position')
         # bring back the unlabelled rows, then fill the gaps
         .reindex(df.index)
         .fillna('[nan, nan]')
         .add_suffix('.position')
   )
)
Output:
    pmid                   text  indication.position  protein.position
0  39088          skin melanoma         [58.0, 71.0]        [nan, nan]
1  39088             proteinase           [nan, nan]    [137.0, 147.0]
2  39088  plasminogen activator           [nan, nan]    [170.0, 191.0]
3  39088                    NaN           [nan, nan]        [nan, nan]
4  39088                    NaN           [nan, nan]        [nan, nan]
5  39088  proteinase substrates           [nan, nan]      [36.0, 57.0]
6  39088                 tumors         [67.0, 73.0]        [nan, nan]
7  39088                    NaN           [nan, nan]        [nan, nan]
8  39088               Melanoma           [0.0, 8.0]        [nan, nan]
9  39088                   EDTA           [nan, nan]    [172.0, 176.0]
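The output above splits only the position column by label; the text column is left as it was. If the text should also be split into indication and protein columns, as in the wanted output, one possible sketch is shown below. It is not part of the answer above. It assumes the columns are named pmid, text, position and labels as in the sample dict, and it uses Series.where with a boolean mask instead of a pivot. The name wanted is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the dict in the question (assumed column names).
df = pd.DataFrame({
    "pmid": [39088] * 10,
    "text": ["skin melanoma", "proteinase", "plasminogen activator", np.nan,
             np.nan, "proteinase substrates", "tumors", np.nan,
             "Melanoma", "EDTA"],
    "position": ["[58.0, 71.0]", "[137.0, 147.0]", "[170.0, 191.0]",
                 "[nan, nan]", "[nan, nan]", "[36.0, 57.0]", "[67.0, 73.0]",
                 "[nan, nan]", "[0.0, 8.0]", "[172.0, 176.0]"],
    "labels": ["indication", "protein", "protein", np.nan, np.nan,
               "protein", "indication", np.nan, "indication", "protein"],
})

wanted = df[["pmid"]].copy()
for label in ["indication", "protein"]:
    # True only where the row carries this label; NaN labels compare as False.
    mask = df["labels"].eq(label)
    # Keep the text where the label matches, NaN elsewhere.
    wanted[label] = df["text"].where(mask)
    # Keep the position where the label matches, the placeholder string elsewhere.
    wanted[f"{label}.position"] = df["position"].where(mask, "[nan, nan]")

print(wanted)
```

The label list is hard-coded here to fix the column order; with more labels it could be taken from df["labels"].dropna().unique() instead.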