Pandas: Create new column based on text value of another column
This might be a very simple question, but here's my dataframe:
      id                   text        position      labels
0  39088          skin melanoma    [58.0, 71.0]  indication
1  39088             proteinase  [137.0, 147.0]     protein
2  39088  plasminogen activator  [170.0, 191.0]     protein
3  39088                    NaN      [nan, nan]         NaN
4  39088                    NaN      [nan, nan]         NaN
5  39088  proteinase substrates    [36.0, 57.0]     protein
6  39088                 tumors    [67.0, 73.0]  indication
7  39088                    NaN      [nan, nan]         NaN
8  39088               Melanoma      [0.0, 8.0]  indication
9  39088                   EDTA  [172.0, 176.0]     protein
import pandas as pd
from numpy import nan

# the same data as a dict, wrapped in pd.DataFrame so the frame can be reproduced
df = pd.DataFrame({
    'pmid': [39088, 39088, 39088, 39088, 39088, 39088, 39088, 39088, 39088, 39088],
    'text': ['skin melanoma', 'proteinase', 'plasminogen activator', nan, nan,
             'proteinase substrates', 'tumors', nan, 'Melanoma', 'EDTA'],
    'position': ['[58.0, 71.0]', '[137.0, 147.0]', '[170.0, 191.0]', '[nan, nan]', '[nan, nan]',
                 '[36.0, 57.0]', '[67.0, 73.0]', '[nan, nan]', '[0.0, 8.0]', '[172.0, 176.0]'],
    'labels': ['indication', 'protein', 'protein', nan, nan,
               'protein', 'indication', nan, 'indication', 'protein'],
})
And here's the WANTED OUTPUT. I want to create new columns based on the values of the labels column: for each label (indication or protein), one column holding the corresponding text and one holding the corresponding position, with NaN everywhere else.
      id     indication  indication.position                protein  protein.position
0  39088  skin melanoma         [58.0, 71.0]                    NaN        [nan, nan]
1  39088            NaN           [nan, nan]             proteinase    [137.0, 147.0]
2  39088            NaN           [nan, nan]  plasminogen activator    [170.0, 191.0]
3  39088            NaN           [nan, nan]                    NaN        [nan, nan]
4  39088            NaN           [nan, nan]                    NaN        [nan, nan]
5  39088            NaN           [nan, nan]  proteinase substrates      [36.0, 57.0]
6  39088         tumors         [67.0, 73.0]                    NaN        [nan, nan]
7  39088            NaN           [nan, nan]                    NaN        [nan, nan]
8  39088       Melanoma           [0.0, 8.0]                    NaN        [nan, nan]
9  39088            NaN           [nan, nan]                   EDTA    [172.0, 176.0]
What is the best way to do this? Can someone help?
You can use:
# column names follow the dict in the question ('text', 'labels')
out = (df
    .drop(columns=['position', 'labels'])
    .join(
        # keep labelled rows only, then spread 'position' into one column per label
        df.reset_index().dropna(subset=['labels'])
          .pivot(index='index', columns='labels', values='position')
          .reindex(df.index)       # restore the rows dropped by dropna
          .fillna('[nan, nan]')    # same placeholder string as in the input
          .add_suffix('.position')
    )
)
Output:
    pmid                   text  indication.position  protein.position
0  39088          skin melanoma         [58.0, 71.0]        [nan, nan]
1  39088             proteinase           [nan, nan]    [137.0, 147.0]
2  39088  plasminogen activator           [nan, nan]    [170.0, 191.0]
3  39088                    NaN           [nan, nan]        [nan, nan]
4  39088                    NaN           [nan, nan]        [nan, nan]
5  39088  proteinase substrates           [nan, nan]      [36.0, 57.0]
6  39088                 tumors         [67.0, 73.0]        [nan, nan]
7  39088                    NaN           [nan, nan]        [nan, nan]
8  39088               Melanoma           [0.0, 8.0]        [nan, nan]
9  39088                   EDTA           [nan, nan]    [172.0, 176.0]
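The output above keeps the text in a single column, whereas the wanted output in the question also splits the text by label. A minimal sketch of one way to get both, assuming the column names from the question's dict (pmid, text, position, labels) and that position holds strings such as '[nan, nan]'. It uses Series.where with a boolean mask per label instead of the pivot shown above, so it is an alternative and not the answerer's method:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's dict (column names assumed from it).
df = pd.DataFrame({
    "pmid": [39088] * 10,
    "text": ["skin melanoma", "proteinase", "plasminogen activator", np.nan, np.nan,
             "proteinase substrates", "tumors", np.nan, "Melanoma", "EDTA"],
    "position": ["[58.0, 71.0]", "[137.0, 147.0]", "[170.0, 191.0]", "[nan, nan]",
                 "[nan, nan]", "[36.0, 57.0]", "[67.0, 73.0]", "[nan, nan]",
                 "[0.0, 8.0]", "[172.0, 176.0]"],
    "labels": ["indication", "protein", "protein", np.nan, np.nan,
               "protein", "indication", np.nan, "indication", "protein"],
})

out = df[["pmid"]].copy()
for label in ["indication", "protein"]:
    # True only where the row carries this label; NaN labels compare as False.
    mask = df["labels"].eq(label)
    # Keep the text where the label matches, NaN elsewhere.
    out[label] = df["text"].where(mask)
    # Keep the position where the label matches, the '[nan, nan]' placeholder elsewhere.
    out[f"{label}.position"] = df["position"].where(mask, "[nan, nan]")

print(out)
```

The label list is hard-coded here to fix the column order; with more labels, df["labels"].dropna().unique() could supply it instead.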