繁体   English   中英

Pandas:根据另一列的文本值创建新列

[英]Pandas : Create new column based on text value of another column

这可能是一个非常简单的问题,但这是我的数据框:

    id      text                    position        labels
0   39088   skin melanoma           [58.0, 71.0]    indication
1   39088   proteinase              [137.0, 147.0]  protein
2   39088   plasminogen activator   [170.0, 191.0]  protein
3   39088   NaN                     [nan, nan]      NaN
4   39088   NaN                     [nan, nan]      NaN
5   39088   proteinase substrates   [36.0, 57.0]    protein
6   39088   tumors                  [67.0, 73.0]    indication
7   39088   NaN                     [nan, nan]      NaN
8   39088   Melanoma                [0.0, 8.0]      indication
9   39088   EDTA                    [172.0, 176.0]  protein
{'pmid': [39088,
  39088,
  39088,
  39088,
  39088,
  39088,
  39088,
  39088,
  39088,
  39088],
 'text': ['skin melanoma',
  'proteinase',
  'plasminogen activator',
  nan,
  nan,
  'proteinase substrates',
  'tumors',
  nan,
  'Melanoma',
  'EDTA'],
 'position': ['[58.0, 71.0]',
  '[137.0, 147.0]',
  '[170.0, 191.0]',
  '[nan, nan]',
  '[nan, nan]',
  '[36.0, 57.0]',
  '[67.0, 73.0]',
  '[nan, nan]',
  '[0.0, 8.0]',
  '[172.0, 176.0]'],
 'labels': ['indication',
  'protein',
  'protein',
  nan,
  nan,
  'protein',
  'indication',
  nan,
  'indication',
  'protein']}

这是WANTED OUTPUT ,我想根据labels列的值创建 2 个新列,并将相应的textposition作为值,具体取决于它们是指示还是蛋白质,其余为NaN

    id      indication     indication.position       protein                 protein.position 
0   39088   skin melanoma   [58.0, 71.0]             NaN                     [nan, nan]
1   39088   NaN             [nan, nan]               proteinase              [137.0, 147.0]
2   39088   NaN             [nan, nan]               plasminogen activator   [170.0, 191.0]  
3   39088   NaN             [nan, nan]               NaN                     [nan, nan]
4   39088   NaN             [nan, nan]               NaN                     [nan, nan]
5   39088   NaN             [nan, nan]               proteinase substrates   [36.0, 57.0] 
6   39088   tumors          [67.0, 73.0]             NaN                     [nan, nan]
7   39088   NaN             [nan, nan]               NaN                     [nan, nan]
8   39088   Melanoma                [0.0, 8.0]       NaN                     [nan, nan]
9   39088   NaN             [nan, nan]               EDTA                    [172.0, 176.0]     

做这个的最好方式是什么? 有人可以帮忙吗?

您可以使用:

out = (df
   .drop(columns=['position', 'result.value.labels'])
   .join(
 df.reset_index().dropna(subset=['result.value.labels'])
   .pivot(index='index', columns='result.value.labels', values='position')
   .reindex(df.index)
   .fillna('[nan, nan]')
   .add_suffix('.position')
   )
)

输出:

    pmid      result.value.text indication.position protein.position
0  39088          skin melanoma        [58.0, 71.0]       [nan, nan]
1  39088             proteinase          [nan, nan]   [137.0, 147.0]
2  39088  plasminogen activator          [nan, nan]   [170.0, 191.0]
3  39088                    NaN          [nan, nan]       [nan, nan]
4  39088                    NaN          [nan, nan]       [nan, nan]
5  39088  proteinase substrates          [nan, nan]     [36.0, 57.0]
6  39088                 tumors        [67.0, 73.0]       [nan, nan]
7  39088                    NaN          [nan, nan]       [nan, nan]
8  39088               Melanoma          [0.0, 8.0]       [nan, nan]
9  39088                   EDTA          [nan, nan]   [172.0, 176.0]

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM