
Pandas: Create new column based on text value of another column

This might be a very simple question, but here's my dataframe:

    pmid    text                    position        labels
0   39088   skin melanoma           [58.0, 71.0]    indication
1   39088   proteinase              [137.0, 147.0]  protein
2   39088   plasminogen activator   [170.0, 191.0]  protein
3   39088   NaN                     [nan, nan]      NaN
4   39088   NaN                     [nan, nan]      NaN
5   39088   proteinase substrates   [36.0, 57.0]    protein
6   39088   tumors                  [67.0, 73.0]    indication
7   39088   NaN                     [nan, nan]      NaN
8   39088   Melanoma                [0.0, 8.0]      indication
9   39088   EDTA                    [172.0, 176.0]  protein
{'pmid': [39088,
  39088,
  39088,
  39088,
  39088,
  39088,
  39088,
  39088,
  39088,
  39088],
 'text': ['skin melanoma',
  'proteinase',
  'plasminogen activator',
  nan,
  nan,
  'proteinase substrates',
  'tumors',
  nan,
  'Melanoma',
  'EDTA'],
 'position': ['[58.0, 71.0]',
  '[137.0, 147.0]',
  '[170.0, 191.0]',
  '[nan, nan]',
  '[nan, nan]',
  '[36.0, 57.0]',
  '[67.0, 73.0]',
  '[nan, nan]',
  '[0.0, 8.0]',
  '[172.0, 176.0]'],
 'labels': ['indication',
  'protein',
  'protein',
  nan,
  nan,
  'protein',
  'indication',
  nan,
  'indication',
  'protein']}

And here's the wanted output: for each value of the labels column (indication or protein) I want a new text column and a new position column, filled with the corresponding text and position where the row carries that label, and NaN for the rest.

    pmid    indication      indication.position      protein                 protein.position
0   39088   skin melanoma   [58.0, 71.0]             NaN                     [nan, nan]
1   39088   NaN             [nan, nan]               proteinase              [137.0, 147.0]
2   39088   NaN             [nan, nan]               plasminogen activator   [170.0, 191.0]
3   39088   NaN             [nan, nan]               NaN                     [nan, nan]
4   39088   NaN             [nan, nan]               NaN                     [nan, nan]
5   39088   NaN             [nan, nan]               proteinase substrates   [36.0, 57.0]
6   39088   tumors          [67.0, 73.0]             NaN                     [nan, nan]
7   39088   NaN             [nan, nan]               NaN                     [nan, nan]
8   39088   Melanoma        [0.0, 8.0]               NaN                     [nan, nan]
9   39088   NaN             [nan, nan]               EDTA                    [172.0, 176.0]

What is the best way to do this? Can someone help?

You can use:

out = (df
   .drop(columns=['position', 'labels'])
   .join(
      # pivot the labelled rows into one position column per label value
      df.reset_index().dropna(subset=['labels'])
        .pivot(index='index', columns='labels', values='position')
        .reindex(df.index)        # restore the rows that had no label
        .fillna('[nan, nan]')
        .add_suffix('.position')
   )
)

Output:

    pmid                   text indication.position protein.position
0  39088          skin melanoma        [58.0, 71.0]       [nan, nan]
1  39088             proteinase          [nan, nan]   [137.0, 147.0]
2  39088  plasminogen activator          [nan, nan]   [170.0, 191.0]
3  39088                    NaN          [nan, nan]       [nan, nan]
4  39088                    NaN          [nan, nan]       [nan, nan]
5  39088  proteinase substrates          [nan, nan]     [36.0, 57.0]
6  39088                 tumors        [67.0, 73.0]       [nan, nan]
7  39088                    NaN          [nan, nan]       [nan, nan]
8  39088               Melanoma          [0.0, 8.0]       [nan, nan]
9  39088                   EDTA          [nan, nan]   [172.0, 176.0]
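
The pivot above splits only the position column; the text stays in a single column. A minimal sketch of one way to get all four columns from the wanted output follows. It is not part of the original answer. It assumes the column names from the dictionary in the question (pmid, text, position, labels) and that position holds strings such as '[58.0, 71.0]'.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Sample data rebuilt from the dictionary in the question.
df = pd.DataFrame({
    "pmid": [39088] * 10,
    "text": ["skin melanoma", "proteinase", "plasminogen activator", nan, nan,
             "proteinase substrates", "tumors", nan, "Melanoma", "EDTA"],
    "position": ["[58.0, 71.0]", "[137.0, 147.0]", "[170.0, 191.0]",
                 "[nan, nan]", "[nan, nan]", "[36.0, 57.0]", "[67.0, 73.0]",
                 "[nan, nan]", "[0.0, 8.0]", "[172.0, 176.0]"],
    "labels": ["indication", "protein", "protein", nan, nan,
               "protein", "indication", nan, "indication", "protein"],
})

out = df[["pmid"]].copy()
for label in ["indication", "protein"]:
    # Rows whose label is NaN compare as False, so they are left empty.
    mask = df["labels"].eq(label)
    # Keep the text only where the label matches; other rows become NaN.
    out[label] = df["text"].where(mask)
    # Keep the position where the label matches; otherwise use the
    # '[nan, nan]' placeholder string shown in the wanted output.
    out[f"{label}.position"] = df["position"].where(mask, "[nan, nan]")

print(out)
```

Looping over an explicit list of labels keeps the column order fixed. If the set of labels is not known in advance, df["labels"].dropna().unique() could supply it instead.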
