
pandas: add a string to certain values in a comma separated column if values exist in a list

I have a pandas DataFrame as follows:

import pandas as pd
import numpy as np

df = pd.DataFrame({'text':['this is the good student','she wears a beautiful green dress','he is from a friendly family of four','the house is empty','the number four five is new'],
               'labels':['O,O,O,ADJ,O','O,O,O,ADJ,ADJ,O','O,O,O,O,ADJ,O,O,NUM','O,O,O,O','O,O,NUM,NUM,O,O']})

I would like to prefix each ADJ or NUM label with 'B-' when it starts a run (it does not repeat the label immediately before it), and with 'I-' when it repeats the preceding label. 'O' labels stay unchanged. Here is my desired output:

output:

                                   text                   labels
0              this is the good student            O,O,O,B-ADJ,O
1     she wears a beautiful green dress      O,O,O,B-ADJ,I-ADJ,O
2  he is from a friendly family of four  O,O,O,O,B-ADJ,O,O,B-NUM
3                    the house is empty                  O,O,O,O
4           the number four five is new      O,O,B-NUM,I-NUM,O,O

So far I have created a list of the unique label values:

unique_labels = (np.unique(sum(df["labels"].str.split(',').dropna().to_numpy(), []))).tolist()
unique_labels.remove('O') # no changes required for O label

Then I tried to add the 'B-' prefix first, which raised an error (ValueError: Must have equal len keys and value when setting with an iterable):

for x in unique_labels:
    df.loc[df["labels"].str.contains(x), "labels"]= ['B-' + x for x in df["labels"]]

Try:

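from itertools import groupby


def fn(x):
    out = []
    # groupby yields one (label, run) pair per run of identical consecutive labels
    for k, g in groupby(map(str.strip, x.split(","))):
        if k == "O":
            # "O" labels are copied through unchanged
            out.extend(g)
        else:
            # the first label of a run gets "B-", every repeat after it gets "I-"
            out.append(f"B-{next(g)}")
            out.extend([f"I-{val}" for val in g])
    return ",".join(out)


# apply the function to each comma-separated label string
df["labels"] = df["labels"].apply(fn)
print(df)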

Prints:

                                   text                   labels
0              this is the good student            O,O,O,B-ADJ,O
1     she wears a beautiful green dress      O,O,O,B-ADJ,I-ADJ,O
2  he is from a friendly family of four  O,O,O,O,B-ADJ,O,O,B-NUM
3                    the house is empty                  O,O,O,O
4           the number four five is new      O,O,B-NUM,I-NUM,O,O
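
For comparison, here is a minimal sketch of the same rule written without itertools: each label is compared with the one directly before it, which makes the 'B-' versus 'I-' decision explicit. The function name add_bio_prefixes and its outside parameter are hypothetical names chosen for this illustration. The sample strings are the labels column from the question.

```python
def add_bio_prefixes(labels: str, outside: str = "O") -> str:
    """Prefix each non-outside label with B- (run start) or I- (repeat)."""
    out = []
    prev = None  # label seen at the previous position, before prefixing
    for label in (part.strip() for part in labels.split(",")):
        if label == outside:
            out.append(label)          # outside labels are left untouched
        elif label == prev:
            out.append(f"I-{label}")   # same label as the one just before
        else:
            out.append(f"B-{label}")   # first label of a new run
        prev = label
    return ",".join(out)


# The label strings from the question's DataFrame.
samples = [
    "O,O,O,ADJ,O",
    "O,O,O,ADJ,ADJ,O",
    "O,O,O,O,ADJ,O,O,NUM",
    "O,O,O,O",
    "O,O,NUM,NUM,O,O",
]
result = [add_bio_prefixes(s) for s in samples]
for row in result:
    print(row)
```

With the DataFrame from the question, this should be usable the same way as fn above, i.e. df["labels"].apply(add_bio_prefixes). Both versions treat two different adjacent labels (for example ADJ followed by NUM) as two separate runs, so each gets its own 'B-' prefix.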
