
I want to compare word pairs in a pandas DataFrame

Names
['abc aa','bdc sc','abc aa','bdc sp','bdc sc','pp sc','bdc sc',]
['lp aa','bd sc','bdc sc','bd sc','lp aa','bd sc']

['nn aa','bb sc','bb sc','nn aa','bd sc']

I tried the following:

def drop_dupli(text):
    seen = set()  # words encountered so far
    result = []

    for item in text.split():  # splits on whitespace, so single words are compared
        if item not in seen:
            seen.add(item)
            result.append(item)
    return " ".join(result)

df['newame'] = df['Names'].apply(drop_dupli)

The result came out as follows:

Names
['abc aa','bdc sc','abc ','bdc sp','bdc ','pp sc','bdc ',]
['lp aa','bd sc','bdc sc','bd ','lp ','bd ']

['nn aa','bb sc','bb ','nn ','bd ']

But I want the result to be as follows:

Names
['abc aa','bdc sc','bdc sp','pp sc']
['lp aa','bd sc','bdc sc']

['nn aa','bb sc','bd sc']

Use the dict.fromkeys trick to remove duplicates while keeping the original order:

df['newame'] = df['Names'].apply(lambda x: list(dict.fromkeys(x)))
print (df)
                                               Names  \
0  [abc aa, bdc sc, abc aa, bdc sp, bdc sc, pp sc...   
1        [lp aa, bd sc, bdc sc, bd sc, lp aa, bd sc]   
2                [nn aa, bb sc, bb sc, nn aa, bd sc]   

                            newame  
0  [abc aa, bdc sc, bdc sp, pp sc]  
1           [lp aa, bd sc, bdc sc]  
2            [nn aa, bb sc, bd sc]  
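For anyone who wants to try this, here is a minimal self-contained sketch of the same dict.fromkeys approach. It assumes each cell of the Names column holds a Python list of strings (the question does not say how the column was built), and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
# Assumption: each cell is a real Python list of strings.
df = pd.DataFrame({
    "Names": [
        ["abc aa", "bdc sc", "abc aa", "bdc sp", "bdc sc", "pp sc", "bdc sc"],
        ["lp aa", "bd sc", "bdc sc", "bd sc", "lp aa", "bd sc"],
        ["nn aa", "bb sc", "bb sc", "nn aa", "bd sc"],
    ]
})

# dict keys are unique and keep insertion order (guaranteed since Python 3.7),
# so converting the keys back to a list removes repeats in first-seen order.
df["newame"] = df["Names"].apply(lambda x: list(dict.fromkeys(x)))

print(df["newame"].tolist())
```

Each whole list item such as 'bdc sc' is treated as one key, so the word pairs stay intact instead of being split into single words.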

A plain set is not suitable here, because converting to a set does not preserve the original order:

df['newame'] = df['Names'].apply(lambda x: list(set(x)))
print (df)
                                               Names  \
0  [abc aa, bdc sc, abc aa, bdc sp, bdc sc, pp sc...   
1        [lp aa, bd sc, bdc sc, bd sc, lp aa, bd sc]   
2                [nn aa, bb sc, bb sc, nn aa, bd sc]   

                            newame  
0  [pp sc, bdc sp, bdc sc, abc aa]  
1           [lp aa, bd sc, bdc sc]  
2            [bb sc, nn aa, bd sc]  
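If you prefer an explicit loop like the one in the question, a possible sketch is shown below. The function name drop_duplicates_keep_order is made up for this illustration, and it again assumes the input is a list of strings. The difference from the attempt in the question is that it iterates over the list items themselves rather than over the words produced by split().

```python
def drop_duplicates_keep_order(items):
    """Return the list items with duplicates removed, keeping first occurrences."""
    seen = set()    # items encountered so far (used only for fast membership tests)
    result = []     # output list, which is what preserves the original order
    for item in items:  # whole items such as 'bb sc', not single words
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


names = ["nn aa", "bb sc", "bb sc", "nn aa", "bd sc"]
deduped = drop_duplicates_keep_order(names)
print(deduped)  # ['nn aa', 'bb sc', 'bd sc']
```

With a DataFrame it could be applied as df['Names'].apply(drop_duplicates_keep_order). The set is used only to check membership, so the order comes from the result list.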

