How to remove from column A the elements that also appear in another column B in Pandas?

Question

How can I delete the items (str, int, float) in one column that also appear in another column?

Suppose I have this dataframe:

colA                              colBB            
eat a nice icecream               icecream            
I love to walk a lot              walk , to          
the city Paris is super           Paris, super  
        .
        .
        .

I would like to get this result:

colA                    colBB          
eat a nice              icecream          
I love a lot            walk , to           
the city is             Paris, super 
        .
        .
        .

This should be applied to every row of a large pandas DataFrame.

I have already lowercased the text and tokenized the sentences, but I am stuck on how to apply the removal.

Thank you

Answer 1

Try this.

Code to make a df:

import pandas as pd

df = pd.DataFrame({
    'colA': ['eat a nice icecream', 'I love to walk a lot', 'the city Paris is super'],
    'colB': ['icecream', 'walk , to', 'Paris, super']})

    colA                      colB
0   eat a nice icecream       icecream
1   I love to walk a lot      walk , to
2   the city Paris is super   Paris, super

Code to get the expected output:

# Commas in colB are turned into spaces first, so 'Paris,' is matched as 'Paris'.
df.apply(lambda x: ' '.join(w for w in x['colA'].split() if w not in x['colB'].replace(',', ' ').split()), axis=1)
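
Since the question mentions a big DataFrame, here is a minimal sketch of the same idea written as a named helper. It assumes the words in colB are separated by commas, spaces, or both, and that matching should be case-sensitive (the asker has already lowercased the text). The helper name `remove_listed_words` and the output column `colA_clean` are made up for illustration; they are not part of the original answer.

```python
import re

import pandas as pd


def remove_listed_words(text: str, listed: str) -> str:
    """Drop every whitespace-separated word of `text` that appears in `listed`.

    `listed` may separate its words with commas, spaces, or both.
    """
    # Build a set once per row so each membership test is a hash lookup.
    to_drop = {w for w in re.split(r"[,\s]+", listed) if w}
    return " ".join(w for w in text.split() if w not in to_drop)


df = pd.DataFrame({
    "colA": ["eat a nice icecream", "I love to walk a lot", "the city Paris is super"],
    "colB": ["icecream", "walk , to", "Paris, super"],
})

# Iterating over the two columns with zip avoids building a Series per row,
# which is what df.apply(..., axis=1) does.
df["colA_clean"] = [remove_listed_words(a, b) for a, b in zip(df["colA"], df["colB"])]
print(df["colA_clean"].tolist())
```

Writing the result to a new column keeps the original colA available for checking. Assign back to `df["colA"]` instead if the original text is no longer needed.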

Solution 1
1 2020-03-05 10:20:36
