Python / pandas - Add a word to a column based on words in another column

Question

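I'm working with an xlsx file in pandas, and I want to add the word "bodypart" to a column whenever the preceding column contains a word from a predefined list of body parts.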

Original DataFrame:

Sentence   Type
my hand    NaN
the fish   NaN

Resulting DataFrame:

Sentence   Type
my hand    bodypart
the fish   NaN

Nothing I've tried works. I feel like I'm missing something really obvious. Here is my latest (failed) attempt:

import pandas as pd 
import numpy as np
bodyparts = ['lip ', 'lips ', 'foot ', 'feet ', 'heel ', 'heels ', 'hand ', 'hands ']

df = pd.read_excel(file)

for word in bodyparts :
    if word in df["Sentence"] : df["Type"] = df["Type"].replace(np.nan, "bodypart", regex = True)

I also tried this, using both "NaN" and NaN as the first argument to str.replace:

if word in df['Sentence'] : df["Type"] = df["Type"].str.replace("", "bodypart")

Any help would be greatly appreciated!
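A likely reason the loop above never fires: the `in` operator on a pandas Series checks the index labels, not the cell values. The minimal sketch below builds a two-row frame in place of the question's Excel file (an assumption, since the file isn't shown) to make the difference visible:

```python
import pandas as pd

# Stand-in for pd.read_excel(file); only the Sentence column matters here
df = pd.DataFrame({"Sentence": ["my hand", "the fish"]})

# `in` on a Series tests the index labels (0 and 1 here), not the values
print("my hand" in df["Sentence"])              # False
print(0 in df["Sentence"])                      # True

# To test the values, compare element-wise or use a vectorised string method
print((df["Sentence"] == "my hand").any())      # True
print(df["Sentence"].str.contains("hand").tolist())  # [True, False]
```

Even if the membership test did succeed, `df["Type"].replace(np.nan, "bodypart")` in the question would fill every NaN in the column rather than only the matching row, which is why the answers below build a per-row boolean mask.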

Answer 1

A quick-and-dirty solution is to check the intersection of two sets.

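Set A is your list of body parts and set B is the set of words in the sentence. If the intersection is non-empty, the sentence contains a body part.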

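bodyparts = {w.strip() for w in bodyparts}  # drop the trailing spaces so entries match split() tokens
# 'bodypart' when the sentence shares at least one word with the body-part set, NaN otherwise
df['Type'] = df['Sentence'].apply(
    lambda x: 'bodypart' if set(x.split()).intersection(bodyparts) else np.nan)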
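A self-contained sketch of the set-intersection idea on the question's two example rows. The `Type` column is created with object dtype here (an assumption, not part of the original answer) so that a string label can be written into it with `.loc`. The last lines show why a symmetric difference is the wrong test: it is non-empty for both rows.

```python
import numpy as np
import pandas as pd

# The question's list, with the trailing spaces removed so entries match split() tokens
bodyparts = {'lip', 'lips', 'foot', 'feet', 'heel', 'heels', 'hand', 'hands'}

df = pd.DataFrame({
    'Sentence': ['my hand', 'the fish'],
    'Type': pd.Series([np.nan, np.nan], dtype=object),
})

# True where the sentence's words and the body-part set share at least one element
has_bodypart = df['Sentence'].apply(lambda s: bool(set(s.split()) & bodyparts))
df.loc[has_bodypart, 'Type'] = 'bodypart'
print(df)

# For contrast: the symmetric difference is non-empty for every row here,
# so using it as the condition would label "the fish" as well
sym = df['Sentence'].apply(lambda s: bool(set(s.split()).symmetric_difference(bodyparts)))
print(sym.tolist())  # [True, True]
```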

Answer 2

The simplest way:

df.loc[df.Sentence.isin(bodyparts), 'Type'] = 'bodypart'

Before that, you have to strip the spaces from the bodyparts entries:

bodyparts = {'lip','lips','foot','feet','heel','heels','hand','hands'}

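df.Sentence.isin(bodyparts) selects the matching rows, and 'Type' is the column to set; .loc is the indexer that allows assignment. Note that isin compares the whole cell value, so this only matches sentences that consist of a single body-part word, such as "hand", not "my hand".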
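If you want to keep `isin` but match individual words, one possible sketch (assuming pandas 0.25 or later, which added `Series.explode`) splits each sentence, puts one word per row, tests membership, and collapses the result back to one boolean per original row:

```python
import numpy as np
import pandas as pd

bodyparts = {'lip', 'lips', 'foot', 'feet', 'heel', 'heels', 'hand', 'hands'}
df = pd.DataFrame({
    'Sentence': ['my hand', 'the fish', 'hand'],
    'Type': pd.Series([np.nan] * 3, dtype=object),
})

# Whole-cell comparison: only the single-word sentence "hand" matches
print(df.Sentence.isin(bodyparts).tolist())  # [False, False, True]

# Word-level comparison: explode() repeats the original index for each word,
# so grouping on that index and taking any() yields one flag per sentence
words = df.Sentence.str.split().explode()
mask = words.isin(bodyparts).groupby(level=0).any()
df.loc[mask, 'Type'] = 'bodypart'
print(df)
```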

Answer 3

You can build a regular expression that matches whole words (using word boundaries) and pass it to str.contains, for example:

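import pandas as pd
import numpy as np
import re

# "s?" makes the plural optional, e.g. "lips?" matches both "lip" and "lips"
bodyparts = ['lips?', 'foot', 'feet', 'heels?', 'hands?', 'legs?']
# \b...\b anchors each alternative to word boundaries, so "slippage" does not match "lip"
rx = re.compile('|'.join(r'\b{}\b'.format(el) for el in bodyparts))

df = pd.DataFrame({
    'Sentence': ['my hand', 'the fish', 'the rabbit leg', 'hand over', 'something', 'cabbage', 'slippage'],
    # object dtype so a string label can be assigned without changing the column's dtype
    'Type': pd.Series([np.nan] * 7, dtype=object)
})

# Label every row whose sentence contains at least one body-part word
df.loc[df.Sentence.str.contains(rx), 'Type'] = 'bodypart'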

Which gives you:

         Sentence      Type
0         my hand  bodypart
1        the fish       NaN
2  the rabbit leg  bodypart
3       hand over  bodypart
4       something       NaN
5         cabbage       NaN
6        slippage       NaN
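The answer above hard-codes plurals as `s?` patterns. If you would rather reuse the question's plain list unchanged, one possible sketch strips the trailing spaces, escapes each word with `re.escape`, and matches case-insensitively via `str.contains(..., case=False, na=False)`. A plain string pattern is used because pandas does not accept `case=False` together with an already-compiled regex. The sample sentences are made up for illustration.

```python
import re

import numpy as np
import pandas as pd

# The question's list, trailing spaces and all
bodyparts = ['lip ', 'lips ', 'foot ', 'feet ', 'heel ', 'heels ', 'hand ', 'hands ']

# Strip the spaces, escape any regex metacharacters, and wrap each word in \b...\b
pattern = '|'.join(r'\b{}\b'.format(re.escape(w.strip())) for w in bodyparts)

df = pd.DataFrame({
    'Sentence': ['My Hand', 'the fish', 'slippage', np.nan],
    'Type': pd.Series([np.nan] * 4, dtype=object),
})

# case=False ignores capitalisation; na=False treats a missing sentence as "no match"
mask = df['Sentence'].str.contains(pattern, case=False, na=False)
df.loc[mask, 'Type'] = 'bodypart'
print(df)
```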

3 answers

Answer 1: 0 votes, 2017-03-10 16:31:14

Answer 2: 0 votes, 2017-03-10 16:45:24

Answer 3: 0 votes, accepted, 2017-03-10 16:50:45
