How to match partial strings in text in a pandas DataFrame

Question

My DataFrame looks like this:

id                               text
1         good,i am interested..please mail me.
2         call me...good to go with you
3         not interested...bye
4         i am not interested don't call me
5         price is too high so not interested
6         i have some requirement..please mail me

I want the DataFrame to look like this:

id                               text                          is_relevant
1         good,i am interested..please mail me.                    yes
2         call me...good to go with you                            yes
3         not interested...bye                                      no
4         i am nt interested don't call me                          no
5         price is too high so not interested                       no
6         i have some requirement..please mail me                   yes

I have written the following code so far:

d1 = {'no': ['Not interested','nt interested']}
d = {k: oldk for oldk, oldv in d1.items() for k in oldv}
df["is_relevant"] = df['new_text'].map(d).fillna('yes')

Answer 1

In [20]: df = pd.read_csv("a.csv")

In [21]: a
Out[21]: ['not interested', 'nt interested']

In [22]: df
Out[22]:
   id                                     text
0   1    good i am interested..please mail me.
1   2            call me...good to go with you
2   3                     not interested...bye
3   4        i am not interested don't call me
4   5      price is too high so not interested
5   6  i have some requirement..please mail me

In [23]: df["is_relevant"] = df["text"].apply(lambda x: "no" if (a[0] in x.lower() or a[1] in x.lower()) else "yes")

In [24]: df
Out[24]:
   id                                     text is_relevant
0   1    good i am interested..please mail me.         yes
1   2            call me...good to go with you         yes
2   3                     not interested...bye          no
3   4        i am not interested don't call me          no
4   5      price is too high so not interested          no
5   6  i have some requirement..please mail me         yes
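The lambda above hard-codes a[0] and a[1], so it only works for exactly two phrases. A minimal sketch of the same idea for a list of any length, using any(); the name label and the three sample rows are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Small sample frame (illustrative rows modeled on the question's data).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": [
        "good,i am interested..please mail me.",
        "Not interested...bye",
        "i am nt interested don't call me",
    ],
})

phrases = ["not interested", "nt interested"]

def label(text):
    # Hypothetical helper: "no" if any phrase occurs as a substring
    # of the lower-cased text, otherwise "yes".
    lowered = text.lower()
    return "no" if any(p in lowered for p in phrases) else "yes"

df["is_relevant"] = df["text"].apply(label)
print(df)
```

Adding a phrase then only means extending the list; the labelling function stays unchanged.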

Answer 2

You can do:

d1 = {'no': ['not interested','nt interested']}

# create regex 
reg = '|'.join([f'\\b{x}\\b' for x in list(d1.values())[0]])

# apply function
df['is_relevant'] = df['text'].str.lower().str.contains(reg).map({True: 'no', False: 'yes'})

print(df)

   id                                     text is_relevant
0   1    good,i am interested..please mail me.         yes
1   2            call me...good to go with you         yes
2   3                     not interested...bye          no
3   4        i am not interested don't call me          no
4   5      price is too high so not interested          no
5   6  i have some requirement..please mail me         yes
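If the phrases may contain regex metacharacters, or the column may contain missing values, a slightly more defensive variant of the same approach could look like the sketch below. It assumes re.escape for the phrases and the case and na parameters of Series.str.contains; the sample rows, including the None entry, are made up for illustration:

```python
import re
import pandas as pd

# Illustrative data; the None row stands in for a missing message.
df = pd.DataFrame({
    "text": [
        "Not interested...bye",
        "call me...good to go with you",
        None,
        "i am nt interested don't call me",
    ]
})

phrases = ["not interested", "nt interested"]

# Escape each phrase so characters such as "." or "+" are matched literally,
# and keep the word boundaries used in the answer above.
pattern = "|".join(rf"\b{re.escape(p)}\b" for p in phrases)

# case=False replaces the explicit .str.lower(); na=False treats missing
# text as "no match" instead of propagating NaN.
mask = df["text"].str.contains(pattern, case=False, na=False, regex=True)
df["is_relevant"] = mask.map({True: "no", False: "yes"})
print(df)
```

Whether a missing message should count as 'yes' is a judgment call; passing na=True would label those rows 'no' instead.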

Answer 3

This is similar to YOLO's answer above, but allows multiple text classes.

from functools import reduce
import pandas as pd

df = pd.DataFrame(
    data = ["good,i am interested..please mail me.",
            "call me...good to go with you",
            "not interested...bye",
            "i am not interested don't call me",
            "price is too high so not interested",
            "i have some requirement..please mail me"],
    columns=['text'], index=[1,2,3,4,5,6])

d1 = {'no': ['Not interested','nt interested','not interested'],
      'maybe': ['requirement']}
df['is_relevant'] = 'yes'

for k in d1:
    match_inds = reduce(lambda x,y: x | y,
                        [df['text'].str.contains(pat) for pat in d1[k]])
    df.loc[match_inds, 'is_relevant'] = k

print(df)

Output

                                      text is_relevant
1    good,i am interested..please mail me.         yes
2            call me...good to go with you         yes
3                     not interested...bye          no
4        i am not interested don't call me          no
5      price is too high so not interested          no
6  i have some requirement..please mail me       maybe
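In the loop above, a row that matches more than one class ends up with whichever key is processed last. As an alternative sketch (not the answerer's method), np.select makes the priority explicit, because the first matching condition wins. The dict name classes is an assumption introduced for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=["good,i am interested..please mail me.",
          "call me...good to go with you",
          "not interested...bye",
          "i am not interested don't call me",
          "price is too high so not interested",
          "i have some requirement..please mail me"],
    columns=["text"], index=[1, 2, 3, 4, 5, 6])

# Classes in priority order: earlier keys win when a row matches several.
classes = {
    "no": ["not interested", "nt interested"],
    "maybe": ["requirement"],
}

# One boolean array per class: True where any of its phrases occurs.
conditions = [
    df["text"].str.contains("|".join(pats), case=False, na=False).to_numpy()
    for pats in classes.values()
]

# Rows matching no class fall back to the default label.
df["is_relevant"] = np.select(conditions, list(classes.keys()), default="yes")
print(df)
```

The phrases here contain no regex metacharacters; if they might, escaping them with re.escape before joining would be safer.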

Answer 4

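If all you want is to check against the phrases in the list ['not interested', 'nt interested'], np.where is enough.

If the values are stored in a dict, first put them into a list, for example lst=list(dict.values()), and still use np.where. (For a dict such as d1 in the question, whose values are themselves lists, take the inner list: list(d1.values())[0].)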

import numpy as np

lst=['not interested', 'nt interested']
df['is_relevant']=np.where(df.text.str.contains("|".join(lst)),'no','yes')

                                      text is_relevant
1    good,i am interested..please mail me.         yes
2            call me...good to go with you         yes
3                     not interested...bye          no
4        i am not interested don't call me          no
5      price is too high so not interested          no
6  i have some requirement..please mail me         yes
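When the phrases live in a dict of lists, such as d1 from the question, one way to build the flat list is itertools.chain.from_iterable. The following is a sketch under that assumption, with a shortened sample frame:

```python
from itertools import chain

import numpy as np
import pandas as pd

d1 = {"no": ["not interested", "nt interested"]}

# Flatten all value lists of the dict into a single list of phrases.
lst = list(chain.from_iterable(d1.values()))

# Shortened, illustrative sample of the question's data.
df = pd.DataFrame({
    "text": [
        "good,i am interested..please mail me.",
        "not interested...bye",
        "i have some requirement..please mail me",
    ]
})

# np.where picks 'no' where the joined pattern matches, 'yes' elsewhere.
df["is_relevant"] = np.where(df["text"].str.contains("|".join(lst)), "no", "yes")
print(df)
```

This collapses every key into one 'no' bucket, so it only fits the single-class case; with several classes the keys need to be kept separate.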

4 solutions

Solution 1
1 2020-07-14 05:58:47

Solution 2
1 2020-07-14 06:04:14

Solution 3
0 2020-07-14 06:18:15

Solution 4
0 2020-07-14 06:22:58