Regex to recognize certain words in a sentence when only the first two words are present

Question

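I have a problem. I want to use a regular expression to recognize certain text modules in a text, for example "beach vibe some". The problem is that some text modules are three words long (or even longer). However, most people only write the first two words, and they may abbreviate the second one.

Is there a way to make the regex report a hit when only the first two words are present, and to have it look only at the first three letters of the second word?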

   customerId                          text          element  code
0           1    please use beach vibe some  beach vibe some     0
1           1     you should use beach vibe  beach vibe some     0
2           1           right use beach vib  beach vibe some     0
3           3              use floating pow   floating power     1
4           3  use floating stuff right now   floating stuff     2

import pandas as pd
import re
d = {
    "customerId": [1, 1, 1, 3, 3],
    "text": ["please use beach vibe some",
             "you should use beach vibe",
             "right use beach vib",
             'use floating pow',
             'use floating stuff right now'],
    "element": ['beach vibe some', 'beach vibe some', 'beach vibe some', 'floating power', 'floating stuff']
}
df = pd.DataFrame(data=d)
df['code'] = df['element'].astype('category').cat.codes
print(df)

def f(x):
    # 999 is the sentinel for "no element matched".
    match = 999
    for element in df['element'].unique():
        words = element.split()
        # Try the full element, then its first two words, then the first
        # word plus the first three letters of the second word.
        candidates = [element,
                      ' '.join(words[:2]),
                      ' '.join([words[0], words[1][:3]])]
        # re.escape keeps regex metacharacters in an element from being
        # interpreted as pattern syntax.
        if any(re.search(re.escape(c), x['text'], re.IGNORECASE) for c in candidates):
            match = df['code'].loc[df['element'] == element].iloc[0]
            break

    x['test'] = match
    return x

df['test'] = None
df = df.apply(f, axis=1)
print(df)

   customerId                          text          element  code  test
0           1    please use beach vibe some  beach vibe some     0     0
1           1     you should use beach vibe  beach vibe some     0     0
2           1           right use beach vib  beach vibe some     0     0
3           3              use floating pow   floating power     1     1
4           3  use floating stuff right now   floating stuff     2     2
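
The relaxed matching asked for above can also be written as one pattern per element instead of three separate searches. The following is only a sketch: build_pattern is a hypothetical helper name, and it assumes every element has at least two words. It anchors the first word with a word boundary and accepts any word that starts with the first three letters of the second word.

```python
import re

def build_pattern(element: str) -> "re.Pattern":
    """Compile a pattern for the first word of `element` followed by a word
    that starts with the first three letters of the second word."""
    words = element.split()  # assumption: at least two words per element
    first = re.escape(words[0])
    second_prefix = re.escape(words[1][:3])
    # \b stops the first word from matching inside a longer word;
    # \w* lets "vib", "vibe" or "vibes" all count as the second word.
    return re.compile(rf"\b{first}\s+{second_prefix}\w*", re.IGNORECASE)

pattern = build_pattern("beach vibe some")
for text in ["please use beach vibe some", "right use beach vib", "use floating pow"]:
    print(text, "->", bool(pattern.search(text)))
```

Like the prefix check in the question, this also accepts unrelated words that share the prefix (for example "beach vibrant"), and two elements with the same first word and the same three-letter prefix would be indistinguishable.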

Answer 1

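Why use a regular expression at all? A plain substring check on the first word plus the first three letters of the second word is enough: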

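# Lowercase both sides so the comparison is case-insensitive.
element_parts = element.lower().split()
# Key = first word + first three letters of the second word, e.g. "beach vib".
lookup_key = element_parts[0] + " " + element_parts[1][:3]
if lookup_key in x["text"].lower():
    # here we go ...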
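
As a rough, self-contained sketch of how this substring idea might be wired into a lookup: the key "beach vib" is a prefix of both "beach vibe" and "beach vibe some", so a single substring test covers all three cases from the question. The names make_key and find_code, the codes dictionary, and the 999 default (borrowed from the question's sentinel) are illustrative assumptions, not part of the original answer.

```python
def make_key(element: str) -> str:
    """First word plus the first three letters of the second word, lowercased."""
    parts = element.lower().split()  # assumption: at least two words
    return parts[0] + " " + parts[1][:3]

def find_code(text: str, codes: dict, default: int = 999) -> int:
    """Return the code of the first element whose key occurs in `text`."""
    lowered = text.lower()
    for element, code in codes.items():
        if make_key(element) in lowered:
            return code
    return default  # no element matched

# Hypothetical element -> code mapping mirroring the question's data.
codes = {"beach vibe some": 0, "floating power": 1, "floating stuff": 2}

for text in ["please use beach vibe some", "right use beach vib",
             "use floating pow", "use floating stuff right now"]:
    print(text, "->", find_code(text, codes))
```

With the question's DataFrame, the same function could be applied per row with something like df['text'].apply(lambda t: find_code(t, codes)).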

Solution 1
0 2022-07-08 07:22:12
