Python: remove a sentence if it is at the start of a string and begins with a specific word?

Question

My strings look like this:

docs = ['Hi, my name is Eric. Are you blue?',
        "Hi, I'm ! What is your name?", 
        'This is a great idea. I would love to go.', 
        'Hello, I am Jane Brown. What is your name?', 
        "Hello, I am a doctor! Let's go to the mall.",
        'I am ready to go. Mom says hello.']

If the first sentence of a string starts with "Hi" or "Hello", I want to remove that sentence.

Desired output:

docs = ['Are you blue?',
        'What is your name?', 
        'This is a great idea. I would love to go.', 
        'What is your name?', 
        "Let's go to the mall.",
        'I am ready to go. Mom says hello.']

The regex I have is:

re.match('.*?[a-z0-9][.?!](?= )', x)

But this only gives me the first sentence, and in an odd format, for example:

<re.Match object; span=(0, 41), match='Hi, my name is Eric.'>

What should I do to get the output I want?
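For context, the "odd format" is simply the printed form of a Match object, which is what re.match returns (or None when nothing matches). A minimal sketch, assuming a single example string, of how the matched text and the remainder could be pulled out of it:

```python
import re

doc = 'Hi, my name is Eric. Are you blue?'

# re.match returns a Match object (or None if there is no match),
# which is why printing it shows "<re.Match object; ...>".
m = re.match(r'.*?[a-z0-9][.?!](?= )', doc)

# .group() gives the matched text; .end() is the index just past the match.
first_sentence = m.group() if m else ''
rest = doc[m.end():].lstrip() if m else doc

print(first_sentence)  # Hi, my name is Eric.
print(rest)            # Are you blue?
```

Note that this pattern requires a lowercase letter or digit directly before the punctuation and a space after it, so it returns None for "Hi, I'm ! What is your name?".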

Answer 1

You can use

docs = [re.sub(r'^H(?:ello|i)\b.*?[.?!]\s+', '', doc) for doc in docs]

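Pattern details:

^ - start of the string
H(?:ello|i)\b - the word Hello or Hi (\b is a word boundary)
.*? - zero or more characters other than line breaks, as few as possible
[.?!] - a ., ? or !
\s+ - one or more whitespace characters.

Python demo: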

import re
docs = ['Hi, my name is Eric. Are you blue?',
        "Hi, I'm ! What is your name?", 
        'This is a great idea. I would love to go.', 
        'Hello, I am Jane Brown. What is your name?', 
        "Hello, I am a doctor! Let's go to the mall.",
        'I am ready to go. Mom says hello.']
docs = [re.sub(r'^H(?:ello|i)\b.*?[.?!]\s+', '', doc) for doc in docs]
print(docs)

Output:

[
    'Are you blue?',
    'What is your name?',
    'This is a great idea. I would love to go.',
    'What is your name?',
    "Let's go to the mall.",
    'I am ready to go. Mom says hello.'
]
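As a possible variation, the same pattern can be compiled once and wrapped in a function. The sketch below is not part of the original answer: the names GREETING and drop_greeting are hypothetical, and the trailing \s+ is swapped for (?:\s+|$) on the assumption that a greeting which is the whole string should also be removed.

```python
import re

# Hypothetical names; the pattern is the one from this answer, except that
# (?:\s+|$) replaces \s+ so a greeting that ends the string is removed too.
GREETING = re.compile(r'^H(?:ello|i)\b.*?[.?!](?:\s+|$)')

def drop_greeting(doc):
    """Remove the first sentence if it starts with the word Hi or Hello."""
    return GREETING.sub('', doc, count=1)

print(drop_greeting('Hi, my name is Eric. Are you blue?'))  # Are you blue?
print(repr(drop_greeting('Hello there!')))                  # ''
print(drop_greeting('History is fun. I like it.'))          # History is fun. I like it.
```

The \b word boundary is what keeps "History" from being treated as "Hi".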

Answer 2

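First, split each string into sentences:

splitted_docs = []
for doc in docs:  # avoid naming the loop variable "str", which shadows the built-in
    splitted_docs.append(doc.split('.'))

Then check each sentence for Hi or Hello with a regular expression and add the remaining sentences to the final list: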

final_docs = []
for sentences in splitted_docs:
    final_sentence = []
    for sentence in sentences:
        if not re.match('.*?[a-z0-9][.?!](?= )', sentence):
            final_sentence.append(sentence)
    # join is a method of the separator string, not of the list
    final_docs.append('.'.join(final_sentence))

In practice, your regex does not work for this check, so I changed the code to use a plain substring test instead:

for sentences in splitted_docs:
    final_sentence = []
    for sentence in sentences:
        if 'Hello' not in sentence and 'Hi' not in sentence:
            final_sentence.append(sentence)
    final_docs.append('.'.join(final_sentence))

Finally, filter the list to remove any empty strings that the join may have produced:

final_docs = list(filter(lambda x: x != '', final_docs))
print(final_docs)

Output:

[' Are you blue?', 'This is a great idea. I would love to go.', ' What is your name?', 'I am ready to go. Mom says hello.']

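I'll leave the full code here. Suggestions are welcome; I'm sure this could be solved with a more functional approach that is easier to follow, but I'm not that familiar with it.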

import re
docs = ['Hi, my name is Eric. Are you blue?',
        "Hi, I'm ! What is your name?", 
        'This is a great idea. I would love to go.', 
        'Hello, I am Jane Brown. What is your name?', 
        "Hello, I am a doctor! Let's go to the mall.",
        'I am ready to go. Mom says hello.']

# Split each document on periods only.
splitted_docs = []
for doc in docs:
    splitted_docs.append(doc.split('.'))

# Drop every piece that contains "Hello" or "Hi", then re-join the rest.
final_docs = []
for sentences in splitted_docs:
    final_sentence = []
    for sentence in sentences:
        if 'Hello' not in sentence and 'Hi' not in sentence:
            final_sentence.append(sentence)
    final_docs.append('.'.join(final_sentence))

# Remove documents that ended up empty.
final_docs = list(filter(lambda x: x != '', final_docs))
print(final_docs)
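Since suggestions were invited, here is one possible alternative to the split-and-filter approach, offered as a sketch rather than a definitive fix. It assumes a sentence ends at ., ? or ! followed by whitespace, and it only inspects the first sentence; the function name strip_leading_greeting is hypothetical.

```python
import re

docs = ['Hi, my name is Eric. Are you blue?',
        "Hi, I'm ! What is your name?",
        'This is a great idea. I would love to go.',
        'Hello, I am Jane Brown. What is your name?',
        "Hello, I am a doctor! Let's go to the mall.",
        'I am ready to go. Mom says hello.']

def strip_leading_greeting(doc):
    # Split off only the first sentence: break at whitespace following . ? or !
    parts = re.split(r'(?<=[.?!])\s+', doc, maxsplit=1)
    # \b makes this a whole-word test, so "History" does not count as "Hi".
    if re.match(r'(?:Hi|Hello)\b', parts[0]):
        return parts[1] if len(parts) > 1 else ''
    return doc

result = [strip_leading_greeting(doc) for doc in docs]
print(result)
```

Unlike the version above, this keeps all six entries, removes no leading spaces by accident, and leaves later sentences alone even if they contain "Hi" or "Hello".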


Answer 1: 3, accepted, 2022-06-09 15:07:26
Answer 2: 1, 2022-06-09 15:21:06
