How to build this regex so that it extracts a word that starts with a capital letter if and only if it appears after a previous pattern?

I need a regex that extracts all the names in a sentence. For this purpose, a name is any word that starts with a capital letter and meets certain conditions on what precedes it in the sentence. The extraction must follow the pattern described below and must also capture the content before and after each name, so that it can be printed next to the name extracted from that sequence.


This is the pseudo-regex pattern that I need:

the beginning of the input sentence or (,|;|.|y)

associated_sense_1: "some character string (alphanumeric)" or "nothing"

(con |juntos a |junto a |en compania de )

identified_person: "some word that starts with a capital letter (the name that I must extract)", which ends when the regex finds one or more spaces

associated_sense_2: "some character string (alphanumeric)" or "nothing"

the end of the input sentence or (,|;|.|y |con |juntos a |junto a |en compania de )

The (,|;|.|y) are just connectors between people that are used to build the regex pattern. They carry no information beyond indicating which sequence a name belongs to, so they can be removed afterwards with a .replace(, "").
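One detail to watch if the connector list is copied literally into a Python regex: an unescaped . matches any character, not just a full stop. The snippet below is a minimal sketch of that difference; the names unescaped and escaped are illustrative and not part of the original code, and the character class [,;.] is just one possible way to write the punctuation connectors.

```python
import re

# Connector list copied literally: the bare "." matches ANY character.
unescaped = re.compile(r"(?:,|;|.|y )")

# Inside a character class, "." is a literal full stop.
# "\by " only accepts "y " when the "y" starts a word.
escaped = re.compile(r"(?:[,;.]|\by )")

print(unescaped.match("x") is not None)  # the letter "x" is accepted as a connector
print(escaped.match("x") is not None)    # "x" is rejected
print(escaped.match(";") is not None)    # real punctuation is still accepted
```

With the unescaped form, every character can act as a connector, which makes the surrounding groups match far more text than intended.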

With this regex I need to extract these 3 string groups:

associated_sense_1

identified_person

associated_sense_2


associated_sense = associated_sense_1 + " " + associated_sense_2

This is the proto-code:

import re

#Example 1
sense = "puede ser peligroso ir solas, quizas sea mejor ir con Adrian y seguro que luego podemos esperar por Melisa, Marcos y Lucy en la parada"
#Example 2
#sense = "Adrian ya esta en la parada; y alli probablemente esten Lucy y May en la parada esperandonos"

person_identify_pattern = r"\s*(con |por |, y |, |,y |y )\s*[A-Z][^A-Z]*"
#person_identify_pattern = r"\s*(con |por |, y |, |,y |y )\s*[^A-Z]*"


for identified_person in re.split(person_identify_pattern, sense):
    identified_person = identified_person.strip()
    if identified_person:
        try:
            print(f"Write '{associated_sense}' to {identified_person}.txt")
        except NameError:  # first non-empty chunk: associated_sense is not defined yet
            associated_sense = identified_person

The wrong output I get:

Write 'puede ser peligroso ir solas, quizas sea mejor ir' to con.txt
Write 'puede ser peligroso ir solas, quizas sea mejor ir' to Melisa.txt
Write 'puede ser peligroso ir solas, quizas sea mejor ir' to ,.txt
Write 'puede ser peligroso ir solas, quizas sea mejor ir' to Lucy en la parada.txt
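A likely reason why connectors such as con and , show up as file names: when the pattern passed to re.split contains a capturing group, the text matched by that group is returned as its own item in the result list. In addition, the [A-Z][^A-Z]* part of the pattern is inside the delimiter, so the name it matches is consumed and never returned. The following is a small sketch of the first effect on a shortened sentence; the variable names are illustrative only.

```python
import re

text = "ir con Adrian y esperar por Melisa"

# Capturing group: each matched connector becomes an item of the result.
with_group = re.split(r"\s*(con |por )\s*", text)

# Non-capturing group: connectors are consumed and dropped.
without_group = re.split(r"\s*(?:con |por )\s*", text)

print(with_group)     # ['ir', 'con ', 'Adrian y esperar', 'por ', 'Melisa']
print(without_group)  # ['ir', 'Adrian y esperar', 'Melisa']
```

Because re.split only returns what lies between delimiters (plus captured groups), it may be a poor fit for a task where the name itself has to be captured together with its surrounding text.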

Correct output for example 1:

Write 'quizas sea mejor ir con' to Adrian.txt
Write 'y seguro que luego podemos esperar por en la parada' to Melisa.txt
Write 'y seguro que luego podemos esperar por en la parada' to Marcos.txt
Write 'y seguro que luego podemos esperar por en la parada' to Lucy.txt

Correct output for example 2:

Write 'ya esta en la parada' to Adrian.txt
Write 'alli probablemente esten en la parada esperandonos' to Lucy.txt
Write 'alli probablemente esten en la parada esperandonos' to May.txt

I also tried this other regex, but I still have problems with this code:

import re

sense = "puede ser peligroso ir solas, quizas sea mejor ir con Adrian y seguro que luego podemos esperar por Melisa, Marcos y Lucy en la parada"

person_identify_pattern = r"\s*(?:,|;|.|y |con |juntos a |junto a |en compania de |)\s*((?:\w\s*)+)\s*(?<=con|por|a, | y )\s*([A-Z].*?\b)\s*((?:\w\s*)+)\s*(?:,|;|.|y |con |juntos a |junto a |en compania de )\s*"

for m in re.split(person_identify_pattern, sense):
    m = m.strip()
    if m:
        try:
            print(f"Write '{content}' to {m}.txt")
        except NameError:  # first non-empty chunk: content is not defined yet
            content = m

But I keep getting this wrong output:

Write 'puede ser peligroso ir solas' to quizas sea mejor ir con Adrian y seguro que luego podemos esperar por.txt
Write 'puede ser peligroso ir solas' to Melisa,.txt
Write 'puede ser peligroso ir solas' to Marcos y Lucy en la parad.txt

With re.findall and the lookbehind alone, I can extract just the names:

import re

sense = "puede ser peligroso ir solas, quizas sea mejor ir con Adrian y seguro que luego podemos esperar por Melisa, Marcos y Lucy en la parada"
if match := re.findall(r"(?<=con|por|a, | y )\s*([A-Z].*?\b)", sense):
    print(match)

The result is ['Adrian', 'Melisa', 'Marcos', 'Lucy'].
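Building on that lookbehind, one possible direction is to use re.finditer instead of re.findall, since match objects expose the position of each name and the text between consecutive names can then be sliced out. This is only a sketch under assumptions: the function name names_with_context is hypothetical, [A-Z]\w* is swapped in for [A-Z].*?\b, and the grouping of context here (text up to the previous and next name) does not reproduce the exact "correct output" listed above, which would need additional rules for shared context.

```python
import re

# Same fixed-width lookbehind as above; the name is captured in group 1.
NAME = re.compile(r"(?<=con|por|a, | y )\s*([A-Z]\w*)")

def names_with_context(sentence):
    """Return (name, text_before, text_after) for each name found."""
    matches = list(NAME.finditer(sentence))
    results = []
    for i, m in enumerate(matches):
        # Context runs from the end of the previous name to the start of the next.
        prev_end = matches[i - 1].end(1) if i > 0 else 0
        next_start = matches[i + 1].start(1) if i + 1 < len(matches) else len(sentence)
        before = sentence[prev_end:m.start(1)].strip(" ,;.")
        after = sentence[m.end(1):next_start].strip(" ,;.")
        results.append((m.group(1), before, after))
    return results

sense = ("puede ser peligroso ir solas, quizas sea mejor ir con Adrian y seguro "
         "que luego podemos esperar por Melisa, Marcos y Lucy en la parada")
results = names_with_context(sense)
for name, before, after in results:
    print(f"{name}: before={before!r} after={after!r}")
```

From here, associated_sense_1 and associated_sense_2 could be derived from before and after, for example by dropping leftover connectors such as a lone y.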
