
Set alphanumeric regex pattern not accepting certain specific symbols

import re

#Examples:
input_text = "Recien el 2021-10-12 despues de 3 dias 2021-10-12" #NOT PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj. Ya despues de 3 dias hjsdfhjdsfhjdsf 2021-10-12" #NOT PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj; despues de 3 dias hjsdfhjdsfhjdsf 2021-10-12" #NOT PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12" #NOT PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj; mmm... creo que ya despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12" #PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj.    \n\n\n mmm... creo que ya despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12" #PASS


some_text = r"[\s|]*"  # <--- I NEED TO MODIFY THIS PATTERN
date_format = r"\d*-\d{2}-\d{2}"

check_00 = re.search(date_format + some_text + r"(?:(?:pasados|pasado|despues del|despues de el|despues de|despues|tras) (\d+) (?:días|día|dias|dia)|(\d+) (?:días|día|dias|dia) (?:pasados|pasado|despues del|despues de el|despues de|despues|tras))", input_text, re.IGNORECASE)
check_01 = re.search(r"(?:(?:pasados|pasado|despues del|despues de el|despues de|despues|tras) (\d+) (?:días|día|dias|dia)|(\d+) (?:días|día|dias|dia) (?:pasados|pasado|despues del|despues de el|despues de|despues|tras))" + some_text + date_format, input_text, re.IGNORECASE)

if not check_00 and not check_01: print("1")
else: print("0")

I need to set in the variable some_text a pattern that identifies any alphanumeric substring (possibly containing symbols such as : , $ , # , & , ? , ¿ , ! , ¡ , | , ° , , , . , ( , ) , ] , [ , } , { ), in uppercase or lowercase, where the only symbols that must not be present, not even once, are ; and .\n or .[\s|]*\n*

In this case I need to determine which cases do NOT meet the condition, hence the if not conditionals in the code.

If everything in the algorithm works correctly, the output should be:

0  #for example 1
0  #for example 2
0  #for example 3
0  #for example 4
1  #for example 5
1  #for example 6

Is it possible, within the same pattern that I want to place in the some_text variable, to give a list of the symbols that I do NOT want to appear in that part of the match (in this case ; and .[\s|]*\n* )?

"but the only symbols that should not be present, not even once, are ; and .\n or .[\s|]*\n*"

To disallow ; you can simply use [^;] .

Regarding the other two "patterns": the [\s|] pattern makes a wrong assumption, because a pipe symbol inside a character class is interpreted literally. It seems you want it to indicate that the \s is optional, but the asterisk already ensures this. The dot must be escaped, so the sequence to detect is \.\s*?\n . To disallow it, you can put it in a negative look-ahead: (?!\.\s*?\n) . Applying that look-ahead before every character matched by [^;] rejects the sequence anywhere inside the stretch, not only at its start.
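The three claims above can be checked in isolation. The following is a minimal sketch; the variable names and sample strings are made up for illustration and are not part of the original question.

```python
import re

# Inside a character class, | is a literal pipe, not alternation,
# so [\s|]* accepts any run of whitespace and pipe characters.
pipe_class = re.compile(r"[\s|]*")
matched_pipes = pipe_class.fullmatch(" | \n|") is not None

# An escaped dot, optional whitespace, then a newline.
dot_newline = re.compile(r"\.\s*?\n")
hits_plain = dot_newline.search("fin.\n") is not None
hits_spaced = dot_newline.search("fin.   \n") is not None
hits_inline = dot_newline.search("fin. Ya") is not None

# Wrapped in a negative look-ahead, the same sequence blocks the match
# at the position where it would start.
guard = re.compile(r"(?!\.\s*?\n).")
blocked = guard.match(".\n") is None
allowed = guard.match(". Ya") is not None

print(matched_pipes, hits_plain, hits_spaced, hits_inline, blocked, allowed)
# prints: True True True False True True
```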

This leads to:

some_text = r"(?:(?!\.\s*?\n)[^;])*"
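As a sanity check, here is a sketch that plugs this pattern into the question's code and runs all six examples. The helper name classify, the variable lapse and the examples list are introduced here for illustration only; the regular expressions themselves are taken unchanged from the question and the answer.

```python
import re

examples = [
    "Recien el 2021-10-12 despues de 3 dias 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj. Ya despues de 3 dias hjsdfhjdsfhjdsf 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj; despues de 3 dias hjsdfhjdsfhjdsf 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj; mmm... creo que ya despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj.    \n\n\n mmm... creo que ya despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12",
]

# Any run of characters that contains no ';' and no '.' followed by
# optional whitespace and a newline.
some_text = r"(?:(?!\.\s*?\n)[^;])*"
date_format = r"\d*-\d{2}-\d{2}"

# The time-lapse alternation from the question, factored out once.
lapse = (
    r"(?:(?:pasados|pasado|despues del|despues de el|despues de|despues|tras) (\d+) (?:días|día|dias|dia)"
    r"|(\d+) (?:días|día|dias|dia) (?:pasados|pasado|despues del|despues de el|despues de|despues|tras))"
)

def classify(text):
    """Return "1" when neither ordering matches, otherwise "0"."""
    check_00 = re.search(date_format + some_text + lapse, text, re.IGNORECASE)
    check_01 = re.search(lapse + some_text + date_format, text, re.IGNORECASE)
    return "1" if not check_00 and not check_01 else "0"

results = [classify(text) for text in examples]
print(results)
# prints: ['0', '0', '0', '0', '1', '1']
```

Examples 5 and 6 print 1 because every path between a date and the time-lapse phrase crosses either a ; or a dot followed by a newline, which the pattern refuses to consume.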
