Python regex sub space

CODE:

# -*- coding: utf-8 -*-
import re  # Python 2 code: print is used as a statement below

word = 'aiuhsdjfööäö ; sdfdfd'
word1=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]',"""\[^^0-9\t\r\n\f(!){$}.+?|\]*""", word) ; print 'word=  ', word
word2=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]',"""\[^^0-9\\t\\r\\n\\f(!){$}.+?|\]*""", word) ; print 'word=  ', word
word3=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]',"""\[^^0-9\\\t\\\r\\\n\\\f(!){$}.+?|\]*""", word) ; print 'word=  ', word
word4=re.sub('[^^äÄöÖåÅA-Za-z0-9\s()!{$}.+?|]',"""\[^^0-9\s(!){$}.+?|\]*""", word) ; print 'word=  ', word
word5=re.sub('[^^äÄöÖåÅA-Za-z0-9\s()!{$}.+?|]',"""\[^^0-9\\s(!){$}.+?|\]*""", word) ; print 'word=  ', word
word6=re.sub('[^^äÄöÖåÅA-Za-z0-9\s()!{$}.+?|]',"""\[^^0-9\\\s(!){$}.+?|\]*""", word) ; print 'word=  ', word

F=open('suoriP.txt','w')
F.writelines(word1+'\n\n'+word2+'\n\n'+word3+'\n\n'+word4+'\n\n'+word5+'\n\n'+word6)
F.close()

RESULT:

aiuhsdjfööäö\[^^0-9 

(!){$}.+?|\]*\[^^0-9    

(!){$}.+?|\]*\[^^0-9    

(!){$}.+?|\]*sdfdfd

aiuhsdjfööäö\[^^0-9 

(!){$}.+?|\]*\[^^0-9    

(!){$}.+?|\]*\[^^0-9    

(!){$}.+?|\]*sdfdfd

aiuhsdjfööäö\[^^0-9\    \
\
\(!){$}.+?|\]*\[^^0-9\  \
\
\(!){$}.+?|\]*\[^^0-9\  \
\
\(!){$}.+?|\]*sdfdfd

aiuhsdjfööäö \[^^0-9\s(!){$}.+?|\]* sdfdfd

aiuhsdjfööäö \[^^0-9\s(!){$}.+?|\]* sdfdfd

aiuhsdjfööäö \[^^0-9\s(!){$}.+?|\]* sdfdfd

QUESTION:

I do not understand why:

  1. re does not substitute backslashes: \s, \\s and \\\s in the replacement (word4 to word6) are all substituted as \s

  2. re does not substitute \\t\\r\\n\\f for ';'
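
Both observations are consistent with the replacement string being interpreted twice: first by Python as a string literal, then by re.sub as a replacement template. A minimal sketch of the two layers, assuming Python 3 and a shortened input (the names text, as_tab and as_literal are made up for illustration):

```python
import re

text = "ab ; cd"

# Layer 1: the Python string literal.
# '\t' is a single tab character; '\\t' is two characters, backslash and t.
assert len('\t') == 1
assert len('\\t') == 2

# Layer 2: re.sub processes escapes in the replacement template again,
# so the two characters backslash + t are turned into a real tab.
as_tab = re.sub(';', '\\t', text)

# To get a backslash and a t into the output, the template itself must
# contain two backslashes: r'\\t' (or '\\\\t' without the raw prefix).
as_literal = re.sub(';', r'\\t', text)

print(repr(as_tab))      # 'ab \t cd'
print(repr(as_literal))  # 'ab \\t cd'
```

The sketch sticks to \t because recent Python 3 versions reject an unknown letter escape such as \s in the replacement template with an error, whereas the Python 2 run shown above passed it through unchanged.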

I am trying to generate complicated re patterns with variable names by analyzing a file.

I am not able to generate the whitespace character representation in [^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]. That is, when a ';' is found in the text file by word1=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]',.... I am not able to replace that ';' with the string '[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]'.

This string is a pattern string, which I use in re.search to extract certain words as variables.

SOLUTION (FOUND AND ADDED LATER):

In the end I put the placeholder xxxx where the whitespace escape sequences belong. Afterwards I split the string on xxxx and merged the pieces again, adding '\t\n\f\v\r'.

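from django.utils.encoding import smart_str  # assumption: smart_str is Django's helper; the post does not say
strsub=smart_str('[^^äÄöÖåÅA-Za-z0-9xxxx()!{$}.+?|`\"£$\%&_+~#\'@><]+', encoding='utf-8', strings_only=False, errors='replace' )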
word=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\n\r\f()!{$}.+?|£$\%&_+~#\'@><]+',strsub,word)

str2=''
for line in word.split('xxxx'):
     str2=str2+'\\t\\n\\f\\v\\r'+line
     F.writelines(str2)
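
A possibly simpler route to the same goal, sketched here for Python 3 and not taken from the original post: pass a function as the replacement. re.sub inserts the function's return value as is, with no escape processing, so the inserted text can contain \t, \r and similar sequences without a placeholder. The names PATTERN_TEXT and insert_pattern_text are illustrative, and the character classes are shortened.

```python
import re

# Text to insert verbatim; the raw string keeps the backslashes as typed.
PATTERN_TEXT = r'[^^0-9\t\r\n\f(!){$}.+?|]*'

def insert_pattern_text(match):
    # The return value of a replacement function is used literally.
    return PATTERN_TEXT

word = 'aiuhsdjfööäö ; sdfdfd'
# Replace each run of characters outside the allowed set.
result = re.sub(r'[^äÄöÖåÅA-Za-z0-9]+', insert_pattern_text, word)
print(result)
```

Because the three characters ' ; ' form one run, the pattern text is inserted once, between the two words.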

When you use re.sub, the second argument is not a regex but a replacement template. Group the parts of the pattern you want to keep and refer to them as \1, \2 and so on, for example:

 import re
 word="aiuhsdjfööäö"
 word1=re.sub("(.+?)[äa](.+?)",r"\1a\2 [corrected]",word)

The substitution above is completely unnecessary in itself, but it shows the point: in the second argument of re.sub, [ does not have to be preceded by \
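
One detail matters when writing such group references: in an ordinary (non-raw) Python string literal, "\1" is the control character with code 1, not a backslash followed by 1, so the reference has to be written as r"\1" or "\\1". A small sketch, assuming Python 3 and made-up sample data (date, swapped and bracketed are illustrative names):

```python
import re

# In a normal string literal, "\1" is an octal escape for chr(1).
assert "\1" == chr(1)
# A raw string keeps the backslash, which re.sub needs for a group reference.
assert r"\1" == "\\1"

date = "2024-05-17"
# Swap year and day using group references in the replacement template.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3-\2-\1", date)
# Square brackets in the replacement are plain text and need no backslash.
bracketed = re.sub(r"(\d{4})", r"[\1]", date)

print(swapped)    # 17-05-2024
print(bracketed)  # [2024]-05-17
```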
