Python Regular Expression with special characters

I'm having trouble writing a robust regular expression to grab information out of a string.

>>> string1 = 'A_XYZ_THESE_WORDS'
>>> string2 = 'A_ABC_THOSE_WORDS'

I would like a robust solution that pulls 'THESE_WORDS' out of string1 and 'THOSE_WORDS' out of string2, respectively.

Basically, I need something that removes everything before the first two underscores (_), but the text before them will vary. My attempt so far keeps too much:

>>> import re
>>> get_text = re.search(r'(?<=A_)\w+(_)', string1)
>>> print(get_text.group())
XYZ_THESE_

Based on your problem statement:

I need something that removes everything before the first two underscores

you don't necessarily need a regular expression:

>>> string1 = 'A_XYZ_THESE_WORDS'
>>> string1.split("_", 2)[2]
'THESE_WORDS'

The second argument to str.split is the maximum number of splits to perform. This splits on the first two underscores, then takes the third item (the rest of the string) from the resulting list.
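To see what the index is applied to, it can help to print the intermediate list. A small sketch using the two strings from the question (the variable name `parts` is only illustrative):

```python
string1 = 'A_XYZ_THESE_WORDS'
string2 = 'A_ABC_THOSE_WORDS'

# maxsplit=2 stops after two splits, so the remainder stays in one piece
parts = string1.split("_", 2)
print(parts)       # ['A', 'XYZ', 'THESE_WORDS']
print(parts[2])    # THESE_WORDS

# Without a limit, every underscore is a split point
print(string1.split("_"))          # ['A', 'XYZ', 'THESE', 'WORDS']

print(string2.split("_", 2)[2])    # THOSE_WORDS
```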

This raises an IndexError if the string contains fewer than two underscores, which tells you that the string is not in the format you expect. If that behaviour is not desirable, consider:

>>> string1 = 'A_XYZ_THESE_WORDS'
>>> string1.split("_", 2)[-1]
'THESE_WORDS'  

This takes the last item of the list returned by str.split, rather than assuming that there will be three. Comparison:

>>> "JUST_ONE".split("_", 2)[2]
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    "JUST_ONE".split("_", 2)[2]
IndexError: list index out of range

>>> "JUST_ONE".split("_", 2)[-1]
'ONE'
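Note that 'ONE' in the last example is not really "the text after the second underscore", since the input has only one. If neither an exception nor a silently different result is wanted, one option is to check the length of the split result explicitly. A minimal sketch; the function name `after_second_underscore` and its `default` parameter are hypothetical, not part of any library:

```python
def after_second_underscore(text, default=None):
    """Return the part of text after its second underscore, or default."""
    parts = text.split("_", 2)
    # Fewer than three parts means the string had fewer than two underscores
    if len(parts) < 3:
        return default
    return parts[2]

print(after_second_underscore('A_XYZ_THESE_WORDS'))   # THESE_WORDS
print(after_second_underscore('JUST_ONE'))            # None
print(repr(after_second_underscore('A_B_')))          # ''
```

Whether to return a default or raise is a design choice that depends on how strictly the input format should be enforced.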

The regex below captures the text that follows the second underscore (_):

>>> import re
>>> string1 = 'A_XYZ_THESE_WORDS'
>>> string2 = 'A_ABC_THOSE_WORDS'
>>> m = re.search(r'^[^_]*_[^_]*_(.*)$', string1)
>>> m.group(1)
'THESE_WORDS'
>>> m = re.search(r'^[^_]*_[^_]*_(.*)$', string2)
>>> m.group(1)
'THOSE_WORDS'
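For readers who want to see how that pattern is put together, here is the same expression written with re.VERBOSE and one comment per piece. This is only an annotated sketch; the name `pattern` and the extra 'JUST_ONE' input are illustrative additions:

```python
import re

# Same pattern as above, spelled out piece by piece
pattern = re.compile(r"""
    ^[^_]*_     # first field: any non-underscore characters, then an underscore
    [^_]*_      # second field, then an underscore
    (.*)$       # group 1: everything after the second underscore
""", re.VERBOSE)

for s in ('A_XYZ_THESE_WORDS', 'A_ABC_THOSE_WORDS', 'JUST_ONE'):
    m = pattern.search(s)
    # search() returns None when the string has fewer than two underscores
    print(s, '->', m.group(1) if m else None)
```

Checking the match object before calling group(1) avoids an AttributeError on input that does not contain two underscores.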

In [21]: regex = re.compile(r'^([a-zA-Z]+_){2}(.*)$')

In [22]: m = regex.search(string1)

In [23]: m.groups()
Out[23]: ('XYZ_', 'THESE_WORDS')

In [24]: m = regex.search(string2)

In [25]: m.groups()
Out[25]: ('ABC_', 'THOSE_WORDS')
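In the outputs above, the first element of groups() is 'XYZ_' or 'ABC_' rather than 'A_' because a repeated capturing group keeps only its last iteration. If only the tail is of interest, a non-capturing group (?:...) is one option. A sketch under the same assumption as the pattern above, namely that both prefixes are purely alphabetic, which the question does not actually guarantee:

```python
import re

string1 = 'A_XYZ_THESE_WORDS'
string2 = 'A_ABC_THOSE_WORDS'

# (?:...) groups without capturing, so the tail is the only captured group
regex = re.compile(r'^(?:[a-zA-Z]+_){2}(.*)$')

print(regex.search(string1).group(1))   # THESE_WORDS
print(regex.search(string2).group(1))   # THOSE_WORDS

# A digit in a prefix makes this pattern fail to match
print(regex.search('A1_XYZ_REST'))      # None
```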
