
Python finding substring between certain characters using regex and replace()

Suppose I have a string with lots of random stuff in it like the following:

strJunk ="asdf2adsf29Value=five&lakl23ljk43asdldl"

And I'm interested in obtaining the substring sitting between 'Value=' and '&', which in this example would be 'five'.

I can use a regex like the following:

 >>> match = re.search(r'Value=?([^&>]+)', strJunk)
 >>> print(match.group(0))
 Value=five
 >>> print(match.group(1))
 five

How come match.group(0) is the whole thing 'Value=five' and group(1) is just 'five'? And is there a way for me to just get 'five' as the only result? (This question stems from me only having a tenuous grasp of regex)

I am also going to have to make a substitution in this string, such as the following:

 val1 = match.group(1)
 strJunk.replace(val1, "six", 1)    

Which yields:

 'asdf2adsf29Value=six&lakl23ljk43asdldl'

Considering that I plan on performing the above two tasks (finding the string between 'Value=' and '&', as well as replacing that value) over and over, I was wondering if there are any more efficient ways of looking for the substring and replacing it in the original string. I'm fine sticking with what I've got, but I just want to make sure that I'm not taking up more time than I have to if better methods are out there.

Named groups make it easier to get the group contents afterwards. Compiling your regex once and then reusing the compiled object saves the pattern lookup that module-level functions such as re.search repeat on every call (the re module caches recently compiled patterns, so the gain is modest, but it adds up in a tight loop). You can use positive lookbehind and lookahead assertions to make this regex suitable for the substitution you want to do.

>>> value_regex = re.compile("(?<=Value=)(?P<value>.*?)(?=&)")
>>> match = value_regex.search(strJunk)
>>> match.group('value')
'five'
>>> value_regex.sub("six", strJunk)
'asdf2adsf29Value=six&lakl23ljk43asdldl'
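Since the question is about doing the lookup and the replacement over and over, here is a minimal sketch of how the compiled pattern above could be wrapped for reuse. The names VALUE_RE, get_value and set_value are made up for illustration, and count=1 is my assumption to mirror the replace(val1, "six", 1) call in the question.

```python
import re

# Compiled once at import time and reused by both helpers (hypothetical names).
VALUE_RE = re.compile(r"(?<=Value=)(?P<value>.*?)(?=&)")

def get_value(text):
    """Return the text between 'Value=' and the next '&', or None if absent."""
    match = VALUE_RE.search(text)
    return match.group("value") if match else None

def set_value(text, new_value):
    """Replace only the first occurrence, like str.replace(old, new, 1)."""
    # A callable replacement keeps any backslashes in new_value literal.
    return VALUE_RE.sub(lambda m: new_value, text, count=1)

str_junk = "asdf2adsf29Value=five&lakl23ljk43asdldl"
print(get_value(str_junk))
print(set_value(str_junk, "six"))
```

Note that the lookahead means a trailing '&' has to be present; a string that ends right after the value would not match this pattern.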

I'm not exactly sure if you're parsing URLs, in which case you should definitely be using the urlparse module (urllib.parse in Python 3).
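If the data really is a URL query string, something along these lines might work. This is only a sketch: it uses the Python 3 module name urllib.parse (urlparse was folded into it), and the query string below is a made-up example, not the junk string from the question.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical, well-formed query string used purely for illustration.
query = "a=1&Value=five&b=2"

# parse_qsl returns (key, value) pairs in their original order.
pairs = parse_qsl(query)
current = dict(pairs)["Value"]

# Rebuild the query with only the 'Value' field changed.
updated = urlencode([(k, "six" if k == "Value" else v) for k, v in pairs])

print(current)
print(updated)
```

This only pays off when the input is genuinely key=value pairs joined by '&'; for arbitrary text around the value, the regex approaches are a better fit.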

However, given that this is not your question, splitting on several delimiters with a regular expression is very fast in Python, so you should be able to do what you want as follows:

import re

strJunk ="asdf2adsf29Value=five&lakl23ljk43asdldl"
split_result = re.split(r'[&=]', strJunk)
split_result[1] = 'six'
print("{0}={1}&{2}".format(*split_result))

Hope this helps!

EDIT:

If you are going to split many times, you can use re.compile() to compile the regular expression once. So you'll have:

import re
rx_split_on_delimiters = re.compile(r'[&=]')  # store this somewhere

strJunk ="asdf2adsf29Value=five&lakl23ljk43asdldl"
split_result = rx_split_on_delimiters.split(strJunk)
split_result[1] = 'six'
print("{0}={1}&{2}".format(*split_result))

How come match.group(0) is the whole thing 'Value=five' and group(1) is just 'five'? And is there a way for me to just get 'five' as the only result? (This question stems from me only having a tenuous grasp of regex)

I thought a lookbehind assertion could help you here.

>>> match = re.search(r'(?<=Value=)([^&>]+)', strJunk)
>>> match.group(0)
'five'

But you can only use a fixed-width pattern in a lookbehind assertion, so making the '=' optional fails:

>>> match = re.search(r'(?<=Value=?)([^&>]+)', strJunk)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "/usr/lib/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: look-behind requires fixed-width pattern

I can't think of a way to do this without regex. Your way of doing this should be faster than a lookbehind assertion.
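To illustrate the group(0) versus group(1) part of the question, and one possible way around the fixed-width restriction, here is a rough sketch. It puts the prefix in its own capturing group and writes it back with \g<1> instead of using a lookbehind. The two-group pattern is my own variation on the regex from the question, not something from the original post.

```python
import re

str_junk = "asdf2adsf29Value=five&lakl23ljk43asdldl"

# Group 1 is the prefix (with the optional '='), group 2 is the value.
pattern = re.compile(r"(Value=?)([^&>]+)")

match = pattern.search(str_junk)
whole = match.group(0)   # group 0 is always the entire matched text
value = match.group(2)   # numbered groups hold only what their parentheses matched

# \g<1> re-inserts the captured prefix, so only the value part changes.
replaced = pattern.sub(r"\g<1>six", str_junk, count=1)

print(whole)
print(value)
print(replaced)
```

Because the prefix is captured rather than asserted, it can be variable-width, which a lookbehind in the re module does not allow.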
