How do I extract certain parts of a string in Python?

Question

Say I have three strings:

abc534loif

tvd645kgjf

tv96fjbd_gfgf

and three lists:

beginning captures just the first part of the string (the name)
middle captures just the number
end contains only the rest of the characters that come after the number portion

How do I accomplish this in the most efficient way?

Answer 1

Use regular expressions?

>>> import re
>>> strings = 'abc534loif tvd645kgjf tv96fjbd_gfgf'.split()
>>> for s in strings:
...   for match in re.finditer(r'\b([a-z]+)(\d+)(.+?)\b', s):
...     print(match.groups())
... 
('abc', '534', 'loif')
('tvd', '645', 'kgjf')
('tv', '96', 'fjbd_gfgf')

Answer 2

This is a language-agnostic approach that aims at higher efficiency:

find the first digit in the string and save its position p0
find the last digit in the string and save its position p1
extract the substring from 0 to p0-1 into beginning
extract the substring from p0 to p1 into middle
extract the substring from p1+1 to length-1 into end
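
The steps above can be sketched in Python roughly as follows. This is only an illustration: the function name split_on_digits is made up here, and it assumes a string without any digit should go entirely into beginning. Python slices exclude their end index, so "0 to p0-1" becomes s[:p0] and "p0 to p1" becomes s[p0:p1 + 1].

```python
def split_on_digits(s):
    """Split s into (beginning, middle, end) around its digits.

    Hypothetical helper: middle runs from the first digit to the last digit.
    """
    # Positions of every digit character in the string.
    digit_positions = [i for i, ch in enumerate(s) if ch.isdigit()]
    if not digit_positions:
        # Assumption: with no digits, everything counts as the beginning.
        return s, "", ""
    p0, p1 = digit_positions[0], digit_positions[-1]
    return s[:p0], s[p0:p1 + 1], s[p1 + 1:]


for s in ["abc534loif", "tvd645kgjf", "tv96fjbd_gfgf"]:
    print(split_on_digits(s))
```

Note that if a string had further digits after the number (for example in the end part), middle would stretch from the first digit to the last one, since that is what the steps describe.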

Answer 3

I guess you're looking for re.findall:

strs = """
    abc534loif
    tvd645kgjf
    tv96fjbd_gfgf
"""

import re
print(re.findall(r'\b(\w+?)(\d+)(\w+)', strs))

>> [('abc', '534', 'loif'), ('tvd', '645', 'kgjf'), ('tv', '96', 'fjbd_gfgf')]

Answer 4

>>> import itertools as it
>>> s="abc534loif"
>>> [''.join(j) for i,j in it.groupby(s, key=str.isdigit)]
['abc', '534', 'loif']

Answer 5

I would use a regular expression like:

(?P<beginning>[^0-9]*)(?P<middle>[0-9]*)(?P<end>[^0-9]*)

and pull out the three matching sections.

import re

m = re.match(r"(?P<beginning>[^0-9]*)(?P<middle>[0-9]*)(?P<end>[^0-9]*)", "abc534loif")
m.group('beginning')  # 'abc'
m.group('middle')     # '534'
m.group('end')        # 'loif'
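
To fill the three lists from the question with named groups, something like the following might work. It is a sketch under assumptions: the pattern here takes middle to be a run of digits ([0-9]+) and lets end take whatever follows, and strings without digits are simply skipped.

```python
import re

# Assumed pattern: non-digits, then one or more digits, then the remainder.
PATTERN = re.compile(r"(?P<beginning>[^0-9]*)(?P<middle>[0-9]+)(?P<end>.*)")

strings = ["abc534loif", "tvd645kgjf", "tv96fjbd_gfgf"]
beginning, middle, end = [], [], []

for s in strings:
    m = PATTERN.match(s)
    if m is None:
        # No digits in this string; skipping it is an assumption of this sketch.
        continue
    parts = m.groupdict()  # maps group names to the matched text
    beginning.append(parts["beginning"])
    middle.append(parts["middle"])
    end.append(parts["end"])

print(beginning)
print(middle)
print(end)
```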

Answer 6

import re  # you want to match a string against a pattern, so import the regular expressions module 're'
mystring = "abc1234def"  # just a string to test with
# Our regular expression. Everything between parentheses is 'captured', meaning it is
# accessible as one of the groups in the returned match object. The ^ sign matches at
# the beginning of the string, while $ matches the end. [0-9] is a character range that
# matches any digit character; \D is any non-digit character.
match = re.match(r"^(\D+)([0-9]+)(\D+)$", mystring)
# match will be None if the string didn't match the pattern, and None has no .group
if match:
    beginning = match.group(1)
    middle = match.group(2)
    end = match.group(3)

Answer 7

I'd do something like this:

>>> import re
>>> l = ['abc534loif', 'tvd645kgjf', 'tv96fjbd_gfgf']
>>> regex = re.compile(r'([a-z_]+)(\d+)([a-z_]+)')
>>> beginning, middle, end = zip(*[regex.match(s).groups() for s in l])
>>> beginning
('abc', 'tvd', 'tv')
>>> middle
('534', '645', '96')
>>> end
('loif', 'kgjf', 'fjbd_gfgf')

7 answers

Answer 1
2 accepted 2012-03-07 22:51:14

Answer 2
1 2012-03-07 22:58:48

Answer 3
1 2012-03-07 23:13:41

Answer 4
1 2012-03-07 23:18:26

Answer 5
0 2012-03-07 22:52:30

Answer 6
0 2012-03-07 22:53:16

Answer 7
0 2012-03-07 22:56:34
