How do I extract certain parts of strings in Python?

Question

Say I have three strings:

abc534loif

tvd645kgjf

tv96fjbd_gfgf

and three lists:

beginning: just the first part of the string (the name)
middle: just the number
end: only the characters after the number

How do I accomplish this in the most efficient way?

Answer 1

Use regular expressions?

>>> import re
>>> strings = 'abc534loif tvd645kgjf tv96fjbd_gfgf'.split()
>>> for s in strings:
...   for match in re.finditer(r'\b([a-z]+)(\d+)(.+?)\b', s):
...     print(match.groups())
... 
('abc', '534', 'loif')
('tvd', '645', 'kgjf')
('tv', '96', 'fjbd_gfgf')

Answer 2

This is a language-agnostic approach that aims at higher efficiency:

find first digit in the string and save its position p0
find last digit in the string and save its position p1
extract substring from 0 to p0-1 into beginning
extract substring from p0 to p1 into middle
extract substring from p1+1 to length-1 into end
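A minimal Python sketch of these steps. It assumes each string contains at least one digit and that all of its digits form a single contiguous run. The function name `split_on_digits` is hypothetical. Python slices exclude the stop index, so the inclusive ranges above become `s[:p0]`, `s[p0:p1 + 1]` and `s[p1 + 1:]`.

```python
def split_on_digits(s):
    """Split s into (beginning, middle, end) around its digits.

    Assumes s contains at least one digit and that all digits form one run;
    raises StopIteration if s has no digit at all.
    """
    # p0: position of the first digit, scanning from the left.
    p0 = next(i for i, c in enumerate(s) if c.isdigit())
    # p1: position of the last digit, scanning from the right.
    p1 = next(i for i in range(len(s) - 1, -1, -1) if s[i].isdigit())
    return s[:p0], s[p0:p1 + 1], s[p1 + 1:]


beginning, middle, end = [], [], []
for s in ['abc534loif', 'tvd645kgjf', 'tv96fjbd_gfgf']:
    b, m, e = split_on_digits(s)
    beginning.append(b)
    middle.append(m)
    end.append(e)

print(beginning)  # ['abc', 'tvd', 'tv']
print(middle)     # ['534', '645', '96']
print(end)        # ['loif', 'kgjf', 'fjbd_gfgf']
```

Each string is scanned at most once from each side, with no regular expression compilation involved.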

Answer 3

I guess you're looking for re.findall:

strs = """
    abc534loif
    tvd645kgjf
    tv96fjbd_gfgf
"""

import re
print(re.findall(r'\b(\w+?)(\d+)(\w+)', strs))

>> [('abc', '534', 'loif'), ('tvd', '645', 'kgjf'), ('tv', '96', 'fjbd_gfgf')]
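findall returns one (beginning, middle, end) tuple per match. To get the three separate lists the question asks for, the result can be transposed with zip(*...). A sketch, assuming at least one match is found (with no matches the three-way unpacking raises ValueError):

```python
import re

strs = """
    abc534loif
    tvd645kgjf
    tv96fjbd_gfgf
"""

# Each element of the findall result is a (beginning, middle, end) tuple.
matches = re.findall(r'\b(\w+?)(\d+)(\w+)', strs)

# zip(*matches) transposes the list of tuples into three tuples;
# list() turns each of them into a list.
beginning, middle, end = (list(group) for group in zip(*matches))

print(beginning)  # ['abc', 'tvd', 'tv']
print(middle)     # ['534', '645', '96']
print(end)        # ['loif', 'kgjf', 'fjbd_gfgf']
```

Note that \w also matches digits, so the last group will absorb any later digits (for example 'ab12cd34' gives ('ab', '12', 'cd34')).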

Answer 4

>>> import itertools as it
>>> s="abc534loif"
>>> [''.join(j) for i,j in it.groupby(s, key=str.isdigit)]
['abc', '534', 'loif']
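groupby splits the string into consecutive runs of digits and non-digits. Applied to every string, its result can be unpacked straight into the three lists. A sketch, assuming each string follows the letters-digits-letters pattern so that it yields exactly three runs:

```python
import itertools as it

strings = ['abc534loif', 'tvd645kgjf', 'tv96fjbd_gfgf']

beginning, middle, end = [], [], []
for s in strings:
    # groupby yields (key, run) pairs, where key is True for a run of digits.
    # Unpacking into three names assumes exactly three runs;
    # any other number of runs raises ValueError.
    b, m, e = [''.join(run) for _, run in it.groupby(s, key=str.isdigit)]
    beginning.append(b)
    middle.append(m)
    end.append(e)

print(beginning)  # ['abc', 'tvd', 'tv']
print(middle)     # ['534', '645', '96']
print(end)        # ['loif', 'kgjf', 'fjbd_gfgf']
```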

Answer 5

I would use a regular expression like:

(?P<beginning>[^0-9]*)(?P<middle>[0-9]*)(?P<end>[^0-9]*)

and pull out the three matching sections.

import re

m = re.match(r"(?P<beginning>[^0-9]*)(?P<middle>[0-9]*)(?P<end>[^0-9]*)", "abc534loif")
beginning = m.group('beginning')  # 'abc'
middle = m.group('middle')        # '534'
end = m.group('end')              # 'loif'
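Because the groups are named, m.groupdict() returns them as a dictionary keyed by group name. This makes it easy to fill the three lists for several strings. A sketch, using [0-9]* for the middle group so that it captures the digits:

```python
import re

# Named groups: non-digits, then digits, then the remaining non-digits.
pattern = re.compile(r"(?P<beginning>[^0-9]*)(?P<middle>[0-9]*)(?P<end>[^0-9]*)")

lists = {'beginning': [], 'middle': [], 'end': []}
for s in ['abc534loif', 'tvd645kgjf', 'tv96fjbd_gfgf']:
    # Every group uses *, so match() always succeeds (possibly with empty groups).
    m = pattern.match(s)
    # groupdict() maps each group name to the text it captured.
    for name, value in m.groupdict().items():
        lists[name].append(value)

print(lists['beginning'])  # ['abc', 'tvd', 'tv']
print(lists['middle'])     # ['534', '645', '96']
print(lists['end'])        # ['loif', 'kgjf', 'fjbd_gfgf']
```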

Answer 6

import re  # regular expressions module: we want to match a string against a pattern

mystring = "abc1234def"  # just a string to test with

# Our regular expression. Everything between parentheses is "captured", i.e. it is
# accessible as one of the groups of the returned match object. ^ matches at the
# beginning of the string and $ at the end. [0-9] is a character class (a range)
# that matches any digit character; \D matches any non-digit character.
match = re.match(r"^(\D+)([0-9]+)(\D+)$", mystring)
if match:  # re.match returns None if the string doesn't match, and None has no .group()
    beginning = match.group(1)
    middle = match.group(2)
    end = match.group(3)
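The same pattern can be applied to each of the question's strings, with the None check used to skip strings that don't fit. In this sketch an extra string, 'nodigits', is added purely for illustration:

```python
import re

pattern = re.compile(r"^(\D+)([0-9]+)(\D+)$")

beginning, middle, end = [], [], []
for s in ['abc534loif', 'tvd645kgjf', 'tv96fjbd_gfgf', 'nodigits']:
    match = pattern.match(s)
    if match is None:
        # 'nodigits' contains no digits, so the pattern fails and it is skipped.
        continue
    b, m, e = match.groups()
    beginning.append(b)
    middle.append(m)
    end.append(e)

print(beginning)  # ['abc', 'tvd', 'tv']
print(middle)     # ['534', '645', '96']
print(end)        # ['loif', 'kgjf', 'fjbd_gfgf']
```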

Answer 7

I'd do something like this:

>>> import re
>>> l = ['abc534loif', 'tvd645kgjf', 'tv96fjbd_gfgf']
>>> regex = re.compile(r'([a-z_]+)(\d+)([a-z_]+)')
>>> beginning, middle, end = zip(*[regex.match(s).groups() for s in l])
>>> beginning
('abc', 'tvd', 'tv')
>>> middle
('534', '645', '96')
>>> end
('loif', 'kgjf', 'fjbd_gfgf')
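Because the question asks for the most efficient approach, the candidates can be timed on representative data with the standard timeit module. This is a minimal harness that assumes the three sample strings are representative; the function names with_regex and with_groupby are hypothetical. The printed timings will vary by machine and Python version.

```python
import itertools as it
import re
import timeit

strings = ['abc534loif', 'tvd645kgjf', 'tv96fjbd_gfgf']
regex = re.compile(r'([a-z_]+)(\d+)([a-z_]+)')


def with_regex(items):
    # Compiled pattern from this answer; zip(*...) transposes rows into columns.
    return tuple(zip(*[regex.match(s).groups() for s in items]))


def with_groupby(items):
    # Split each string into runs of digits / non-digits, then transpose.
    rows = [tuple(''.join(run) for _, run in it.groupby(s, key=str.isdigit))
            for s in items]
    return tuple(zip(*rows))


# Both approaches must agree before their speed is compared.
assert with_regex(strings) == with_groupby(strings)

runs = 10_000
for func in (with_regex, with_groupby):
    seconds = timeit.timeit(lambda: func(strings), number=runs)
    print(f'{func.__name__}: {seconds:.3f} s for {runs} runs')
```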

7 answers

solution1
2 ACCEPTED 2012-03-07 22:51:14

solution2
1 2012-03-07 22:58:48

solution3
1 2012-03-07 23:13:41

solution4
1 2012-03-07 23:18:26

solution5
0 2012-03-07 22:52:30

solution6
0 2012-03-07 22:53:16

solution7
0 2012-03-07 22:56:34
