
How do I extract certain parts of strings in Python?

Say I have three strings:

abc534loif

tvd645kgjf

tv96fjbd_gfgf

and three lists:

  • beginning captures just the first part of the string (the "name")
  • middle captures just the number
  • end contains only the rest of the characters, after the number

How do I accomplish this in the most efficient way?

Use regular expressions?

>>> import re
>>> strings = 'abc534loif tvd645kgjf tv96fjbd_gfgf'.split()
>>> for s in strings:
...   for match in re.finditer(r'\b([a-z]+)(\d+)(.+?)\b', s):
...     print(match.groups())
... 
('abc', '534', 'loif')
('tvd', '645', 'kgjf')
('tv', '96', 'fjbd_gfgf')

This is a language-agnostic approach that aims at higher efficiency:

  1. find the first digit in the string and save its position p0
  2. find the last digit in the string and save its position p1
  3. extract the substring from 0 to p0-1 into beginning
  4. extract the substring from p0 to p1 into middle
  5. extract the substring from p1+1 to length-1 into end
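The five steps above can be sketched in Python roughly as follows. split_by_digit_run is a hypothetical helper name, and the sketch assumes every string contains at least one digit and that its digits form a single contiguous run, as in the three example strings.

```python
def split_by_digit_run(s):
    """Split s into (beginning, middle, end) using its first and last digit.

    Assumes s contains at least one digit (otherwise next() raises
    StopIteration) and that all digits are contiguous.
    """
    # Step 1: position of the first digit (p0).
    p0 = next(i for i, ch in enumerate(s) if ch.isdigit())
    # Step 2: position of the last digit (p1), scanning from the right.
    p1 = next(i for i in range(len(s) - 1, -1, -1) if s[i].isdigit())
    # Steps 3-5: slices exclude the stop index, so s[:p0] covers 0..p0-1,
    # s[p0:p1 + 1] covers p0..p1 and s[p1 + 1:] covers p1+1..length-1.
    return s[:p0], s[p0:p1 + 1], s[p1 + 1:]


strings = ["abc534loif", "tvd645kgjf", "tv96fjbd_gfgf"]
# zip(*...) transposes the per-string triples into three sequences.
beginning, middle, end = map(list, zip(*(split_by_digit_run(s) for s in strings)))
print(beginning)  # ['abc', 'tvd', 'tv']
print(middle)     # ['534', '645', '96']
print(end)        # ['loif', 'kgjf', 'fjbd_gfgf']
```

Each string is scanned at most once from each side, with no regex compilation involved.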

I guess you're looking for re.findall:

strs = """
    abc534loif
    tvd645kgjf
    tv96fjbd_gfgf
"""

import re
print(re.findall(r'\b(\w+?)(\d+)(\w+)', strs))

# [('abc', '534', 'loif'), ('tvd', '645', 'kgjf'), ('tv', '96', 'fjbd_gfgf')]

>>> import itertools as it
>>> s="abc534loif"
>>> [''.join(j) for i,j in it.groupby(s, key=str.isdigit)]
['abc', '534', 'loif']
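To fill the three lists from the question with this approach, the same groupby call can be applied to each string in turn. The length check is an added assumption, not part of the answer above: it rejects strings that do not split into exactly a name, a number and a remainder (for example a string with no digits, or with a second run of digits).

```python
import itertools as it

strings = ["abc534loif", "tvd645kgjf", "tv96fjbd_gfgf"]

beginning, middle, end = [], [], []
for s in strings:
    # groupby starts a new group each time str.isdigit flips between True and False.
    parts = [''.join(group) for _, group in it.groupby(s, key=str.isdigit)]
    # Assumption: exactly one digit run between two non-digit runs.
    if len(parts) != 3:
        raise ValueError(f"unexpected shape: {s!r} -> {parts}")
    beginning.append(parts[0])
    middle.append(parts[1])
    end.append(parts[2])

print(beginning)  # ['abc', 'tvd', 'tv']
print(middle)     # ['534', '645', '96']
print(end)        # ['loif', 'kgjf', 'fjbd_gfgf']
```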

I would use a regular expression like:

(?P<beginning>[^0-9]*)(?P<middle>[0-9]*)(?P<end>[^0-9]*)

and pull out the three matching sections.

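import re

# Named groups: beginning = non-digits, middle = digits, end = non-digits.
m = re.match(r"(?P<beginning>[^0-9]*)(?P<middle>[0-9]*)(?P<end>[^0-9]*)", "abc534loif")
print(m.group('beginning'))  # abc
print(m.group('middle'))     # 534
print(m.group('end'))        # loif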
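The same named-group pattern can build all three lists at once through groupdict(), which maps each group name to the text it captured. The dictionary named lists is an illustrative choice, not part of the original answer. Because every group can match the empty string, match() always succeeds with this pattern; a string without digits simply yields an empty middle and end.

```python
import re

pattern = re.compile(r"(?P<beginning>[^0-9]*)(?P<middle>[0-9]*)(?P<end>[^0-9]*)")

strings = ["abc534loif", "tvd645kgjf", "tv96fjbd_gfgf"]
lists = {"beginning": [], "middle": [], "end": []}
for s in strings:
    # groupdict() returns {'beginning': ..., 'middle': ..., 'end': ...}.
    for name, value in pattern.match(s).groupdict().items():
        lists[name].append(value)

print(lists["beginning"])  # ['abc', 'tvd', 'tv']
print(lists["middle"])     # ['534', '645', '96']
print(lists["end"])        # ['loif', 'kgjf', 'fjbd_gfgf']
```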

import re  # matching a string against a pattern needs the regular expressions module 're'

mystring = "abc1234def"  # just a string to test with
# Everything in parentheses is 'captured' and available as a group of the match object.
# ^ matches at the beginning of the string and $ at the end.
# [0-9] is a character range matching any digit; \D matches any non-digit character.
match = re.match(r"^(\D+)([0-9]+)(\D+)$", mystring)
if match:  # match is None if the string didn't fit the pattern, and None has no .group()
    beginning = match.group(1)  # 'abc'
    middle = match.group(2)     # '1234'
    end = match.group(3)        # 'def'

I'd do something like this:

>>> import re
>>> l = ['abc534loif', 'tvd645kgjf', 'tv96fjbd_gfgf']
>>> regex = re.compile(r'([a-z_]+)(\d+)([a-z_]+)')
>>> beginning, middle, end = zip(*[regex.match(s).groups() for s in l])
>>> beginning
('abc', 'tvd', 'tv')
>>> middle
('534', '645', '96')
>>> end
('loif', 'kgjf', 'fjbd_gfgf')
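If some strings might not fit the pattern, regex.match(s) returns None and the one-liner above fails with an AttributeError. A minimal variant, assuming non-matching strings should simply be set aside: the extra string 'no_digits_here' and the skipped list are illustrative additions, and fullmatch is swapped in for match so that a string is accepted only when the whole of it fits the pattern.

```python
import re

# Raw string so that \d reaches the regex engine unchanged.
regex = re.compile(r'([a-z_]+)(\d+)([a-z_]+)')
strings = ['abc534loif', 'tvd645kgjf', 'tv96fjbd_gfgf', 'no_digits_here']

matches = [regex.fullmatch(s) for s in strings]
# Keep the strings that did not match instead of crashing on None.groups().
skipped = [s for s, m in zip(strings, matches) if m is not None and False or m is None]
# Assumes at least one string matched; otherwise the unpacking below fails.
beginning, middle, end = zip(*(m.groups() for m in matches if m is not None))

print(beginning)  # ('abc', 'tvd', 'tv')
print(middle)     # ('534', '645', '96')
print(end)        # ('loif', 'kgjf', 'fjbd_gfgf')
print(skipped)    # ['no_digits_here']
```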

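Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.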

 