
Regex formula to find string between two other strings or characters

I am trying to extract some substrings from another string. I have identified patterns that should yield the correct results, but I think there are some small flaws in my implementation.

s = 'Arkansas BaseballMiami (Ohio) at ArkansasFeb 17, 2017 at Fayetteville, Ark. (Baum Stadium)Score by Innings123456789RHEMiami (Ohio)000000000061Arkansas60000010X781Miami (Ohio) starters: 1/lf HALL, D.; 23/3b YACEK; 36/1b HAFFEY; 40/c  SENGER; 7/dh HARRIS; 8/rf STEPHENS; 11/ss TEXIDOR; 2/2b  VOGELGESANG; 5/cf SADA; 32/p GNETZ;Arkansas starters: 8/dh E. Cole; 9/ss J. Biggers; 17/lf L. Bonfield;  33/c G. Koch; 28/cf D. Fletcher; 20/2b C. Shaddy; 24/1b C  Spanberger; 15/rf J. Arledge; 6/3b H. Wilson; 16/p B. Knight;Miami (Ohio) 1st - HALL, D. struck out swinging.'

Here is my attempt at regex patterns to achieve my desired outputs:

import re
teams = re.findall(r'(;|[0-9])(.*?) starters', s)
pitchers = re.findall(r'/p(.*?);', s)

The pitchers search seems to work, but the teams search returns the following:

[('1', '7, 2017 at Fayetteville, Ark. (Baum Stadium)Score by Innings123456789RHEMiami (Ohio)000000000061Arkansas60000010X781Miami (Ohio)'), ('1', '/lf HALL, D.; 23/3b YACEK; 36/1b HAFFEY; 40/c  SENGER; 7/dh HARRIS; 8/rf STEPHENS; 11/ss TEXIDOR; 2/2b  VOGELGESANG; 5/cf SADA; 32/p GNETZ;Arkansas')]

DESIRED OUTPUTS:

['Miami (Ohio)', 'Arkansas']
[' GNETZ', ' B. Knight']

I can worry about stripping out the leading spaces in the pitchers' names later.

(;|[0-9]) can be replaced with [;0-9]. The lazy .*? does not help here, because the regex engine still takes the leftmost possible starting point, so the match begins at the first digit it can use rather than at the last one before starters. What I think you're trying to express is "get me the string before starters and immediately after the last number or semicolon that comes before starters". You can say that as "there must be no other numbers or semicolons in between", i.e.:

teams = re.findall(r'[;0-9]([^;0-9]*) starters', s)
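
As a rough check, the pattern above can be run against a shortened version of the input. The sample string below is an abbreviated stand-in for the original s (an assumption made for readability, keeping only one position player and the pitcher per team), and the pitchers pattern is a suggested variant, not part of the original answer, that skips the leading whitespace so no later strip is needed.

```python
import re

# Abbreviated stand-in for the original string `s` (assumption: same layout,
# fewer players per team).
s = (
    'Score by Innings123456789RHEMiami (Ohio)000000000061Arkansas60000010X781'
    'Miami (Ohio) starters: 1/lf HALL, D.; 32/p GNETZ;'
    'Arkansas starters: 8/dh E. Cole; 16/p B. Knight;'
    'Miami (Ohio) 1st - HALL, D. struck out swinging.'
)

# A digit or semicolon, then a run containing no digits or semicolons,
# then the literal " starters". Only the run is captured.
teams = re.findall(r'[;0-9]([^;0-9]*) starters', s)

# Variant of the pitchers pattern: \s* consumes the space after "/p",
# and [^;]* captures everything up to the next semicolon.
pitchers = re.findall(r'/p\s*([^;]*);', s)

print(teams)     # ['Miami (Ohio)', 'Arkansas']
print(pitchers)  # ['GNETZ', 'B. Knight']
```

Because each pattern now has a single capture group, re.findall returns a flat list of strings instead of the list of tuples seen in the original output.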
