Can't figure out regex match for list
I'm not sure where to begin figuring out how to pull just the team names out of the small snippet of the list below. There seems to be so much variation.
Obviously, there is a single space preceding every team name. But the names are not fixed length, and some have hyphens, apostrophes, and spaces inside the team name itself. There is always at least one space after the last word of the team name and before either the single "A" or the double "AA" at the end.
  1 Clemson A =
  5 Ohio State A =
 155 Tennessee-Martin AA =
 152 Louisiana-Monroe A =
 104 Hawai'i A =
 193 VMI AA =
 202 Stephen F. Austin AA =
Any regex guys want to take a crack at this?
That's relatively easy:
```python
import re

raw = """
  1 Clemson A =
  5 Ohio State A =
 155 Tennessee-Martin AA =
 152 Louisiana-Monroe A =
 104 Hawai'i A =
 193 VMI AA =
 202 Stephen F. Austin AA =
"""

# Leading space(s), the rank number, whitespace, then the team name (non-greedy),
# stopping at the whitespace before the trailing "A"/"AA" and "=".
teams = re.findall(r" \s*\d+\s+(.*?)\s+A+\s+=", raw)
for team in teams:
    print(team)
# Clemson
# Ohio State
# Tennessee-Martin
# Louisiana-Monroe
# Hawai'i
# VMI
# Stephen F. Austin
```
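A slightly stricter variant of the same idea, sketched under the assumption that every entry sits on its own line: `re.MULTILINE` makes `^` and `$` match at each line boundary, and named groups expose the rank and the A/AA marker as well as the team. The group names `rank`, `team`, `grade` and the variable `rows` are illustrative, not from the question.

```python
import re

raw = """
  1 Clemson A =
  5 Ohio State A =
 155 Tennessee-Martin AA =
 152 Louisiana-Monroe A =
 104 Hawai'i A =
 193 VMI AA =
 202 Stephen F. Austin AA =
"""

# [ \t] is used instead of \s so a match can never spill onto the next line;
# A{1,2} allows only the single "A" or the double "AA" described in the question.
pattern = re.compile(
    r"^[ \t]*(?P<rank>\d+)[ \t]+(?P<team>.+?)[ \t]+(?P<grade>A{1,2})[ \t]*=[ \t]*$",
    re.MULTILINE,
)

# Hypothetical result shape: (rank as int, team name, grade marker)
rows = [(int(m["rank"]), m["team"], m["grade"]) for m in pattern.finditer(raw)]
for rank, team, grade in rows:
    print(rank, team, grade)
```

The non-greedy `(?P<team>.+?)` is what keeps multi-word names like "Stephen F. Austin" intact: it grows one character at a time until the rest of the line is exactly whitespace, A/AA, and `=`.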
How about something like this? No regex required.
`lines` is a list of strings, where each string is a line from your data.
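```python
def hasNumbers(inputString):
    return any(char.isdigit() for char in inputString)

for line in lines:
    # split() with no argument splits on runs of whitespace and ignores leading spaces
    splits = line.split()
    if not splits:
        continue  # skip blank lines
    # drop the leading rank number, e.g. "155"
    if hasNumbers(splits[0]):
        splits = splits[1:]
    # drop the trailing "A"/"AA" and "=", then rejoin multi-word names
    teamName = " ".join(splits[:-2])
    print(teamName)
```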
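If you want to stay regex-free without the digit check, `str.split` and `str.rsplit` with `maxsplit` can peel the fixed parts off each end of the line. This is a sketch that assumes every non-blank line has exactly the shape `<rank> <team name> <A or AA> =`; the helper name `parse_line` is hypothetical.

```python
def parse_line(line):
    """Split '<rank> <team> <A|AA> =' into (rank, team, grade)."""
    # Peel the rank off the left; maxsplit=1 leaves the rest of the line intact.
    rank, rest = line.split(maxsplit=1)
    # Peel the grade and '=' off the right, so spaces inside the team name survive.
    team, grade, _equals = rest.rsplit(maxsplit=2)
    return int(rank), team, grade

lines = [
    "  1 Clemson A =",
    "  5 Ohio State A =",
    " 155 Tennessee-Martin AA =",
    " 202 Stephen F. Austin AA =",
]

for line in lines:
    if line.strip():  # skip blank lines
        print(parse_line(line))
```

Working from both ends means the team name is simply "whatever is left in the middle", so hyphens, apostrophes, and internal spaces need no special handling.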
Try using the following regex:
\d\s+(.*?)\s+=
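- `\d` matches a digit (the last digit of the rank)
- `\s+` followed by one or more whitespace characters
- `(.*?)` then anything, as few characters as possible (non-greedy), captured as a group
- `\s+` followed by one or more whitespace characters
- `=` followed by a literal `=`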
The captured group will give you the team name, still followed by its A/AA marker (e.g. `Clemson A`).
Edit: if A/AA isn't part of the team name, use:
\d\s+(.*?)\s+[A]+\s+=
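To see the difference between the two patterns, here is a quick check against a few lines from the question (the variable names are illustrative). The first pattern's group still ends with the A/AA marker; the second stops before it. Note that `[A]+` is equivalent to `A+`.

```python
import re

raw = """
  1 Clemson A =
 155 Tennessee-Martin AA =
 202 Stephen F. Austin AA =
"""

# Original pattern: the group runs up to the whitespace before "=", so A/AA is included.
with_grade = re.findall(r"\d\s+(.*?)\s+=", raw)
# Edited pattern: the group stops before the whitespace + A/AA + whitespace + "=".
without_grade = re.findall(r"\d\s+(.*?)\s+[A]+\s+=", raw)

print(with_grade)     # ['Clemson A', 'Tennessee-Martin AA', 'Stephen F. Austin AA']
print(without_grade)  # ['Clemson', 'Tennessee-Martin', 'Stephen F. Austin']
```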