Can't figure out regex match for list

I'm not sure where to begin in pulling just the team names out of the small snippet of the list below. There seems to be so much variation. Whitespace precedes every team name, but the names are not fixed length, and some contain hyphens, apostrophes, and spaces. There is always at least one space after the last word of the team name and before the single "A" or double "AA" at the end.

&nbsp  1  Clemson              A  =
&nbsp  5  Ohio State           A  =
&nbsp155  Tennessee-Martin     AA =
&nbsp152  Louisiana-Monroe     A  =
&nbsp104  Hawai'i              A  =
&nbsp193  VMI                  AA =
&nbsp202  Stephen F. Austin    AA =

Do any regex people want to take a crack at this?

That's relatively easy:

import re

raw = """
&nbsp  1  Clemson              A  =
&nbsp  5  Ohio State           A  =
&nbsp155  Tennessee-Martin     AA =
&nbsp152  Louisiana-Monroe     A  =
&nbsp104  Hawai'i              A  =
&nbsp193  VMI                  AA =
&nbsp202  Stephen F. Austin    AA =
"""

teams = re.findall(r"&nbsp\s*\d+\s+(.*?)\s+A+\s+=", raw)

for team in teams:
    print(team)

# Clemson
# Ohio State
# Tennessee-Martin
# Louisiana-Monroe
# Hawai'i
# VMI
# Stephen F. Austin
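
For readers who want to see what each piece of that pattern does, here is a commented sketch of the same idea using re.VERBOSE. It is not the answerer's exact pattern: the `^` anchor, the re.MULTILINE flag, and `A{1,2}` in place of `A+` are assumptions added for illustration, and only three of the rows are used.

```python
import re

raw = """
&nbsp  1  Clemson              A  =
&nbsp155  Tennessee-Martin     AA =
&nbsp202  Stephen F. Austin    AA =
"""

pattern = re.compile(
    r"""
    ^&nbsp      # literal prefix at the start of each line
    \s*\d+      # rank, optionally padded with spaces
    \s+         # gap before the team name
    (.*?)       # team name, matched lazily
    \s+A{1,2}   # padding, then the A or AA column
    \s+=        # padding, then the trailing equals sign
    """,
    re.VERBOSE | re.MULTILINE,
)

teams = pattern.findall(raw)
print(teams)  # ['Clemson', 'Tennessee-Martin', 'Stephen F. Austin']
```

The lazy group on its own would stop at the first space; it is the requirement that the A/AA column and the `=` follow that forces it to extend across names such as "Stephen F. Austin".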

How about something like this? No regex required.

lines is a list of strings, where each string is a line from your data.

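def hasNumbers(inputString):
    return any(char.isdigit() for char in inputString)


for line in lines:
    # split() with no argument collapses runs of whitespace
    splits = line.split()
    # drop the leading "&nbsp" token (it may have the rank glued on, e.g. "&nbsp155")
    splits = splits[1:]
    # if the rank was a separate token, drop it too
    if splits and hasNumbers(splits[0]):
        splits = splits[1:]
    # drop the trailing "A"/"AA" and "=" tokens, then rejoin multi-word names
    teamName = " ".join(splits[:-2])
    print(teamName)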

Try using the following regex:

\d\s+(.*?)\s+=

    - \d matches a digit
    - \s+ matches one or more whitespace characters
    - (.*?) lazily captures anything (this is the capture group)
    - \s+ matches one or more whitespace characters
    - = matches a literal `=`

The captured group will give you the team name.
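
As a quick check of what that group actually holds, here is a minimal Python sketch that applies the pattern to one row from the question. The variable names are illustrative only.

```python
import re

line = "&nbsp  5  Ohio State           A  ="

# Nothing in the pattern consumes the A/AA column, so the lazy group
# has to extend through it to reach the whitespace before "=".
match = re.search(r"\d\s+(.*?)\s+=", line)
captured = match.group(1)

print(repr(captured))
print(captured.split())  # ['Ohio', 'State', 'A']
```

The capture therefore includes the padding and the classification letter along with the name.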


Edit: if the A/AA isn't part of the team name, use:

\d\s+(.*?)\s+[A]+\s+=
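
A short Python sketch of this version, assuming each row is handled on its own with re.search. The second row is made up for illustration (it is not in the question's data) to show a name that itself contains a capital A.

```python
import re

rows = [
    "&nbsp193  VMI                  AA =",
    "&nbsp 12  Texas A&M            A  =",  # hypothetical row, not from the question
]

# The group stops only where whitespace, a run of A's, whitespace and "=" follow,
# so the "A" inside "A&M" does not end the name early.
names = [re.search(r"\d\s+(.*?)\s+[A]+\s+=", row).group(1) for row in rows]

print(names)  # ['VMI', 'Texas A&M']
```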

