Matching arbitrary player names
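I am using Lark, but I can't work out how to match every player name, since the names may be too irregular to describe with Lark rules.
An example pattern is "Seat {number}: {player} ({chips} in chips)", and I also want all of the values from each line.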
from lark import Lark, lexer
gram = r"""
start: seats+
seats: "Seat " seat_position player table_chips is_sitting_out? SPACE? NL
seat_position: NUMBER
is_sitting_out: " is sitting out"
table_chips: "(" chips " in chips" (", " chips " bounty")? ")"
player: STRING_INNER
chips: "$"? (DECIMAL | NUMBERS)
NUMBERS: /[0-9]/+
NUMBER: /[0-9]/
DECIMAL: NUMBERS "." NUMBERS
SPACE: " "+
STRING_INNER: ": " /.*?/ " "
CR : /\r/
LF : /\n/
NL: (CR? LF)+
"""
data = r"""Seat 1: Ruzzka(Rus) (1200 in chips)
Seat 1: Dladik Rzs38 (1200 in chips)
Seat 1: slum ^o_o^ (1200 in chips)
Seat 1: é=mc² (1200 in chips)
Seat 1: {’O_0`}/(nh) (1200 in chips)
Seat 1: °ÆND0c42Z4y° (1200 in chips)
Seat 1: $ salesovish (1200 in chips)
"""
parser = Lark(gram)
tree = parser.parse(data)
print(tree.pretty())
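The problem is that there is apparently no real rule for where a name ends, which makes it hard to parse, because Lark is mostly non-backtracking for the sake of speed.
My guess is that it is easier to apply a regular expression to each line directly, unless you also need to parse more complex structure than you show here. Lark is able to handle this kind of arbitrary content, but with a large performance penalty.
Here is a solution without Lark: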
import re

# `data` is the multi-line string defined in the question.
# The lazy player group stops at the last whitespace before "(<chips> in chips)".
regex = re.compile(r"Seat\s*(?P<number>\d+)\s*:\s*(?P<player>[^\n]+?)\s+\((?P<chips>\d+) in chips\)")
seats = []
for line in data.splitlines():
    match = regex.match(line)
    if match is not None:
        values = match.groupdict()
        seats.append((values["number"], values["player"], values["chips"]))
print(seats)
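From your grammar it looks like you actually need to extract more information (for example is_sitting_out and bounty). For that, you can change the regular expression slightly to: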
Seat\s*(?P<number>\d+)\s*:\s*(?P<player>[^\n]+?)\s+\((?P<chips>\d+) in chips\s*(?:,\s*(?P<bounty>\d+)\s*bounty)?\)(?P<is_sitting_out> is sitting out)?
You can check whether the player is sitting out with values['is_sitting_out'] is not None, and values['bounty'] will be None if there is no bounty.
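As a rough sketch of how the extended expression might be used, the snippet below applies it line by line and converts the captured groups. The names SEAT_RE, parse_seats and sample are illustrative only, and the bounty and sitting-out lines in sample are made up to exercise the optional groups. Like the expression above, it assumes chip and bounty amounts are plain integers.

```python
import re

# The extended pattern from above, split across lines for readability.
SEAT_RE = re.compile(
    r"Seat\s*(?P<number>\d+)\s*:\s*(?P<player>[^\n]+?)\s+"
    r"\((?P<chips>\d+) in chips\s*(?:,\s*(?P<bounty>\d+)\s*bounty)?\)"
    r"(?P<is_sitting_out> is sitting out)?"
)

# Hypothetical sample input; the second and third lines are invented
# to show the optional bounty and sitting-out parts.
sample = """Seat 1: Ruzzka(Rus) (1200 in chips)
Seat 2: slum ^o_o^ (1500 in chips, 50 bounty)
Seat 3: $ salesovish (900 in chips) is sitting out
"""


def parse_seats(text):
    """Return one dict per line that matches the seat pattern."""
    seats = []
    for line in text.splitlines():
        match = SEAT_RE.match(line)
        if match is None:
            continue  # skip lines that are not seat lines
        values = match.groupdict()
        seats.append({
            "number": int(values["number"]),
            "player": values["player"],
            "chips": int(values["chips"]),
            # Unmatched optional groups come back as None.
            "bounty": int(values["bounty"]) if values["bounty"] is not None else None,
            "is_sitting_out": values["is_sitting_out"] is not None,
        })
    return seats


seats = parse_seats(sample)
for seat in seats:
    print(seat)
```

Because the player group is lazy and must be followed by whitespace and a parenthesised chip count, names that themselves contain parentheses or a leading "$" are still captured whole.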