
How to split string into specific keywords?

I am trying to split a string into specific keywords. I have a list of keywords/characters.

For example, I have a list of keywords {'1', '2', '3', '4', '5', 'let', 'while'}

and I have a string let2while4

I want to output a list that contains {'let', '2', 'while', '4'}

Is this possible? Currently I only split the string using ' ' as a delimiter.

Thank you!

EDIT: Using Gilch's answer below works for the example, but when I put in my full keywords, I get this error:

Traceback (most recent call last):
File "parser.py", line 14, in <module>
list = re.findall(f"({'|'.join(keywords)})", input)
File "/usr/lib/python3.7/re.py", line 223, in findall
File "/usr/lib/python3.7/sre_parse.py", line 816, in _parse
p = _parse_sub(source, state, sub_verbose, nested + 1)
File "/usr/lib/python3.7/sre_parse.py", line 426, in _parse_sub
not nested and not items))
File "/usr/lib/python3.7/sre_parse.py", line 651, in _parse
source.tell() - here + len(this))
re.error: nothing to repeat at position 17

My full keywords include:

keywords = {'1','2','3','4','5','6','7','8','9','0','x','y','z','+','-','*','>','(',')',';','$','let','while','else','='}

Use '|'.join() to make a regex pattern from your keywords.

>>> keywords = {'1', '2', '3', '4', '5', 'let', 'while'}
>>> string = 'let2while4'
>>> import re
>>> re.findall('|'.join(keywords), string)
['let', '2', 'while', '4']
>>> set(_)
{'let', '2', 'while', '4'}

If your keywords might contain regex control characters, you can use re.escape() on them before the join.

>>> re.findall('|'.join(map(re.escape, keywords)), string)
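The "nothing to repeat" error in the question comes from keywords such as '+' and '*', which are regex quantifiers when left unescaped. Below is a minimal sketch of the escaped version applied to the full keyword set from the question. The helper names build_pattern and tokenize are hypothetical, and sorting the keywords longest-first is an extra precaution that is not part of the answer above: it only matters if one keyword is a prefix of another, which is not the case for this particular set.

```python
import re

# Full keyword set from the question; '+', '*', '(', ')' and '$' are regex metacharacters.
keywords = {'1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'x', 'y', 'z',
            '+', '-', '*', '>', '(', ')', ';', '$', 'let', 'while', 'else', '='}


def build_pattern(words):
    """Compile an alternation of the escaped keywords (hypothetical helper)."""
    # Longest first, so a longer keyword is tried before any shorter one
    # that happens to be its prefix (an assumption added for safety).
    ordered = sorted(words, key=len, reverse=True)
    return re.compile('|'.join(re.escape(w) for w in ordered))


def tokenize(text, words):
    """Return every keyword occurrence in text, in order (hypothetical helper)."""
    # findall silently skips characters that match no keyword, e.g. spaces.
    return build_pattern(words).findall(text)


tokens = tokenize('let x=(2+3)*4;while x>0', keywords)
print(tokens)
```

Note that characters that are not keywords are dropped without warning, so a stricter tokenizer would need an additional check for unmatched input.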
