
How to split a string in Python by certain characters?

I am trying to solve a problem with prefix notation, but I am stuck on the part where I want to split my string into an array. If I have the input +22 2, I want the array to look like this: ['+', '22', '2']. I tried using the

import re 

module, but I am not sure how it works. I tried the

word.split(' ')

method, but it only helps with the spaces. Any ideas? PS: In the prefix notation I will also have +, - and *. So I need to split the string so that the spaces are not in the array, but +, - and * are. I am thinking of

word = input()
array = word.split(' ')

Then after that I am thinking of splitting the string by these 3 characters.

Sample input: '+-12 23*67 1'

Output: ['+', '-', '12', '23', '*', '67', '1']

You can use re to find patterns in text. It seems you are looking for either one of +, - and *, or a group of digits. So compile a pattern that looks for that, find everything that matches it, and you will get a list:


import re

pattern = re.compile(r'([-+*]|\d+)')

string = '+-12 23*67 1'
array = pattern.findall(string)
print(array)

# Output:
# ['+', '-', '12', '23', '*', '67', '1']
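Since the question asks about splitting, here is a minimal sketch of the same result obtained with re.split instead of findall. This is an alternative, not the approach used above; the function name tokenize_by_split is made up for illustration. It relies on the documented behavior that re.split includes the text of capturing groups in its result.

```python
import re

def tokenize_by_split(text):
    # The capturing group ([-+*]) keeps the operators in the result;
    # the uncaptured \s+ alternative drops the whitespace.
    parts = re.split(r'([-+*])|\s+', text)
    # re.split yields None where the group did not take part in a match
    # and '' between adjacent delimiters, so filter out the falsy entries.
    return [part for part in parts if part]

print(tokenize_by_split('+-12 23*67 1'))
# ['+', '-', '12', '23', '*', '67', '1']
```

The filtering step is the price of this variant, which is one reason findall tends to be the simpler choice here.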

Also a bit of testing (comparing your sample strings with the expected output):

test_cases = {
    '+-12 23*67 1': ['+', '-', '12', '23', '*', '67', '1'],
    '+22 2': ['+', '22', '2']
}

for string, correct in test_cases.items():
    assert pattern.findall(string) == correct

print('Tests completed successfully!')
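One thing these tests do not cover: findall silently skips any character the pattern does not match, so a typo in the input would go unnoticed. If that matters, a stricter variant could look roughly like the sketch below. The helper name strict_tokenize and the choice to raise ValueError are assumptions for illustration, not part of the answer above.

```python
import re

TOKEN = re.compile(r'[-+*]|\d+')

def strict_tokenize(text):
    # Blank out every valid token, then drop all whitespace;
    # whatever remains is input that the pattern would silently skip.
    leftover = ''.join(TOKEN.sub(' ', text).split())
    if leftover:
        raise ValueError(f'unexpected characters: {leftover!r}')
    return TOKEN.findall(text)

print(strict_tokenize('+22 2'))
# ['+', '22', '2']

try:
    strict_tokenize('+2 x 3')
except ValueError as error:
    print(error)
```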

Pattern explanation (you can read about this in the docs linked below):
r'([-+*]|\d+)'
  • r in front makes it a raw string, so Python does not process backslash escapes in the literal and the regex engine receives them unchanged; this lets you write \d with a single backslash instead of \\d
  • (...) parentheses indicate a group which can be retrieved later if needed; they are not necessary in this case, because the single group spans the whole pattern and findall returns the same strings either way
  • [...] indicates a character set: any single character from it can be matched, so it will match if any of -, + and * is present (the - is listed first so that it is taken literally rather than as a range)
  • | logical or, meaning that either side can match (here it separates the special characters from the numbers)
  • \d special sequence for digits, meaning to match any single digit; the + after it indicates matching one or more digits
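To make the points about the raw string and the parentheses concrete, here is a small sketch. The variable names are chosen for illustration only. It also shows what would change if the pattern had two groups instead of one.

```python
import re

text = '+-12 23*67 1'

# A raw string and a doubled backslash produce the same pattern text.
assert r'\d+' == '\\d+'

# With a single group spanning the whole pattern, findall returns the
# same list as with no group or with a non-capturing group.
with_group = re.findall(r'([-+*]|\d+)', text)
without_group = re.findall(r'[-+*]|\d+', text)
non_capturing = re.findall(r'(?:[-+*]|\d+)', text)
assert with_group == without_group == non_capturing

# With two groups, findall returns one tuple per match, using ''
# for the group that did not take part in that match.
pairs = re.findall(r'([-+*])|(\d+)', '+22 2')
print(pairs)
# [('+', ''), ('', '22'), ('', '2')]
```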

Useful:

  • re module, the docs there explain what each character in the pattern does
