Split a string with multiple delimiters, and keep *some* of the delimiters, but not all
I have a string that can look something like this:
1. "foo bar"
2. "foo bar foo:bar"
3. "foo bar "
4. "foo bar "
5. "foo bar foo:bar:baz"
I want to split this string so that it ends up with the following results:
1. ['foo', 'bar']
2. ['foo', 'bar', 'foo', ':', 'bar']
3. / 4. ['foo', 'bar', '']
5. ['foo', 'bar', 'foo', ':', 'bar', ':', 'baz']
In other words, following these rules:
1. Split the string on every occurrence of a space.
   a. If there are one or more spaces at the end of the string, add one empty string to the split list.
   b. Any spaces before the last non-space character in the string should be consumed and not added to the split list.
2. Split the string on every occurrence of a colon, and do not consume the colon.
The XY problem is this, in case it's relevant:
I want to mimic Bash tab-completion behaviour. When you type a command into a Bash interpreter, it will split the command into an array COMP_WORDS, and it will follow the above rules - splitting the words based on spaces and colons, with colons placed into their own array element, and spaces ignored unless they're at the end of a string. I want to recreate this behaviour in Python, given a string that looks like a command that a user would type.
I've seen this question about splitting a string and keeping the separators using re.split. And this question about splitting using multiple delimiters. But my use case is more complicated, and neither question seems to cover it. I tried the following to at least split on spaces and colons:
print(re.split('(:)|(?: )', splitstr))
But even that doesn't work. When splitstr is "foo bar foo:bar", it returns this:
['foo', None, 'bar', None, 'foo', ':', 'bar']
Any idea how this could be done in Python?
EDIT: My requirements weren't clear - I would want "foo bar " (with any number of spaces at the end) to return the list ["foo", "bar", ""] (with just one empty string at the end of the list).
There is no need to use regular expressions for this task. String methods work just as well, and might be more readable.
def split_comp(s: str) -> 'list[str]':
    trailing = s.endswith(' ')  # remember whether the input ends in a space
    s = s.replace(':', ' : ')  # insert split marks before/after every colon
    parts = s.split()  # split on runs of whitespace, dropping empty pieces
    return parts if not trailing else [*parts, '']
This technique can be used for any set of delimiters: pick one delimiter to split on, replace the delimiters you want to remove with it, and pad the delimiters you want to keep with it on both sides.
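As a rough illustration of that generalisation, here is a minimal sketch. The function name `split_keep_some` and its `drop`/`keep` parameters are made up for this example and are not part of any library. It assumes the delimiters are single characters and that none of the kept delimiters is whitespace, since `str.split()` with no argument always splits on whitespace.

```python
def split_keep_some(s: str, drop: str = " ", keep: str = ":") -> "list[str]":
    """Split s on every character in drop or keep; only keep-characters survive as tokens.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    # Remember whether the input ends in a delimiter that gets removed.
    trailing = bool(s) and s[-1] in drop
    for d in drop:
        s = s.replace(d, " ")       # removed delimiters become the split delimiter
    for k in keep:
        s = s.replace(k, f" {k} ")  # kept delimiters are padded so they split off
    parts = s.split()
    return parts + [""] if trailing else parts


print(split_keep_some("foo bar foo:bar:baz"))
print(split_keep_some("a,b;c", drop=",", keep=";"))
```

With the defaults this behaves like split_comp above; the second call shows the same idea with a comma removed and a semicolon kept.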
You can use a re.findall approach here with:
[^:\s]+|:|(?<=\S)(?=\s+$)
See the regex demo. Details:
[^:\s]+ - one or more characters other than whitespace and :
| - or
: - a colon
| - or
(?<=\S)(?=\s+$) - an empty string located between a non-whitespace character and one or more whitespace characters at the end of the string.
See the Python demo.
import re

l = ['foo bar', 'foo bar foo:bar', 'foo bar ', 'foo bar ']
rx = re.compile(r'[^:\s]+|:|(?<=\S)(?=\s+$)')
for s in l:
    # Normalize any run of trailing whitespace to a single space
    if s.rstrip() != s:
        s = s.rstrip() + " "
    print(f"'{s}'", '=>', rx.findall(s))
Output:
'foo bar' => ['foo', 'bar']
'foo bar foo:bar' => ['foo', 'bar', 'foo', ':', 'bar']
'foo bar ' => ['foo', 'bar', '']
'foo bar ' => ['foo', 'bar', '']
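For reuse, the same pattern can be wrapped in a small function. This is only a sketch: the names `TOKEN_RX` and `comp_words` are made up here, and it relies on a Python version (3.7 or later) in which an empty match may directly follow a non-empty one.

```python
import re

# Same pattern as above: a word, a colon, or the empty match
# sitting right before trailing whitespace at the end of the string.
TOKEN_RX = re.compile(r'[^:\s]+|:|(?<=\S)(?=\s+$)')


def comp_words(line):
    """Hypothetical wrapper: return the tokens of line according to TOKEN_RX."""
    return TOKEN_RX.findall(line)


print(comp_words("foo bar foo:bar:baz"))
print(comp_words("foo bar   "))
```

Note that a string consisting only of whitespace yields an empty list, because the lookbehind requires a non-whitespace character before the trailing spaces.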
Maybe there are shorter ways, but here is my suggestion:
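def func(s):
    # One empty string is appended if the input ends in a space
    if s.endswith(' '):
        l = s.split() + ['']
    else:
        l = s.split()

    def f(l):
        # Split each word at its first colon, keeping the colon as its own item
        res = []
        for i in l:
            if i != ':' and ':' in i:
                pos = i.find(':')
                res.extend([i[:pos], ':', i[pos + 1:]])
            else:
                res.append(i)
        return res

    # Repeat until no word other than ':' itself contains a colon
    while any(i != ':' and ':' in i for i in l):
        l = f(l)
    return l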
Examples:
>>> func("foo bar")
['foo', 'bar']
>>> func("foo bar foo:bar")
['foo', 'bar', 'foo', ':', 'bar']
>>> func("foo bar ")
['foo', 'bar', '']
>>> func("foo bar ")
['foo', 'bar', '']
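One possible shorter variant of the same idea is sketched below. The name `func_short` is made up for this example. It uses re.split with a capturing group so the colon is kept, and, unlike func, it drops the empty pieces that appear next to a leading, trailing, or doubled colon, so the two functions are not equivalent on such inputs.

```python
import re


def func_short(s):
    """Hypothetical shorter variant: split on whitespace, then split each word on ':'."""
    tokens = [
        piece
        for word in s.split()
        for piece in re.split('(:)', word)  # capturing group keeps the colon
        if piece                            # drop empty pieces around colons
    ]
    # One empty string is appended if the input ends in a space
    return tokens + [''] if s.endswith(' ') else tokens


print(func_short("foo bar foo:bar:baz"))
print(func_short("foo bar "))
```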