How to split a string with multiple delimiters without deleting delimiters in Python?
I currently have a list of filenames in a txt file and I am trying to sort them. The first thing I am trying to do is split them into a list, since they are all on a single line. There are 3 types of files in the list. I am able to split the list, but I would like to keep the delimiters in the end result, and I have not been able to find a way to do this. The way that I am splitting the files is as follows:
import re

def breakLines():
    unsorted_list = []
    file_obj = open("index.txt", "rt")
    file_str = file_obj.read()
    unsorted_list.append(re.split('.txt|.mpd|.mp4', file_str))
    print(unsorted_list)

breakLines()
I found DeepSpace's answer to be very helpful here: Split a string with "(" and ")" and keep the delimiters (Python), but that only seems to work with single characters.
EDIT:
Sample input:
file_name1234.mp4file_name1235.mp4file_name1236.mp4file_name1237.mp4
Expected output:
file_name1234.mp4
file_name1235.mp4
file_name1236.mp4
file_name1237.mp4
In re.split, the key is to parenthesise the split pattern so it's kept in the result of re.split. Your attempt is:
>>> s = "file_name1234.mp4file_name1235.mp4file_name1236.mp4file_name1237.mp4"
>>> re.split('.txt|.mpd|.mp4', s)
['file_name1234', 'file_name1235', 'file_name1236', 'file_name1237', '']
Okay, that doesn't work: the delimiters are dropped (and the dots need escaping so that they match a literal dot, as in a real extension, rather than any character). So let's try:
>>> re.split(r'(\.txt|\.mpd|\.mp4)', s)
['file_name1234',
'.mp4',
'file_name1235',
'.mp4',
'file_name1236',
'.mp4',
'file_name1237',
'.mp4',
'']
This works, but it splits the extensions from the filenames and leaves an empty string at the end, which is not what you want (unless you want some ugly post-processing). Plus, this is a duplicate question: In Python, how do I split a string and keep the separators?
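For reference, the post-processing mentioned above could look roughly like the following. This is only a sketch, assuming the string has the same shape as the sample input; the names parts and filenames are made up for illustration.

```python
import re

s = "file_name1234.mp4file_name1235.mp4file_name1236.mp4file_name1237.mp4"

# The capturing group makes re.split return the delimiters as well:
# [name, ext, name, ext, ..., trailing remainder]
parts = re.split(r'(\.txt|\.mpd|\.mp4)', s)

# Glue each name (even index) to the extension that follows it (odd index).
# zip stops at the shorter slice, so the trailing empty string is dropped.
filenames = [name + ext for name, ext in zip(parts[0::2], parts[1::2])]
print(filenames)
```

It gives the expected list, but it takes two steps where one would do.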
But you don't want re.split, you want re.findall:
>>> s = "file_name1234.mp4file_name1235.mp4file_name1236.mp4file_name1237.mp4"
>>> re.findall(r'(\w*?(?:\.txt|\.mpd|\.mp4))', s)
['file_name1234.mp4',
'file_name1235.mp4',
'file_name1236.mp4',
'file_name1237.mp4']
The expression matches word characters (basically digits, letters and underscores), followed by the extension. To be able to express an OR between the extensions, I created a non-capturing group inside the main group.
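If the list of extensions may change, the same pattern can be built from a tuple instead of being written by hand. This is a minimal sketch; the function name split_filenames and its extensions parameter are hypothetical, not part of the question.

```python
import re

def split_filenames(text, extensions=("txt", "mpd", "mp4")):
    """Return every '<word characters>.<extension>' run found in text."""
    # re.escape guards against regex metacharacters inside an extension.
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    # Lazy \w*? stops at the first known extension; (?:...) groups the OR
    # without capturing, so findall returns the whole match.
    pattern = re.compile(r"\w*?\.(?:" + alternatives + ")")
    return pattern.findall(text)

s = "file_name1234.mp4file_name1235.mp4file_name1236.mp4file_name1237.mp4"
print(split_filenames(s))
```

Because the pattern has no capturing group, re.findall returns the full matches, so the outer parentheses used above are not needed here.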
If you have more exotic file names, you can't use \w anymore, but replacing it with a lazy .*? still works reasonably well (you may need some str.strip post-processing to remove leading/trailing blanks, which are likely not part of the filenames):
>>> s = " file name1234.mp4file-name1235.mp4 file_name1236.mp4file_name1237.mp4"
>>> re.findall(r'(.*?(?:\.txt|\.mpd|\.mp4))', s)
[' file name1234.mp4',
'file-name1235.mp4',
' file_name1236.mp4',
'file_name1237.mp4']
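The str.strip post-processing could be as simple as a list comprehension. A sketch, assuming that surrounding blanks are never a meaningful part of a filename (the names raw and cleaned are illustrative):

```python
import re

s = " file name1234.mp4file-name1235.mp4 file_name1236.mp4file_name1237.mp4"

# Lazy match of anything up to and including the first known extension.
raw = re.findall(r'.*?(?:\.txt|\.mpd|\.mp4)', s)

# Drop leading/trailing whitespace; blanks inside a name are kept.
cleaned = [name.strip() for name in raw]
print(cleaned)
```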
So sometimes you think re.split when you need re.findall, and the reverse is also true.