Splitting String with Multiple Delimiters in a Particular Order
I am dealing with a type of ASCII file that effectively has 4 columns of data, with each row on its own line of the file. Below is an example of a row of data from this file:
'STOP.F 11966.0000:STOP DEPTH'
The data is always structured so that the delimiter between the first and second columns is a period, the delimiter between the second and third columns is a space, and the delimiter between the third and fourth columns is a colon.
Ideally, I would like to find a way to return the following result from the string above:
['STOP', 'F', '11966.0000', 'STOP DEPTH']
I tried using a regular expression with the period, space and colon as delimiters, but it breaks down (see example below) because I don't know how to specify the order in which to split the string, and I don't know if there is a way to specify the maximum number of splits per delimiter right in the regular expression itself. I want it to split on the delimiters in that specific order, with each delimiter used at most once.
import re
line = 'STOP.F 11966.0000:STOP DEPTH'
re.split("[. :]", line)
>>> ['STOP', 'F', '11966', '0000', 'STOP', 'DEPTH']
Any suggestions on a tidy way to do this?
A re.split()-style solution with a specific regex pattern:
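# Note: the standard-library re module rejects variable-width lookbehinds such
# as (?<=^[^.]+), so this pattern needs the third-party regex package.
import regex
s = 'STOP.F 11966.0000:STOP DEPTH'
# Split on the first period, the first space, and the colon.
result = regex.split(r'(?<=^[^.]+)\.|(?<=^[^ ]+) |:', s)
print(result)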
The output:
['STOP', 'F', '11966.0000', 'STOP DEPTH']
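If a standard-library-only approach is preferred, one possible sketch is to skip splitting altogether and describe the whole line with capture groups, so the order of the delimiters is encoded in the pattern itself. The names LINE_PATTERN, split_line and split_line_partition below are hypothetical, chosen only for illustration, and the sketch assumes every line contains a period, then a space, then a colon, as described in the question. A second variant does the same thing with str.partition, which splits on the first occurrence of a delimiter only.

```python
import re

# Hypothetical pattern: everything up to the first period, then everything up
# to the first space, then everything up to the first colon, then the rest.
LINE_PATTERN = re.compile(r'([^.]*)\.([^ ]*) ([^:]*):(.*)')


def split_line(line):
    """Split a line into 4 columns using capture groups (stdlib re only)."""
    match = LINE_PATTERN.fullmatch(line)
    if match is None:
        raise ValueError(f'line does not match the expected layout: {line!r}')
    return list(match.groups())


def split_line_partition(line):
    """Same result without regular expressions, one delimiter at a time."""
    first, _, rest = line.partition('.')    # first period only
    second, _, rest = rest.partition(' ')   # first space after that
    third, _, fourth = rest.partition(':')  # first colon after that
    return [first, second, third, fourth]


line = 'STOP.F 11966.0000:STOP DEPTH'
print(split_line(line))
print(split_line_partition(line))
```

Both calls should print the same four-element list as above. The capture-group version raises ValueError on a line that does not fit the layout, while the partition version silently returns empty strings for missing pieces, so the choice depends on how malformed lines should be handled.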