
How to split a string on multiple patterns in a Pythonic way (one-liner)?

I am trying to extract the file name, without its extension, from a file object. My file names look like this:

this site:time.list, this.list, this site:time_sec.list, that site:time_sec.list and so on. The required file name always precedes either a whitespace or a dot.

Currently I do the following to get the part of the file name that precedes a whitespace or a dot:

search_term = os.path.basename(f.name).split(" ")[0]

and

search_term = os.path.basename(f.name).split(".")[0]

Expected file name output: this, this, this, that.

How can I combine the two statements above into a single Pythonic one-liner?

Thanks in advance.

Use a regular expression like the one below; the character class [ .] splits on either a space or a dot:

re.split('[ .]', os.path.basename(f.name))[0]
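As a quick check, the same split can be run against the sample names from the question. This is only a sketch: the names list and the first_token helper are illustrative and not part of the original answer, and plain strings stand in for os.path.basename(f.name).

```python
import re

# Sample base names copied from the question.
names = [
    "this site:time.list",
    "this.list",
    "this site:time_sec.list",
    "that site:time_sec.list",
]

def first_token(name):
    # Split once on the first space or dot and keep the leading piece.
    return re.split(r"[ .]", name, maxsplit=1)[0]

results = [first_token(n) for n in names]
print(results)  # ['this', 'this', 'this', 'that']
```

Passing maxsplit=1 is optional; it just stops the split after the first separator, since only the leading piece is used.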

Apply both splits in sequence. If the second split still shortens the result of the first, that shorter piece is the one you want; if not, you keep what the first split gave you. You don't need regex for this.

search_term = os.path.basename(f.name).split(" ")[0].split(".")[0]
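A small sketch of why chaining works: the order of the two splits does not matter, because each one only ever trims the string at its own separator. The helper names below are hypothetical, and plain strings stand in for os.path.basename(f.name).

```python
# Sample base names copied from the question.
names = [
    "this site:time.list",
    "this.list",
    "this site:time_sec.list",
    "that site:time_sec.list",
]

def space_then_dot(name):
    # Cut at the first space, then at the first dot of what is left.
    return name.split(" ")[0].split(".")[0]

def dot_then_space(name):
    # Same two cuts in the opposite order.
    return name.split(".")[0].split(" ")[0]

for n in names:
    print(n, "->", space_then_dot(n), dot_then_space(n))
```

Both helpers return the text before whichever separator comes first, so they agree on every input.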

Use regex to get the first word at the beginning of the string:

import re

re.match(r"\w+", "this site:time_sec.list").group()
# 'this'

re.match(r"\w+", "this site:time.list").group()
# 'this'

re.match(r"\w+", "that site:time_sec.list").group()
# 'that'

re.match(r"\w+", "this.list").group()
# 'this'
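Two caveats with this approach are worth sketching. \w+ stops at any character that is not a letter, digit or underscore (not only at a space or dot), and re.match returns None when the string does not start with a word character, so calling .group() directly would raise AttributeError. The leading_word helper and the two extra file names are made up for illustration.

```python
import re

def leading_word(name):
    # re.match only matches at the start of the string; it returns None
    # when the first character is not a letter, digit or underscore.
    m = re.match(r"\w+", name)
    return m.group() if m else None

print(leading_word("this site:time_sec.list"))  # this
print(leading_word("my-file.list"))             # my (stops at the hyphen)
print(leading_word(".hidden.list"))             # None
```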

Try this:

pattern = re.compile(r"\w+")
pattern.match(os.path.basename(f.name)).group()

If you rely on the assumption that whitespace separates the part you want from the rest, make sure that part never contains whitespace itself. Relying on implicit rules like that, instead of actually looking at the strings you want to extract and tailoring explicit expressions to fit their content, makes unexpected results much more likely.
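To illustrate the point, here is a sketch of an expression tailored to the names shown in the question. It assumes, and this is only a guess from the four samples, that every name ends in ".list" and may carry an optional " site:<something>" part before it. The pattern, the helper names and the "my report" file name are hypothetical.

```python
import re

# Assumed layout: <name>[ site:<tag>].list  (guessed from the samples).
EXPLICIT = re.compile(r"(?P<name>.+?)(?: site:\S+)?\.list")

def explicit_name(basename):
    # fullmatch requires the whole string to fit the assumed layout.
    m = EXPLICIT.fullmatch(basename)
    return m.group("name") if m else None

def implicit_name(basename):
    # The "first space or dot" rule from the other answers.
    return re.split(r"[ .]", basename, maxsplit=1)[0]

tricky = "my report site:time_sec.list"
print(implicit_name(tricky))  # my
print(explicit_name(tricky))  # my report
```

The explicit version returns None for names that do not fit the assumed layout, which surfaces surprises instead of silently returning a truncated name.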


 