
Python: apply a range to wildcard when filtering or searching a str list (need to add any str list item that doesn't have a 10-digit number to a list)

If I have a list of Windows file paths (strings), how would I search for all list items that have 10 consecutive digits in the file path, so that I can add them to a new list?

Is there a way to define a range of wildcard characters and search or apply a filter?

Example:

From this list:

('C:\Users\ Documents\1H_1P_42497372610000\Kirkbride A1P_42497586550009\Well History.tif',
'C:\Users\ Documents\TEMPORARY\WISE\30497372610000\Kirkbride _42478972610009\ Drilling\Proposals.pdf',
'C:\Users\ Documents\Well History\Drilling\Proposals\Cement\Pilot hole KO plug\ Test Results.txt')

this would be my new list (or dataframe):

('C:\Users\ Documents\1H_1P_42497372610000\Kirkbride A1P_42497586550009\Well History.tif',
'C:\Users\ Documents\TEMPORARY\WISE\30497372610000\Kirkbride _42478972610009\ Drilling\Proposals.pdf')

I made a few attempts with the glob() function, and tried to piece together a filter with conditions where I defined a variable 'x' = ('1', '2', '3' . . .) and filtered out items where 'x'+'x'+'x'+'x'+'x'+'x'+'x'+'x'+'x'+'x' didn't occur. I just couldn't come close to piecing together anything that made sense, or that wasn't searching for integers (which won't work).

Help me! Please and thank you!

You can use a regex to find strings that contain 10 consecutive digits:

In [63]: [i for i in strings if len(re.findall(r'\d{10}', re.escape(i))) > 0]
Out[63]: 
['C:\\Users\\ Documents\\1H_1P_42497372610000\\Kirkbride A1P_42497586550009\\Well History.tif',
 'C:\\Users\\ Documents\\TEMPORARY\\WISE\\30497372610000\\Kirkbride _42478972610009\\ Drilling\\Proposals.pdf']
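For reference, here is a self-contained sketch of the same filter that can be run as a script. It assumes the paths are written as raw strings (so that sequences such as \U or \1 are not read as escape codes); the names paths, TEN_DIGITS, EXACTLY_TEN, with_number, without_number and exactly_ten are made up for this illustration. It also builds the complementary list of paths without such a number, which is what the question title asks for.

```python
import re

# Sample paths from the question, as raw strings (an assumption about
# how the list is created; backslashes are kept literally this way).
paths = [
    r'C:\Users\ Documents\1H_1P_42497372610000\Kirkbride A1P_42497586550009\Well History.tif',
    r'C:\Users\ Documents\TEMPORARY\WISE\30497372610000\Kirkbride _42478972610009\ Drilling\Proposals.pdf',
    r'C:\Users\ Documents\Well History\Drilling\Proposals\Cement\Pilot hole KO plug\ Test Results.txt',
]

# Ten digits in a row. This also matches inside longer digit runs.
TEN_DIGITS = re.compile(r'\d{10}')

# Exactly ten digits: no digit directly before or after the run.
EXACTLY_TEN = re.compile(r'(?<!\d)\d{10}(?!\d)')

# re.search returns a match object (truthy) or None (falsy).
with_number = [p for p in paths if TEN_DIGITS.search(p)]
without_number = [p for p in paths if not TEN_DIGITS.search(p)]
exactly_ten = [p for p in paths if EXACTLY_TEN.search(p)]

print(with_number)
print(without_number)
print(exactly_ten)
```

On the sample paths, with_number holds the first two entries and without_number holds the third. The numbers in these paths are 14 digits long, so the stricter EXACTLY_TEN pattern selects nothing here; which of the two patterns is appropriate depends on what counts as a "10-digit number" in the real data.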

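The re.escape call in the one-liner above is not required: re.escape only adds backslashes in front of special characters and leaves digits untouched, so the \d{10} search gives the same result without it. The double backslashes in the output are simply how Python displays a single backslash when it shows a string's repr.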
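On the question of giving a wildcard a range: shell-style patterns do support character ranges such as [0-9], but glob() looks for files on disk rather than filtering a list of strings. The sketch below uses fnmatchcase from the standard fnmatch module instead, which applies the same pattern syntax to plain strings. The names pattern, paths and matches are made up for this illustration, and the paths are again assumed to be raw strings.

```python
from fnmatch import fnmatchcase

# One [0-9] class per required digit, with * on both sides so the
# digits may appear anywhere in the path.
pattern = '*' + '[0-9]' * 10 + '*'

# Sample paths from the question, as raw strings (an assumption).
paths = [
    r'C:\Users\ Documents\1H_1P_42497372610000\Kirkbride A1P_42497586550009\Well History.tif',
    r'C:\Users\ Documents\TEMPORARY\WISE\30497372610000\Kirkbride _42478972610009\ Drilling\Proposals.pdf',
    r'C:\Users\ Documents\Well History\Drilling\Proposals\Cement\Pilot hole KO plug\ Test Results.txt',
]

# fnmatchcase compares the whole string against the pattern and does
# not normalize case or path separators.
matches = [p for p in paths if fnmatchcase(p, pattern)]

print(matches)
```

This selects the same two paths as the regex. The regex version is shorter to write and easier to adjust (for example to require exactly ten digits), so the wildcard form is mainly useful if shell-style patterns are already used elsewhere in the code.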

