How do I cut a string off at the point where k consecutive digits are found?

Question

Suppose I have a list of file names of the form *.1243.* and I want everything that comes before those 4 digits. How can I do this efficiently?

An ugly, inefficient example of working code is:

names = []
for file in file_list:
    words = file.split('.')
    for i, word in enumerate(words):
        if word.isdigit():
            if int(word)>999 and int(word)<10000:
                names.append(' '.join(words[:i]))
                break
print(names)

Obviously this is far from ideal, and I would like to know a better way to do it.

Answer 1

You probably want to use a regular expression for this.

import re

name = []
for file in file_list:
    m = re.match(r'^(.+?)\.\d{4}\.', file)
    if m:
        name.append(m.groups()[0])
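
Since the title asks about k digits in general, the same pattern can be parameterised. The following is only a sketch, assuming the digits always form a whole dot-delimited component; `prefix_before_digits` and its `k` parameter are names made up for this illustration and are not part of the answer.

```python
import re

def prefix_before_digits(filename, k=4):
    """Return the text before the first '.<k digits>.' component, or None."""
    # The non-greedy (.+?) stops at the first dot-delimited run of exactly k digits.
    m = re.match(r'^(.+?)\.\d{%d}\.' % k, filename)
    return m.group(1) if m else None

# Sample names; 'no_digits.txt' is a made-up entry that should be skipped.
file_list = ['hello.1235.sas', 'test.5678.hai', 'no_digits.txt']
names = [p for p in map(prefix_before_digits, file_list) if p is not None]
print(names)
```

Because the digits must sit between two literal dots, a longer run such as `a.12345.b` does not match when k is 4.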

Answer 2

With a regular expression this becomes simpler:

import re

names = ['hello.1235.sas','test.5678.hai']

for fn in names:
    myreg = r'(.*)\.(?:\d{4})\..*'
    output = re.findall(myreg,fn)
    print(output)

Output:

['hello']
['test']
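
One thing worth checking with this pattern is that `(.*)` is greedy, so when a name contains more than one 4-digit component it captures everything up to the last one. A small sketch of the difference, using a made-up file name and a non-greedy variant `(.*?)` that is not part of the answer:

```python
import re

fn = 'report.1234.draft.5678.txt'  # hypothetical name with two 4-digit components

# Greedy: (.*) backtracks from the end, so it stops at the LAST '.dddd.' component.
greedy = re.findall(r'(.*)\.(?:\d{4})\..*', fn)
# Non-greedy: (.*?) grows from the start, so it stops at the FIRST '.dddd.' component.
lazy = re.findall(r'(.*?)\.(?:\d{4})\..*', fn)

print(greedy)  # ['report.1234.draft']
print(lazy)    # ['report']
```

For names with a single 4-digit component, as in the answer's examples, both variants give the same result.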

Answer 3

If you know that all entries have the same format, here is a list comprehension approach:

[parts[0] for parts in filter(lambda parts: len(parts[1]) == 4, (item.split('.') for item in file_list))]

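To be fair, I also like the solution @James provided. Note that the downside of this list comprehension is that it involves three loops: (1) over all items to split them, (2) over the split items to filter the matches, and (3) over the matches to build the result.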

A plain for loop may well be enough:

output = []
for item in file_list:
    beginning, digits, end = item.split('.')
    if len(digits) == 4:
        output.append(beginning)

It makes only a single loop, which is better.
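
The three-way unpacking in the loop above raises a ValueError for any name that does not have exactly three dot-separated parts, and a length check alone would accept a component like `12ab`. A hedged sketch of a more defensive variant follows; the sample names and the `isdigit()` check are additions for illustration, not part of the answer.

```python
# Made-up sample data, including names that do not fit the expected format.
file_list = ['hello.1235.sas', 'test.5678.hai', 'archive.tar.gz', 'notes.txt', 'x.12ab.y']

output = []
for item in file_list:
    parts = item.split('.')
    # Skip names that are not exactly "<name>.<middle>.<ext>".
    if len(parts) != 3:
        continue
    beginning, digits, _end = parts
    # isdigit() rejects four-character components such as '12ab'.
    if len(digits) == 4 and digits.isdigit():
        output.append(beginning)

print(output)  # ['hello', 'test']
```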

Answer 4

You can use a positive lookahead, (?=(\.\d{4})):

import re
pattern=r'(.*)(?=(\.\d{4}))'

text=['*hello.1243.*','*.1243.*','hello.1235.sas','test.5678.hai','a.9999']


print(list(map(lambda x:re.search(pattern,x).group(0),text)))

Output:

['*hello', '*', 'hello', 'test', 'a']
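
Note that this lookahead only checks that four digits follow the dot, so it also matches the first four digits of a longer run. If exactly four digits are required, one option is to add a negative lookahead `(?!\d)` after the digits. The sketch below compares the two; `strict` and the helper `prefix` are names invented for this illustration.

```python
import re

loose = r'(.*)(?=(\.\d{4}))'        # the pattern from the answer
strict = r'(.*)(?=\.\d{4}(?!\d))'   # assumed variant: no fifth digit allowed

def prefix(pattern, s):
    """Return the captured prefix, or None when the pattern does not match."""
    m = re.search(pattern, s)
    return m.group(1) if m else None

print(prefix(loose, 'a.12345.b'))        # prints a
print(prefix(strict, 'a.12345.b'))       # prints None
print(prefix(strict, 'hello.1235.sas'))  # prints hello
```

Unlike the `map`/`lambda` one-liner, the helper returns None instead of raising AttributeError when a name has no match.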


Answer 1
2 accepted 2018-01-31 13:36:52

Answer 2
1 2018-01-31 13:40:44

Answer 3
1 2018-01-31 13:44:46

Answer 4
1 2018-01-31 14:24:23
