
Extracting numbers from a filename string in Python

I have a number of HTML files in a directory. I am trying to store the filenames in a list so that I can later compare it with another list.

For example, Prod224_0055_00007464_20170930.html is one of the filenames. From the filename, I want to extract '00007464', store this value in a list, and repeat the same for all the other files in the directory. How do I go about doing this? I am new to Python and any help would be greatly appreciated!

Please let me know if you need more information to answer the question.

You may try this (assuming you are in the folder with the files):

import os

num_list = []

# next() takes only the first (top-level) result of os.walk: (root, dirs, files)
r, d, files = next(os.walk('.'))
for f in files:
    parts = f.split('_')   # now `parts` contains ['Prod224', '0055', '00007464', '20170930.html']
    print(parts[2])        # this outputs '00007464'
    num_list.append(parts[2])

Assuming your filenames follow a fixed pattern, you can use a regex:

>>> import re
>>> s = 'Prod224_0055_00007464_20170930.html'
>>> desired_number = re.findall(r"\d+", s)[2]
>>> desired_number
'00007464'

Using a regex lets you get not only the specific number you want, but also the other numbers in the filename.

This works if your filenames follow the pattern "[some text][number]_[number]_[desired_number]_[a date].html". Once you have the number, you can add it to any list you want with the append method.
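If you want the regex to check the whole pattern rather than count digit runs, a stricter version might look like the sketch below. It assumes the leading text is letters only and the date is always eight digits; the names FILENAME_RE and extract_number are made up for illustration, so adjust them and the pattern to your real filenames.

```python
import re

# Assumed layout: <letters><digits>_<digits>_<desired digits>_<8-digit date>.html
FILENAME_RE = re.compile(r"^[A-Za-z]+\d+_\d+_(\d+)_\d{8}\.html$")

def extract_number(filename):
    """Return the third numeric field as a string, or None if the name does not match."""
    match = FILENAME_RE.match(filename)
    return match.group(1) if match else None

print(extract_number("Prod224_0055_00007464_20170930.html"))  # 00007464
print(extract_number("notes.html"))                            # None
```

Returning None for names that do not match lets you skip unrelated files instead of hitting an IndexError.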

Split the filename on underscores and select the third element (index 2).

>>> 'Prod224_0055_00007464_20170930.html'.split('_')[2]
'00007464'

In context that might look like this, where `directory` holds the path to the folder and `os` has been imported:

nums = [f.split('_')[2] for f in os.listdir(directory) if f.endswith('.html')]
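Since the goal in the question is to compare the extracted values with another list, here is a slightly longer sketch of the same split approach. The function name collect_numbers and the contents of `expected` are hypothetical, and it assumes the files sit directly in the given folder.

```python
from pathlib import Path

def collect_numbers(directory):
    """Collect the third underscore-separated field of every .html file in `directory`."""
    numbers = []
    for path in sorted(Path(directory).glob("*.html")):
        parts = path.stem.split("_")   # stem drops the '.html' suffix
        if len(parts) >= 3:            # skip names that have no third field
            numbers.append(parts[2])
    return numbers

found = collect_numbers(".")
expected = ["00007464", "00007465"]    # hypothetical list to compare against
missing = sorted(set(expected) - set(found))
print(missing)
```

The values are kept as strings so that leading zeros survive; convert with int() only if you need numeric comparison.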
