Extracting numbers from a filename string in Python

Question

I have a number of html files in a directory. I am trying to store the filenames in a list so that I can use it later to compare with another list.

For example, Prod224_0055_00007464_20170930.html is one of the filenames. From the filename, I want to extract '00007464', store this value in a list, and repeat the same for all the other files in the directory. How do I go about doing this? I am new to Python and any help would be greatly appreciated!

Please let me know if you need more information to answer the question.

Answer 1

You may try this (assuming you are in the folder with the files):

import os

num_list = []

# os.walk yields (root, dirs, files) tuples; next() takes only the top-level one
r, d, files = next(os.walk('.'))
for f in files:
    parts = f.split('_')   # for the example file: ['Prod224', '0055', '00007464', '20170930.html']
    print(parts[2])        # this outputs '00007464'
    num_list.append(parts[2])

Answer 2

Assuming you have a certain pattern for your files, you can use a regex:

>>> import re
>>> s = 'Prod224_0055_00007464_20170930.html'
>>> desired_number = re.findall(r"\d+", s)[2]
>>> desired_number
'00007464'

Using a regex will help you get not only the specific number you want, but also the other numbers in the file name.

This will work if the names of your files follow the pattern "[some text][number]_[number]_[desired_number]_[a date].html". After getting the number, you can use the list's append method to add it to any list you want.
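As a minimal sketch of that idea, the pattern can be anchored to the whole filename, so that names with a different shape are skipped instead of raising an IndexError. The function name `extract_ids` and the exact regex are assumptions for illustration, based only on the single example filename in the question.

```python
import re

# Assumed layout: <letters><digits>_<digits>_<desired digits>_<8-digit date>.html
FILENAME_RE = re.compile(r"^[A-Za-z]+\d+_\d+_(\d+)_\d{8}\.html$")

def extract_ids(filenames):
    """Return the third numeric field of every filename that matches the assumed layout."""
    ids = []
    for name in filenames:
        match = FILENAME_RE.match(name)
        if match:  # names that do not follow the pattern are skipped
            ids.append(match.group(1))
    return ids

print(extract_ids(["Prod224_0055_00007464_20170930.html", "notes.txt"]))
# ['00007464']
```

If your real filenames vary more than the example does (for instance a different prefix or date format), the regex would need to be loosened accordingly.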

Answer 3

Split the filename on underscores and select the third element (index 2).

>>> 'Prod224_0055_00007464_20170930.html'.split('_')[2]
'00007464'

In context that might look like this (after `import os`, with `directory` holding the path to the folder that contains the files):

nums = [f.split('_')[2] for f in os.listdir(directory) if f.endswith('.html')]
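The question also mentions comparing the result with another list. Here is a rough sketch of the whole flow, assuming `pathlib` is acceptable in place of `os.listdir`. The helper name `ids_in_directory`, the second filename, and the contents of `other_list` are all made up for the demonstration.

```python
import tempfile
from pathlib import Path

def ids_in_directory(directory):
    """Collect the third underscore-separated field of each .html file name."""
    ids = []
    for path in sorted(Path(directory).glob("*.html")):
        parts = path.stem.split("_")
        if len(parts) >= 3:  # ignore names with too few fields
            ids.append(parts[2])
    return ids

# Demo in a throwaway directory; only the first file name comes from the question.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["Prod224_0055_00007464_20170930.html",
                 "Prod224_0055_00007465_20170930.html",
                 "readme.txt"]:
        (Path(tmp) / name).touch()
    found = ids_in_directory(tmp)

other_list = ["00007464", "00009999"]  # hypothetical list to compare against
common = sorted(set(found) & set(other_list))
print(found)   # ['00007464', '00007465']
print(common)  # ['00007464']
```

Converting both lists to sets makes the comparison independent of order. Use `set(found) - set(other_list)` instead if you want the values that are missing from the other list.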

3 answers

solution1
0 2019-09-13 16:30:02

solution2
0 2019-09-13 16:30:10

solution3
0 ACCEPTED 2019-09-13 16:31:14
