Extract part of string according to pattern using regular expression Python

Question

I have files that follow a specific naming format, which looks something like this:

test_0800_20180102_filepath.csv
anotherone_0800_20180101_hello.csv

The numbers in the middle represent timestamps, so I would like to extract that information. I know that there is a specific pattern which will always be _time_date_ , so essentially I want the part of the string that lies between the first and third underscores. I found some examples and somewhat similar problems, but I am new to Python and I am having trouble adapting them.

This is what I have implemented thus far:

datetime = re.search(r"\d+_(\d+)_", "test_0800_20180102_filepath.csv")

But the result I get is only the date part:

20180102

But what I actually need is:

0800_20180102

Answer 1

That's quite simple:

import re

your_string = "test_0800_20180102_filepath.csv"  # example filename from the question
match = re.search(r"_((\d+)_(\d+))_", your_string)

print(match.group(1))  # print time_date >> 0800_20180102
print(match.group(2))  # print time >> 0800
print(match.group(3))  # print date >> 20180102

Note that for such tasks the grouping operator () in the regex is really helpful: it lets you access specific substrings of a larger pattern without having to match each one individually (which can be much more ambiguous than matching the larger pattern).

Groups are numbered from 1 to n, where n is the number of groups in the pattern; group 0 is the whole match. The numbers are assigned from left to right, in the order in which each group's opening parenthesis appears in the pattern.
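To make the numbering concrete, here is a small sketch that runs the pattern from this answer on one of the question's filenames and reads every group. The variable names are only illustrative.

```python
import re

# Sample filename taken from the question.
filename = "test_0800_20180102_filepath.csv"

match = re.search(r"_((\d+)_(\d+))_", filename)

whole = match.group(0)      # the entire match, including the surrounding underscores
time_date = match.group(1)  # outer group: its "(" comes first in the pattern
time_part = match.group(2)  # first inner group
date_part = match.group(3)  # second inner group

print(whole)           # _0800_20180102_
print(time_date)       # 0800_20180102
print(time_part)       # 0800
print(date_part)       # 20180102
print(match.groups())  # ('0800_20180102', '0800', '20180102')
```

Note that group 0 includes the leading and trailing underscores because they are part of the pattern, while group 1 does not.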

On a side note, if you have control over it, use unix timestamps so you only have one number defining both date and time universally.
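If you cannot change the file names, the extracted string can still be converted into a single Unix timestamp afterwards. The sketch below assumes that the time is HHMM, the date is YYYYMMDD, and that both are in UTC; none of this is stated in the question, so adjust the format string and time zone to your data.

```python
from datetime import datetime, timezone

# Value extracted from "test_0800_20180102_filepath.csv".
time_date = "0800_20180102"

# Assumed layout: HHMM, underscore, YYYYMMDD.
parsed = datetime.strptime(time_date, "%H%M_%Y%m%d")

# Assumption: the file names use UTC. Attach the time zone explicitly,
# otherwise timestamp() interprets the value in the local time zone.
parsed_utc = parsed.replace(tzinfo=timezone.utc)
unix_timestamp = int(parsed_utc.timestamp())

print(parsed_utc.isoformat())  # 2018-01-02T08:00:00+00:00
print(unix_timestamp)          # 1514880000
```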

Answer 2

The key here is that you want everything between the first and the third underscores on each line, so there is no need to worry about designing a regex to match your time and date pattern.

with open('myfile.txt', 'r') as f:
    for line in f:
        x = '_'.join(line.split('_')[1:3])
        print(x)

The problem with your implementation is that you are only capturing the date part of your pattern. If you want to stick with a regex solution then simply move your parentheses to capture the entire pattern you want:

re.search(r"(\d+_\d+)_", "test_0800_20180102_filepath.csv").group(1)

gives:

'0800_20180102'
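One caveat with the unanchored regex: if the first part of a file name happens to end in digits, re.search will start matching there. The sketch below uses a made-up file name ("run2_...", not from the question) to show the effect, and one possible way to anchor the pattern to the first underscore.

```python
import re

# Hypothetical file name whose prefix ends in a digit.
filename = "run2_0800_20180102_filepath.csv"

# Unanchored: the match starts at the "2" in "run2".
loose = re.search(r"(\d+_\d+)_", filename).group(1)

# Anchored: re.match starts at the beginning of the string, [^_]* skips
# everything up to the first underscore, then the group captures time_date.
strict = re.match(r"[^_]*_(\d+_\d+)_", filename).group(1)

print(loose)   # 2_0800
print(strict)  # 0800_20180102
```

For the file names shown in the question both patterns give the same result; the anchored form only matters when the prefix can contain digits.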

Answer 3

This is very easy to do with .split() :

time = filename.split("_")[1]
date = filename.split("_")[2]
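A slightly tighter variant splits only once and unpacks the pieces. This is a sketch that assumes every name has at least three underscores, as in the question; the maxsplit argument keeps any further underscores in the last part together.

```python
# Sample filename taken from the question.
filename = "anotherone_0800_20180101_hello.csv"

# Split at most 3 times: prefix, time, date, and the remainder.
prefix, time, date, rest = filename.split("_", 3)

print(time)              # 0800
print(date)              # 20180101
print(f"{time}_{date}")  # 0800_20180101
```

If a name has fewer than three underscores, the unpacking raises a ValueError, which can be a convenient way to spot files that do not follow the format.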
