
How do I get only the dates from a text file in Python?

I have a very large text file that I'm reading in Python. I opened the file in read mode and stored its contents in a variable. Now I want only the dates from it. I read the file with readline(), loop over the lines, split each line on the comma, and take the element at index 0, which gives me a list of dates. But some sections of the text file look like the excerpt below, where a message continues over several lines. Because of this, I also get 'And bitNumber is 4', 'Then function si', and 'Take a char variable' in my output.

10/04/2020, 03:05 - ABC: Like if number is 0011 0111
And bitNumber is 4 
Then function si
10/04/2020, 03:08 - ABC: Question 6
Take a char variable, apply a same as number
10/04/2020, 03:08 - ABC: Example If my variable is 0X3C answer should be same

What can I do to keep 'And bitNumber is 4', 'Then function si', and 'Take a char variable' out of the output, so that I get only the dates?

for row_data in data_collected:
    print(row_data.split(',')[0])

Pass each candidate string to datetime.strptime. If it doesn't look like a date, this raises a ValueError. Assuming all your dates are formatted the same way:

from datetime import datetime

dates = []
for row in data:
    date = row.split(',', 1)[0]
    try:
        date = datetime.strptime(date, '%m/%d/%Y')
        dates.append(date)
    except ValueError:
        continue

Bonus: now you have datetime.datetime objects instead of just strings.
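
As a rough sketch, here is the same idea applied to the sample lines from the question. The helper name `extract_dates` and its `fmt` parameter are illustrative and not part of the original answer. Note that a date like 10/04/2020 parses under both month-first and day-first formats, so it is worth checking which one your file actually uses:

```python
from datetime import datetime

def extract_dates(lines, fmt='%m/%d/%Y'):
    """Return a datetime for each line whose text before the first comma parses with fmt."""
    dates = []
    for line in lines:
        candidate = line.split(',', 1)[0].strip()
        try:
            dates.append(datetime.strptime(candidate, fmt))
        except ValueError:
            # Continuation lines such as 'And bitNumber is 4' end up here.
            continue
    return dates

sample = [
    '10/04/2020, 03:05 - ABC: Like if number is 0011 0111',
    'And bitNumber is 4 ',
    'Then function si',
    '10/04/2020, 03:08 - ABC: Question 6',
    'Take a char variable, apply a same as number',
    '10/04/2020, 03:08 - ABC: Example If my variable is 0X3C answer should be same',
]

month_first = extract_dates(sample)                 # reads 10/04 as October 4
day_first = extract_dates(sample, fmt='%d/%m/%Y')   # reads 10/04 as April 10
print([d.strftime('%Y-%m-%d') for d in month_first])
print([d.strftime('%Y-%m-%d') for d in day_first])
```

Both calls keep three lines and drop the three continuation lines; only the interpretation of day and month differs.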

You can look for dates in that format with regex:

import re
# ... (read the file into data_collected, as in the question)
for row_data in data_collected:
    if re.match(r'\d\d/\d\d/\d\d\d\d', row_data):
        print(row_data.split(',')[0])

This catches dates of the form nn/nn/nnnn at the start of a line (\d in a regex matches any digit).
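
A slightly stricter variant, as a sketch: compile the pattern once, require the comma that follows the date in the question's format, and capture the date directly instead of splitting. The names `DATE_AT_START` and `dates_at_line_start` are made up for illustration. The pattern only checks digits, so an impossible date such as 99/99/9999 would still pass:

```python
import re

# Date at the very start of the line, followed by the comma that
# separates it from the time in the question's log format.
DATE_AT_START = re.compile(r'(\d{2}/\d{2}/\d{4}),')

def dates_at_line_start(lines):
    """Collect the leading nn/nn/nnnn date of each line that has one."""
    found = []
    for line in lines:
        m = DATE_AT_START.match(line)  # match() only succeeds at position 0
        if m:
            found.append(m.group(1))
    return found

sample = [
    '10/04/2020, 03:05 - ABC: Like if number is 0011 0111',
    'And bitNumber is 4 ',
    'Take a char variable, apply a same as number',
    '10/04/2020, 03:08 - ABC: Question 6',
]
result = dates_at_line_start(sample)
print(result)  # ['10/04/2020', '10/04/2020']
```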

You can use a regular expression to extract the dates, as below:

import re
dates = []
with open('sample.txt','r') as f:
    for line in f:
        match = re.search(r'\d{2}/\d{2}/\d{4}', line)
        if match is not None:
            dates.append(match.group())

This approach is flexible: re.search finds the date anywhere in the line, so it works whatever delimiter follows the date.
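
One caveat with searching the whole line: a date mentioned inside a message body is collected too. The small comparison below is only a sketch, and its third sample line is invented for illustration. If only the leading timestamp date is wanted, re.match may be the safer choice:

```python
import re

pattern = re.compile(r'\d{2}/\d{2}/\d{4}')

lines = [
    '10/04/2020, 03:05 - ABC: Like if number is 0011 0111',
    'And bitNumber is 4',
    'Deadline is 12/04/2020 for this one',  # hypothetical continuation line
]

# search() finds a date anywhere in the line.
anywhere = [m.group() for m in map(pattern.search, lines) if m]
# match() only accepts a date at the start of the line.
at_start = [m.group() for m in map(pattern.match, lines) if m]

print(anywhere)  # ['10/04/2020', '12/04/2020']
print(at_start)  # ['10/04/2020']
```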

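Your regex could be r"(?P<day>0[1-9]|[12][0-9]|3[01])(?P<delimiter>[- /.])(?P<month>0[1-9]|1[012])\2(?P<year>(?:19|20)\d\d)". Say your data is in a string x; then do this:

import re

# Raw string, so \2 is a backreference to the delimiter group rather than the character '\x02'.
pattern = r"(?P<day>0[1-9]|[12][0-9]|3[01])(?P<delimiter>[- /.])(?P<month>0[1-9]|1[012])\2(?P<year>(?:19|20)\d\d)"
result_list = [m.group(0) for m in re.finditer(pattern, x)]  # full date text of each match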
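
To see what this pattern yields, here is a small sketch. It assumes day-first dates, as the group names suggest, and uses an invented test string. When a pattern contains capture groups, re.findall returns one tuple of groups per match, while re.finditer gives match objects from which the full text or the named parts can be read. `(?P=delimiter)` is used here as the named equivalent of `\2`:

```python
import re

DATE_RE = re.compile(
    r'(?P<day>0[1-9]|[12][0-9]|3[01])'
    r'(?P<delimiter>[- /.])'
    r'(?P<month>0[1-9]|1[012])'
    r'(?P=delimiter)'            # the same delimiter must appear again
    r'(?P<year>(?:19|20)\d\d)'
)

# Hypothetical input: two well-formed dates and one with mixed delimiters.
text = '10/04/2020, 03:05 - ABC: due 31-12-2021, not 31-12/2021'

# findall() returns one tuple of the four groups per match.
tuples = DATE_RE.findall(text)

# finditer() yields match objects: full text and named parts are both available.
full = [m.group(0) for m in DATE_RE.finditer(text)]
parts = [(m.group('day'), m.group('month'), m.group('year'))
         for m in DATE_RE.finditer(text)]

print(tuples)  # [('10', '/', '04', '2020'), ('31', '-', '12', '2021')]
print(full)    # ['10/04/2020', '31-12-2021']
```

The mixed-delimiter date 31-12/2021 is rejected because the second delimiter has to equal the first.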


 