How to get only the dates from a text file in Python

Question

I have a very large text file that I am reading in Python. I opened the file in read mode and loaded the data into a variable. Now I want only the dates from it. I read the lines with readline(), looped over them, split each line on the comma, and took the element at index 0, which gives me a list of dates. However, some sections of the text file look like the sample below, so my output also contains 'And bitNumber is 4', 'Then function si', and 'Take a char variable'.

10/04/2020, 03:05 - ABC: Like if number is 0011 0111
And bitNumber is 4 
Then function si
10/04/2020, 03:08 - ABC: Question 6
Take a char variable, apply a same as number
10/04/2020, 03:08 - ABC: Example If my variable is 0X3C answer should be same

How can I avoid getting 'And bitNumber is 4', 'Then function si', and 'Take a char variable' in the output, so that I get only the dates?

for row_data in data_collected:
    print(row_data.split(',')[0])

Answer 1

Pass each candidate string to datetime.strptime. If the string does not match the date format, strptime raises a ValueError, which you can catch to skip that line. Assuming all your dates are formatted the same way:

from datetime import datetime

dates = []
for row in data:
    date = row.split(',', 1)[0]
    try:
        date = datetime.strptime(date, '%m/%d/%Y')
        dates.append(date)
    except ValueError:
        continue

Bonus: you now have datetime.datetime objects instead of plain strings.
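As a self-contained sketch of this approach, the snippet below runs the same filter over the sample lines from the question. The helper name extract_dates and its fmt parameter are made up for illustration. The sample does not reveal whether 10/04/2020 is month-first or day-first, so the default format is an assumption; switch it to '%d/%m/%Y' if your file is day-first.

```python
from datetime import datetime

def extract_dates(lines, fmt='%m/%d/%Y'):
    """Return a datetime for each line that starts with a date in `fmt`.

    `fmt` is an assumption about the file; adjust it to your data.
    """
    dates = []
    for line in lines:
        # Everything before the first comma is the date candidate.
        candidate = line.split(',', 1)[0].strip()
        try:
            dates.append(datetime.strptime(candidate, fmt))
        except ValueError:
            # Continuation lines such as 'And bitNumber is 4' land here.
            continue
    return dates

sample = [
    '10/04/2020, 03:05 - ABC: Like if number is 0011 0111',
    'And bitNumber is 4 ',
    'Then function si',
    '10/04/2020, 03:08 - ABC: Question 6',
    'Take a char variable, apply a same as number',
    '10/04/2020, 03:08 - ABC: Example If my variable is 0X3C answer should be same',
]

dates = extract_dates(sample)
print([d.strftime('%m/%d/%Y') for d in dates])
```

Because strptime also checks the calendar, a string such as 13/45/2020 is rejected, not just strings with the wrong shape.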

Answer 2

You can look for dates in that format with a regex:

import re
....
for row_data in data_collected:
    if re.match(r'\d\d/\d\d/\d\d\d\d', row_data):
        print(row_data.split(',')[0])

This catches dates of the form nn/nn/nnnn at the start of a line (\d in a regex matches any digit).
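A possible variation, shown here only as a sketch: compile the pattern once and capture the date with a group, so no separate split is needed. The names DATE_AT_START and leading_date are hypothetical, and the trailing comma in the pattern assumes every dated line looks like the sample in the question.

```python
import re

# Anchored at the start of the line by match(); the group captures the
# nn/nn/nnnn prefix that precedes the first comma.
DATE_AT_START = re.compile(r'(\d\d/\d\d/\d\d\d\d),')

def leading_date(line):
    """Return the date string at the start of `line`, or None."""
    m = DATE_AT_START.match(line)
    return m.group(1) if m else None

print(leading_date('10/04/2020, 03:08 - ABC: Question 6'))
print(leading_date('Take a char variable, apply a same as number'))
```

Keep in mind that this pattern checks only the shape of the text: a prefix like 99/99/9999 would also be accepted.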

Answer 3

You can use a regular expression to extract the dates, as shown below:

import re
dates = []
with open('sample.txt','r') as f:
    for line in f:  # iterate lazily instead of loading the whole file with readlines()
        match = re.search(r'\d{2}/\d{2}/\d{4}', line)
        if match is not None:
            dates.append(match.group())
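One thing to be aware of with re.search is that it scans the whole line, so it also picks up a date that appears inside the message text of a continuation line. The short sketch below contrasts it with re.match, which only tries the start of the string. The example line is invented for illustration.

```python
import re

pattern = re.compile(r'\d{2}/\d{2}/\d{4}')

# Hypothetical continuation line whose text happens to mention a date.
line = 'Submit it before 12/04/2020 please'

found_anywhere = pattern.search(line)  # scans the whole line
found_at_start = pattern.match(line)   # only tries position 0

print(found_anywhere.group())
print(found_at_start)
```

If only the timestamp at the start of each message is wanted, match (or a pattern beginning with ^) is probably the safer choice.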

Answer 4

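This pattern is more flexible: it accepts a hyphen, space, slash, or dot as the delimiter, requires the same delimiter in both positions, and checks the day and month ranges. Note that it expects the day first (dd/mm/yyyy) and a year starting with 19 or 20.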

import re

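# Raw string so that \2 stays a backreference to the delimiter group; x is the text to search.
# Because the pattern contains groups, findall returns (day, delimiter, month, year) tuples.
result_list = re.findall(r"(?P<day>0[1-9]|[12][0-9]|3[01])(?P<delimiter>[- /.])(?P<month>0[1-9]|1[012])\2(?P<year>(?:19|20)\d\d)", x)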
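To make the return value concrete, here is a small sketch that applies the same pattern to an invented two-line string. The pattern is written as a raw string so that \2 is read as a backreference to the delimiter group. The variable names (DATE, text, tuples, full_dates) are assumptions for the example only.

```python
import re

DATE = re.compile(
    r"(?P<day>0[1-9]|[12][0-9]|3[01])"
    r"(?P<delimiter>[- /.])"
    r"(?P<month>0[1-9]|1[012])"
    r"\2"  # same delimiter as the one captured in group 2
    r"(?P<year>(?:19|20)\d\d)"
)

# Made-up input: one slash-delimited and one hyphen-delimited date.
text = '10/04/2020, 03:05 - ABC: hello\n25-12-2019 note\n'

# findall returns one tuple of groups per match.
tuples = DATE.findall(text)

# finditer gives match objects, so group(0) is the full date text.
full_dates = [m.group(0) for m in DATE.finditer(text)]

print(tuples)
print(full_dates)
```

Using finditer is one way to recover the complete date string; the named groups are also available through m.group('day'), m.group('month'), and m.group('year').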

4 answers

Answer 1
1 2020-05-13 10:56:58

Answer 2
0 2020-05-13 10:56:40

Answer 3
0 2020-05-13 11:00:15

Answer 4
0 2020-05-13 11:07:02
