String Parsing - Python

I am working on an assignment that I have already solved, but I want to ask about a particular scenario. I have a text file that contains a lot of emails. Some subject lines include a date and time, while others contain only an email address. For example:

From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
This is a test email.
From stephen.marquard@uct.ac.za
random text.
From alex.hunt@uct.ac.za
From stephen.marquard@uct.ac.za Sat Jan  6 03:14:16 2008
From qbc@testemail.com

and so on. My task is to extract the email addresses from the subject lines that start with 'From' and contain a date and time. This is simple in the case above, where I can ignore the lines that do not start with 'From' or do not end with '2008'. My code for that is below.

fh = open(fname)
for line in fh:
    line = line.rstrip()
    if not line.startswith('From'): continue
    if not line.endswith('2008'):   continue
    words = line.split()
    print(words[1])

My question is: what if the subject lines end with different, arbitrary years? In that case I can no longer use if not line.endswith('2008'): continue. Can anyone tell me what the logic would be then? Thanks.

You can use a regex for the check, in place of the line if not line.endswith('2008'): continue.

import re

year = re.search(r'\d{4}$', line)

# Skip lines that do NOT end with four digits
if year is None:
    continue
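As a rough sketch of how such a year check could sit inside the loop from the question, the snippet below runs it over a few in-memory lines instead of a file. The helper name ends_with_year and the 1999 sample line are made up for illustration; only the pattern \d{4}$ comes from the answer.

```python
import re

# Pattern from the answer: four digits at the very end of the line.
YEAR_AT_END = re.compile(r'\d{4}$')

def ends_with_year(line):
    """Return True if the line ends with four digits (trailing whitespace ignored)."""
    return YEAR_AT_END.search(line.rstrip()) is not None

# Illustrative data; the last line is invented to show a year other than 2008.
sample_lines = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008",
    "This is a test email.",
    "From stephen.marquard@uct.ac.za",
    "From alex.hunt@uct.ac.za Mon Mar  3 11:00:00 1999",
]

senders = []
for line in sample_lines:
    if not line.startswith('From'):
        continue
    if not ends_with_year(line):
        continue
    senders.append(line.split()[1])

print(senders)
```

With these four lines, only the first and last pass both checks, so senders holds the two addresses from those lines. Note that \d{4}$ accepts any four trailing digits, not only plausible years.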

For more complex parsing you should use Python's regular expression module, re. It is much more powerful, although not always as clear.

Specifically for your question, you can use something like this:

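import re

# fname is the path to the mail file, as in the question
with open(fname) as fh:
    for line in fh:
        # Keep only lines that start with 'From ' and end with four digits
        result = re.search(r'^From .* \d{4}$', line)
        if result is not None:
            words = line.split()
            print(words[1])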

^From matches lines that start with 'From'. \d{4}$ matches lines that end with four decimal digits. .* matches any characters in between.
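The same pattern pieces can also be combined with capture groups, so that the address comes straight out of the match rather than from a separate split(). This is only a sketch of one possible variant: the names FROM_LINE and parse_from_line are hypothetical, and it assumes the address is the first whitespace-free token after 'From '.

```python
import re

# ^From      : line starts with 'From '
# (\S+)      : group 1, the sender address (no whitespace)
# .*         : anything in between (weekday, month, day, time)
# (\d{4})$   : group 2, four digits at the end of the line
FROM_LINE = re.compile(r'^From (\S+) .* (\d{4})$')

def parse_from_line(line):
    """Return (address, year) for a dated 'From' line, or None otherwise."""
    match = FROM_LINE.search(line.rstrip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))

print(parse_from_line("From stephen.marquard@uct.ac.za Sat Jan  6 03:14:16 2008"))
print(parse_from_line("From alex.hunt@uct.ac.za"))
```

The first call returns the address together with the year as an integer; the second returns None because the line carries no date.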
