
Extracting a specific piece of a line from a text file (Python)

I have a textfile with the format:

3rd Year:

MECN3010 PREREQ MECN2011 COREQ TIMES1 TIMES2 MO3, MO4, FR5, TH1, TH2

MECN3012 PREREQ MECN2012 COREQ TIMES1 TUA, WE3, TH1, TH2 TIMES2

How can I extract just a particular part of a line?

For example, suppose I want to extract just the

PREREQ MECN2011

part from the 2nd line.

I'm able to read in the particular line I want, but I don't know how to split or strip out just the info I need.

If all the lines you are interested in contain PREREQ MECNYYYY, where YYYY is a four-digit number, you can use a regular expression like the following:

EDIT: corrected the code

import re
# assume that line holds your text line
regex = r'PREREQ MECN\d{4}'
matcher = re.search(regex, line)
if matcher:
    match = matcher.group()  # gives the actual match, e.g. 'PREREQ MECN2011'
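If you only want the course code rather than the whole "PREREQ ..." match, a capturing group is one way to do it. This is a minimal sketch: the helper name extract_prereq is made up for illustration, and the pattern assumes every course code is four capital letters followed by four digits, which is only inferred from the two sample lines.

```python
import re

# Assumption: a course code is four capital letters plus four digits (e.g. MECN2011).
PREREQ_RE = re.compile(r'PREREQ\s+([A-Z]{4}\d{4})')

def extract_prereq(line):
    """Return the course code that follows PREREQ, or None if there is none."""
    m = PREREQ_RE.search(line)
    return m.group(1) if m else None

line = "MECN3010 PREREQ MECN2011 COREQ TIMES1 TIMES2 MO3, MO4, FR5, TH1, TH2"
print(extract_prereq(line))         # MECN2011
print(extract_prereq("3rd Year:"))  # None
```

Returning None for lines without a match means you can run it over every line of the file and simply skip the misses.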

Try this. You can use split and join.

lines = '''3rd Year:
MECN3010 PREREQ MECN2011 COREQ TIMES1 TIMES2 MO3, MO4, FR5, TH1, TH2
MECN3012 PREREQ MECN2012 COREQ TIMES1 TUA, WE3, TH1, TH2 TIMES2'''

for line in lines.splitlines()[1:]:
    print(" ".join(line.split()[1:3]))
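The [1:] above skips the heading by position, which breaks if the file has blank lines or more than one heading. A rough variant that checks the tokens instead might look like this. Here io.StringIO stands in for the real file, the function name prereq_fields is hypothetical, and the blank lines in the sample are an assumption based on how the question shows the file.

```python
import io

# io.StringIO stands in for an opened text file; swap in open(...) for real use.
sample = io.StringIO(
    "3rd Year:\n"
    "\n"
    "MECN3010 PREREQ MECN2011 COREQ TIMES1 TIMES2 MO3, MO4, FR5, TH1, TH2\n"
    "\n"
    "MECN3012 PREREQ MECN2012 COREQ TIMES1 TUA, WE3, TH1, TH2 TIMES2\n"
)

def prereq_fields(handle):
    """Collect 'PREREQ <code>' from every line whose second token is PREREQ."""
    found = []
    for raw in handle:
        tokens = raw.split()
        # Blank lines and headings such as "3rd Year:" fail this check and are skipped.
        if len(tokens) >= 3 and tokens[1] == "PREREQ":
            found.append(" ".join(tokens[1:3]))
    return found

results = prereq_fields(sample)
print(results)  # ['PREREQ MECN2011', 'PREREQ MECN2012']
```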

Let's say you've found the line you're interested in:

line = "MECN3010 PREREQ MECN2011 COREQ TIMES1 TIMES2 MO3, MO4, FR5, TH1, TH2"

You have a few ways to extract a given field:

1) Token-based

>>> tokens = line.split()
>>> tokens
['MECN3010', 'PREREQ', 'MECN2011', 'COREQ', 'TIMES1', 'TIMES2', 'MO3,', 'MO4,', 'FR5,', 'TH1,', 'TH2']
>>> tokens[2]
'MECN2011'
>>> tokens[5]
'TIMES2'

Basically, you first split the line into a list of tokens (here done with split()), then select the one you are interested in with basic list indexing.

If you're interested in multiple tokens, you can slice them out and re-join them:

>>> ' '.join(tokens[1:3])
'PREREQ MECN2011'
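Fixed indexes only work while every field is in the same place, and in the sample the time slots move around (after TIMES2 on one line, after TIMES1 on the other). One possible sketch is to group the tokens under the keyword that precedes them. The KEYWORDS tuple and the parse_fields name are assumptions made for illustration, not part of the original file format description.

```python
# Assumption: these four words are the only field markers in the file.
KEYWORDS = ("PREREQ", "COREQ", "TIMES1", "TIMES2")

def parse_fields(line):
    """Map each keyword to the list of tokens that follow it (expects a non-empty line)."""
    tokens = line.split()
    fields = {"COURSE": tokens[0]}  # the first token is taken to be the course itself
    current = None
    for token in tokens[1:]:
        if token in KEYWORDS:
            current = token
            fields[current] = []
        elif current is not None:
            fields[current].append(token.rstrip(","))  # drop the trailing comma
    return fields

line = "MECN3010 PREREQ MECN2011 COREQ TIMES1 TIMES2 MO3, MO4, FR5, TH1, TH2"
fields = parse_fields(line)
print(fields["PREREQ"])  # ['MECN2011']
print(fields["COREQ"])   # []
print(fields["TIMES2"])  # ['MO3', 'MO4', 'FR5', 'TH1', 'TH2']
```

An empty list (as for COREQ here) tells you the keyword was present but had no values, which plain indexing would silently get wrong.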

2) Position-based

>>> line[16:24]
'MECN2011'
>>> line[38:44]
'TIMES2'

If the parts of the line you are looking for are at a known, fixed offset from the beginning of the line, you can use Python's string slicing syntax.
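If you would rather not hard-code the offsets, you can compute them with str.find and then slice. This is only a sketch, and it assumes the course code after PREREQ is always exactly 8 characters wide, as in the sample lines.

```python
line = "MECN3010 PREREQ MECN2011 COREQ TIMES1 TIMES2 MO3, MO4, FR5, TH1, TH2"

marker = "PREREQ "
code_width = 8  # assumption: course codes are always 8 characters, e.g. MECN2011

start = line.find(marker)  # returns -1 when the marker is absent
if start != -1:
    code_start = start + len(marker)
    piece = line[start:code_start + code_width]
else:
    piece = None

print(piece)  # PREREQ MECN2011
```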

3) Regex

>>> import re
>>> re.search(r'(TIMES\d)', line).groups()
('TIMES1',)
>>> re.findall(r'TIMES\d', line)
['TIMES1', 'TIMES2']

This is a bit more advanced, and full coverage is outside the scope of this answer, but the documentation for Python's re module explains it in detail.
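As one more illustration of re.findall, here is a sketch that pulls the time slots out of the line. It assumes each slot is a two-letter day abbreviation (MO, TU, WE, TH, FR) followed by a single character, which is a guess from the sample data rather than something the question states.

```python
import re

line = "MECN3010 PREREQ MECN2011 COREQ TIMES1 TIMES2 MO3, MO4, FR5, TH1, TH2"

# Assumption: a slot is a day prefix plus exactly one more character, e.g. MO3 or TUA.
# (?:...) is a non-capturing group, so findall returns the whole match.
SLOT_RE = re.compile(r'\b(?:MO|TU|WE|TH|FR)\w\b')

slots = SLOT_RE.findall(line)
print(slots)  # ['MO3', 'MO4', 'FR5', 'TH1', 'TH2']
```

The \b word boundaries keep the pattern from matching inside longer tokens, and the trailing commas are left out because a comma is not a word character.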
