
Organizing and printing certain words from a text file in Python

I'm fairly new to Python and have a question about how to extract and organize certain words from a text file. For example, I made a text file to illustrate:

5.8 Sunny 01/23/2016 Seattle Washington
25.7 Cloudy 03/04/2016 Chicago Illinois
7 Snowy 12/20/2016 Tacoma Washington
3 Windy 04/5/2016 Los Angeles California

Say I want to print only the dates, weather conditions, and states, ignoring the cities and the numbers, and organize the output by state. How would I do this?

I was thinking of using .split(' '), though I don't think that would work because the last line has 6 words while the others have 5. I was also thinking of using a set to organize by state. I'm still a bit confused about the process. Thank you.

EDIT: This is what I have now. So this does return the specific words I want.

file = open('word.txt')
for line in file:
    weather = line.split(' ')[1]
    date = line.split(' ')[2]
    state = line.split(' ')[-1]


print(weather)
print(date)
print(state)

EDIT 2: This was my attempt at the organization. However, it doesn't quite work.

file = open('word.txt')
    for line in file:
        weather = line.split(' ')[1]
        date = line.split(' ')[2]
        state = line.split(' ')[-1]


        setlist1 = []
        setlist2 = []

        if state == state:
            setlist2.append(state)        
            setlist1.append(date)
            setlist1.append(weather)
            setlist2.append(setlist1)

        print(setlist2)

I would use a regular expression. Named groups give you a succinct and clear way to access each field of a match.

Here's an example:

Test.py:

#!/usr/bin/env python3
import re

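# Raw strings keep the backslash-free pattern readable and avoid
# invalid-escape warnings; '.' and ' ' need no escaping inside [...].
pattern = r'^(?P<value>[0-9.]+) '
pattern += r'(?P<weather>[a-zA-Z]+) '
pattern += r'(?P<date>[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}) '
pattern += r'(?P<location>[a-zA-Z ]+)$'
matches = []
regex = re.compile(pattern)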

with open('text', 'r') as fh:
    for line in fh:
        matches.append(regex.match(line))

With the sample data:

$ charlie on macbook in ~
❯❯ cat text
5.8 Sunny 01/23/2016 Seattle Washington
25.7 Cloudy 03/04/2016 Chicago Illinois
7 Snowy 12/20/2016 Tacoma Washington
3 Windy 04/5/2016 Los Angeles California

When run interactively, you can see it matches each test case.

$ charlie on macbook in ~
❯❯ python3 -i test.py
>>> for match in matches:
...   print(match.groups())
...
('5.8', 'Sunny', '01/23/2016', 'Seattle Washington')
('25.7', 'Cloudy', '03/04/2016', 'Chicago Illinois')
('7', 'Snowy', '12/20/2016', 'Tacoma Washington')
('3', 'Windy', '04/5/2016', 'Los Angeles California')
>>>
>>> for group in ('value', 'weather', 'date', 'location'):
...   print('match[{}]: {}'.format(group, matches[0].group(group)))
...
match[value]: 5.8
match[weather]: Sunny
match[date]: 01/23/2016
match[location]: Seattle Washington
>>>
>>> for group in ('value', 'weather', 'date', 'location'):
...   print('match[{}]: {}'.format(group, matches[1].group(group)))
...
match[value]: 25.7
match[weather]: Cloudy
match[date]: 03/04/2016
match[location]: Chicago Illinois
>>>
>>> for group in ('value', 'weather', 'date', 'location'):
...   print('match[{}]: {}'.format(group, matches[2].group(group)))
...
match[value]: 7
match[weather]: Snowy
match[date]: 12/20/2016
match[location]: Tacoma Washington
>>>
>>> for group in ('value', 'weather', 'date', 'location'):
...   print('match[{}]: {}'.format(group, matches[3].group(group)))
...
match[value]: 3
match[weather]: Windy
match[date]: 04/5/2016
match[location]: Los Angeles California
>>>
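If you would rather work with plain dictionaries than match objects, `match.groupdict()` returns every named group at once. A minimal sketch, assuming the same pattern as above; the sample line is hard-coded here instead of being read from the file:

```python
import re

# Same pattern as in test.py, written as one raw string.
regex = re.compile(
    r'^(?P<value>[0-9.]+) '
    r'(?P<weather>[a-zA-Z]+) '
    r'(?P<date>[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}) '
    r'(?P<location>[a-zA-Z ]+)$'
)

# Hard-coded sample line (assumption: one line as it would come from the file,
# including the trailing newline).
match = regex.match('3 Windy 04/5/2016 Los Angeles California\n')

# groupdict() maps each group name to the text it matched.
record = match.groupdict()
print(record['date'], record['weather'], record['location'])
```

A list of such dictionaries is easy to sort, filter, or dump to JSON later on.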

From here, you can organize the data however you'd like. Let's say you want to collect all data from days where it was sunny.

If we add more lines to the file to give it more data, and add a function that prints every field except the one we filtered on, we can do better analysis:

~/text:

5.8 Sunny 01/23/2016 Seattle Washington
25.7 Cloudy 03/04/2016 Chicago Illinois
7 Snowy 12/20/2016 Tacoma Washington
3 Windy 04/5/2016 Los Angeles California
31.3 Sunny 04/25/2016 Chicago Illinois
1.3 Sunny 04/25/2016 Seattle Washington
13 Sunny 04/25/2016 Indianapolis Indiana
33 Sunny 04/25/2016 Buffalo New York
1.3 Sunny 04/5/2016 Chicago Illinois
3.3 Sunny 04/25/2016 Tacoma Washington
1.2 Sunny 07/5/2016 Madison Wisconsin
31 Sunny 08/25/2016 Milwaukee Wisconsin
35 Sunny 08/29/2016 Chicago Illinois
5.1 Sunny 11/2/2016 Chicago Illinois
4 Sunny 11/6/2016 Sanwich Illinois
9 Sunny 11/16/2016 Portland Oregons
7 Sunny 11/29/2016 Washington DC
3.2 Sunny 12/10/2016 St Louis Missouri
3.5 Sunny 12/25/2016 Flint Michigan
4.7 Sunny 12/29/2016 Detroit Michigan

~/test.py:

#!/usr/bin/env python3
import re

# Ordered so that, with 'weather' dropped, the columns print as
# date, location, value.
GROUPS = ('date', 'weather', 'location', 'value')

def print_data(matches, group):
    # Keep the order of GROUPS; a set difference would make the
    # column order arbitrary.
    local_groups = [g for g in GROUPS if g != group]
    print('Group: {}'.format(group))
    print('-' * 80)
    line_structure = '{0:^25}|{1:^25}|{2:^25}'
    for match in matches:
        data = [match.group(g) for g in local_groups]
        print(line_structure.format(*data))

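# Raw strings avoid invalid-escape warnings; '.' and ' ' need no
# escaping inside a character class.
pattern = r'^(?P<value>[0-9.]+) '
pattern += r'(?P<weather>[a-zA-Z]+) '
pattern += r'(?P<date>[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}) '
pattern += r'(?P<location>[a-zA-Z ]+)$'
matches = []
regex = re.compile(pattern)

with open('text', 'r') as fh:
    for line in fh:
        match = regex.match(line)
        if match:  # skip blank or malformed lines
            matches.append(match)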

sunny_matches = []
for match in matches:
    if match.group('weather').lower() == 'sunny':
        sunny_matches.append(match)

print('Printing sunny weather:')
print('{}\n'.format('='*50))
print_data(sunny_matches, 'weather')

If we run this, we get the following output:

Printing sunny weather:
==================================================

Group: weather
--------------------------------------------------------------------------------
       01/23/2016        |   Seattle Washington    |           5.8
       04/25/2016        |    Chicago Illinois     |          31.3
       04/25/2016        |   Seattle Washington    |           1.3
       04/25/2016        |  Indianapolis Indiana   |           13
       04/25/2016        |    Buffalo New York     |           33
        04/5/2016        |    Chicago Illinois     |           1.3
       04/25/2016        |    Tacoma Washington    |           3.3
        07/5/2016        |    Madison Wisconsin    |           1.2
       08/25/2016        |   Milwaukee Wisconsin   |           31
       08/29/2016        |    Chicago Illinois     |           35
        11/2/2016        |    Chicago Illinois     |           5.1
        11/6/2016        |    Sanwich Illinois     |            4
       11/16/2016        |    Portland Oregons     |            9
       11/29/2016        |      Washington DC      |            7
       12/10/2016        |    St Louis Missouri    |           3.2
       12/25/2016        |     Flint Michigan      |           3.5
       12/29/2016        |    Detroit Michigan     |           4.7

Instead of calling split three times, call it once and store the result in a variable:

with open('word.txt') as file:
    for line in file:
        res = line.split()  # split on any whitespace, once per line
        weather = res[1]
        date = res[2]
        state = res[-1]

You were on the right track - it might be easier to organize the data into individual dictionaries.

import operator
get_data = operator.itemgetter(1, 2, -1)  # weather, date, last word (state)
result = []
with open('file.txt') as f:
    for line in f:
        d = {}
        # split() with no arguments also discards the trailing newline;
        # split only once, since a list has no split() method
        weather, date, state = get_data(line.split())
        d['weather'] = weather
        d['date'] = date
        d['state'] = state
        result.append(d)
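Once the records are dictionaries, organizing them by state can be a single `sorted` call. A small sketch, with a hard-coded `result` list standing in for what the loop above would build from the sample file:

```python
from operator import itemgetter

# Hypothetical sample of the dictionaries built from the four sample lines.
result = [
    {'weather': 'Sunny', 'date': '01/23/2016', 'state': 'Washington'},
    {'weather': 'Cloudy', 'date': '03/04/2016', 'state': 'Illinois'},
    {'weather': 'Snowy', 'date': '12/20/2016', 'state': 'Washington'},
    {'weather': 'Windy', 'date': '04/5/2016', 'state': 'California'},
]

# sorted() is stable, so records from the same state keep their file order.
by_state = sorted(result, key=itemgetter('state'))

for d in by_state:
    print(d['state'], d['date'], d['weather'])
```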

Or, if you want to keep the city as well, split each line at most three times (maxsplit=3) so that the last field holds the city and state together:

import operator
get_data = operator.itemgetter(1, 2, -1)
result = []
with open('file.txt') as f:
    for line in f:
        d = {}
        line = line.strip()
        # at most 3 splits: value, weather, date, 'city state'
        line = line.split(maxsplit=3)
        weather, date, city = get_data(line)
        d['weather'] = weather
        d['date'] = date
        d['city'] = city  # e.g. 'Seattle Washington'
        result.append(d)
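If you then need the city and the state as separate fields, one option is to split the last field once more from the right. This is a sketch only: `parse_line` is a hypothetical helper, and it assumes the state is a single word, so "Buffalo New York" would be split incorrectly.

```python
def parse_line(line):
    """Split one line into its fields (hypothetical helper).

    Assumes the layout 'value weather date city... state', with a
    one-word state at the end of the line.
    """
    # At most 3 splits from the left: value, weather, date, 'city state'.
    value, weather, date, location = line.strip().split(maxsplit=3)
    # One split from the right separates the state from the city.
    city, state = location.rsplit(maxsplit=1)
    return {'weather': weather, 'date': date, 'city': city, 'state': state}


record = parse_line('3 Windy 04/5/2016 Los Angeles California\n')
print(record['city'], '/', record['state'])
```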
