简体   繁体   中英

Reading from a text file

I'm not looking for an answer here but rather a guideline of how I should be approaching this task.

I have a txt file that contains the following information:

...
    1947q2        -0.6
    1947q3        -0.3
    1947q4         6.2
    1948q1         6.5
    1948q2         7.6
    1948q3         2.2
    1948q4         0.6
...

My objective is to be able to read the text file based on a keyword selection. For example I want to read the lines that contain only 1947 so the output would be like:

    1947q2        -0.6
    1947q3        -0.3
    1947q4         6.2

Because the numbers are tied to each year, I was thinking of putting each row in a tuple then combining all the tuples into a list. From this list, use regular expressions to search the list to get the tuples that match and print them out accordingly.

Is this an acceptable way to do it? Is there a simpler more obvious solution to this? Not really looking for the optimal method but rather different ideas on how to approach this problem.

import sys
with open('file.txt') as f:
  for line in f:
    if '1947' in line: # or some complex regular expressions test
      sys.stdout.write(line)

You can just parse each line in the body of the for loop and then decide whether to accept it. If you want to get fancy, have a look at map and filter . The with statement ensures that the file is closed afterwards.

One thing you can do is to use generators to filter out members of the list dynamically using a similar method to what you have done already:

data = open("file.txt")
fortysevens = (line for line in data if contains_47(line))
for line in fortysevens:
    # do something here

def contains_47(line):
    # your existing code here to detect if a line contains 47

Is your keyword always going to be the year? If so, I would store them in a dictionary like this:

mydata[year][quarter] = value

So you could get to your example data via mydata['1947'].

To read the file, you might want to use csv.reader, then split the first column on 'q' to get the year and quarter individually.

I'd write code that took all of the lines in the function returned a sequence of tuples like (1947, 3, -7.0). Then its a simple iteration over the result to figure out which ones I really want.

If the data in the lines of the input file are fixed -- as they appear to be -- then something as simple as this would work:

with open('data.txt') as data:
    for line in data:
        if line[4:8] == '1947':
            print line,

# output:
#     1947q2        -0.6
#     1947q3        -0.3
#     1947q4         6.2

Note that the reason I used print line, is because each line string ends with a newline.

As far as I know, regexes have been invented for this kind of job.

A regex will search directly the "lines containing the keyword" . A regex's search can also be based on more complex conditions that will be expressed in a more condensed code than with use of a clumsy "for line in f" loop.

My motto is : "There are no lines" in a text file. It's only a sequence of characters.

What a "for line in f" loop do is to analyse a stream of data to detect the newlines and stop at them: that is a first detection. Then on each line found, one (or more) simple (or complex) condition(s) must be tested on each line detected : that is a second research.

On the other hand, a regex finds directly what is searched, without preliminary searching the newlines. Condition of a line and condition of a keyword in the line are tested at the same time.

import re

keyw = '1947'
pat = re.compile('.*?' + keyw + '.*')

with open('thefile.txt','r') as f:
    keyworded_lines = pat.findall(f.read())

# do what you need with keyworded_lines

Note that in 'r' mode Python transforms all the newlines in '\\n'. Since the point in a RE doesn't match '\\n', the RE needs only '.*' after the keyw.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM