
Extracting data from a file in between keywords

I'm trying to write a program that extracts the data between two keywords in a text file and puts it into a list of tuples, with the date as a string and the data as an int. I cannot use for loops, only while loops.

begin step data

2010-01-01,1000

2010-01-02,2000

end step data

It needs to be extracted into this format: [('2001-01-01', 12776), ('2001-01-02', 15128)]

I have written this program:

mylist = []

line = open(filename).read()


start = '<begin step data>'
end = '<end step data>'


startpos = line.find(start) + len(start)
endpos = line.find(end, startpos)
data = line[startpos:endpos].strip("")

mylist.append(data.split())

but this puts it in the wrong format: [['2001-01-01,12776', '2001-01-02,15128']]

I think I may have the wrong approach to this and should be using readlines instead of read.

You could use readlines, but you would end up having to emulate a for loop's behavior to go through each line, which you don't want.

Your problem, however, lies somewhere else: data.split() splits only on whitespace, so each date and value stay joined by their comma, and append() adds the whole resulting list as a single element. (The strip("") call removes nothing, since it is given an empty set of characters.)
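As a small illustration of where the nested list comes from, here is a sketch that uses an inline string in place of the file contents. The sample values are borrowed from the output shown in the question; the variable names parts and fields are only for illustration.

```python
# Inline stand-in for the text found between the two markers (assumed data).
data = "2001-01-01,12776\n2001-01-02,15128"

# split() with no argument splits on whitespace only, so the commas stay.
parts = data.split()
# parts is ['2001-01-01,12776', '2001-01-02,15128']

# append() adds the whole list as one element, hence the nested list.
mylist = []
mylist.append(parts)
# mylist is [['2001-01-01,12776', '2001-01-02,15128']]

# Splitting a single entry on the comma yields the two fields.
fields = parts[0].split(',')
# fields is ['2001-01-01', '12776']
```

The second field is still a string at this point, so it needs int() before it goes into the tuple.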

If you get something like [['2001-01-01,12776', '2001-01-02,15128']] and you are after a quick fix rather than a polished solution, you can simply take what you already have and:

  1. Iterate through each string in the inner list (curr_result[0] in the code below).
  2. Split each string into its two fields.
  3. Convert the second field to a number.

Using a for loop, which you can then convert to a while loop:

desired_format = []  # Initialize an empty result list.
for element in curr_result[0]:  # curr_result is the nested list shown above.
    element = element.split(',')  # Separate the comma-separated values.

    # Finally, add the results as tuples.
    desired_format.append(
        (
            element[0],
            int(element[1])  # Cast the second element to an integer.
        )
    )

(The inner parentheses create a tuple.)

Note that, as mentioned, this is a quick and dirty fix for the current issue; there are much better ways to do it, which you can dig into later.

But at least you won't be stuck at this point in your course. ^^'
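Since the question rules out for loops, the same steps might be written with a while loop and a manual index, roughly like this. It is only a sketch: curr_result is filled in by hand to stand in for the nested list from the question, and items and index are names chosen for illustration.

```python
# Stand-in for the nested list produced by the code in the question.
curr_result = [['2001-01-01,12776', '2001-01-02,15128']]

desired_format = []
items = curr_result[0]
index = 0  # Manual counter that takes the place of the for loop.
while index < len(items):
    # Each entry looks like 'date,value'; split it into its two fields.
    date, value = items[index].split(',')
    desired_format.append((date, int(value)))
    index += 1  # Without this increment the loop would never end.

print(desired_format)
# [('2001-01-01', 12776), ('2001-01-02', 15128)]
```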

Try using a regular expression with the re module:

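import re

# Find the (date, data) pairs; text holds the contents of the file
matches = re.findall(r'(\d{4}-\d{2}-\d{2}),(\d+)', text)
# Convert the data to an integer (list() is needed on Python 3, where map is lazy)
matches = list(map(lambda m: (m[0], int(m[1])), matches))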

If you wanted to, you could even cut it down to one line by passing the re.findall call as the second argument to map.
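That one-line form could look something like the sketch below. The text variable is an inline stand-in for the file contents, with the markers spelled the way the question's code spells them; pairs is just an illustrative name.

```python
import re

# Inline stand-in for the file contents (assumed sample data).
text = """<begin step data>
2001-01-01,12776
2001-01-02,15128
<end step data>"""

# re.findall returns a list of (date, number) string tuples;
# map converts the second item of each tuple to an int.
pairs = list(map(lambda m: (m[0], int(m[1])),
                 re.findall(r'(\d{4}-\d{2}-\d{2}),(\d+)', text)))

print(pairs)
# [('2001-01-01', 12776), ('2001-01-02', 15128)]
```

Note that this pattern matches every date,number pair in the string, not only those between the markers. If the file may contain similar lines elsewhere, slicing out the marked region first (as the question's code does with find) and running the regex on that slice would be the safer choice.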
