
Convert a list of strings from a file to a list of integers

I have a large file filled with integers separated by commas and whitespace. I am trying to read it 1 KB at a time and convert each chunk into a list of integers.

This code works fine:

import tempfile

with open('test_age.txt', 'r+') as inf:
    with open('test_age_out.txt', 'r+') as outf:
        sorted_list =[]
        a = [x.strip() for x in inf.read(1000).split(',')]
        int_a = map(int, a)
        f = tempfile.TemporaryFile()
        outf_array = sorted(int_a)
        f.write(str(outf_array))
        f.seek(0)
        #etc...

output:

[1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, etc...

But once I add in a while loop to read the next 1KB:

with open('test_age.txt', 'r+') as inf:
    with open('test_age_out.txt', 'r+') as outf:
        sorted_list =[]
        while True:
            a = [x.strip() for x in inf.read(1000).split(',')]
            int_a = map(int, a)
            if not a:
                break
            f = tempfile.TemporaryFile()
            outf_array = sorted(int_a)
            print outf_array
            f.write(str(outf_array))
            f.seek(0)      

I get the output and a ValueError:

[1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 
8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 12, 12, 12,
12, 12, 12, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 15, 15, 16, 17, 18,
19, 19, 20, 20, 20, 20, 21, 21, 22, 22, 22, 23, 23, 24, 24, 24, 24, 25, 
25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 28, 28, 29, 30, 30, 30, 30,
31, 31, 31, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 35, 35,
35, 35, 35, 36, 36, 37, 37, 37, 37, 38, 38, 39, 39, 39, 39, 39, 39, 40,
40, 40, 40, 41, 41, 42, 43, 43, 43, 44, 44, 44, 44, 44, 45, 46, 46, 46,
46, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 49, 49, 49, 50, 50, 50,
50, 50, 50, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 53, 53, 54,
54, 54, 55, 55, 55, 55, 56, 56, 56, 56, 56, 57, 57, 57, 57, 58, 58, 58,
59, 59, 60, 60, 60, 61, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 63, 64,
64, 64, 65, 66, 66, 67, 67, 67, 67, 68, 68, 68, 68, 68, 69, 69, 69, 69, 
69, 69, 69, 70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 74, 75, 76, 76,
76, 76, 77, 77, 77, 77, 78, 78, 79, 79, 79, 79, 81, 81, 81, 81, 82, 82, 
82, 82, 82, 83, 83, 83, 83, 84, 85, 85, 85, 85, 86, 86, 86, 87, 87, 87,
87, 87, 87, 88, 88, 88, 88, 88, 88, 88, 89, 89, 89, 89, 90, 90, 90, 91,
91, 91, 91, 91, 91, 91, 92, 92, 93, 93, 93, 94, 94, 94, 94, 95,  95,
96, 96, 96, 97, 97, 98, 99, 100, 100, 100, 100, 100]
[2, 3, 3, 3, 3, 4, 4, 5, 5, 6, 8, 9, 10, 10, 11, 11, 11, 11, 12, 12,12, 
13, 14, 15, 17, 17, 17, 17, 17, 17, 18, 18, 18, 20, 21, 22, 22, 22, 22, 
23, 23, 24, 24, 24, 26, 27, 27, 27, 27, 28, 28, 29, 29, 29, 29, 30, 32, 
32, 32, 32, 33, 33, 34, 34, 36, 37, 37, 37, 37, 38, 39, 41, 41, 42, 43,   
44, 44, 46, 46, 47, 48, 49, 49, 49, 49, 51, 51, 52, 52, 52, 52, 53, 54, 
54, 54, 55, 55, 56, 60, 60, 61, 61, 61, 62, 63, 63, 64, 65, 65, 65, 65, 
66, 66, 67, 68, 68, 68, 70, 70, 73, 73, 73, 74, 74, 75, 75, 75, 77, 77, 
77, 77, 78, 78, 78, 78, 79, 80, 81, 81, 82, 82, 83, 83, 83, 83, 84, 84, 
85, 85, 85, 85, 86, 87, 88, 90, 91, 91, 91, 92, 93, 93, 93, 94, 95, 97, 
98, 98, 99, 100]
    int_a = map(int, a)
ValueError: invalid literal for int() with base 10: ''

I am not sure why this is happening. The print output shows that the lists ARE being created and sorted, yet the ValueError is still raised. What gives?

Look at what str.split returns when the delimiter appears at the start or end of the string:

>>> ', 3, 5'.split(', ')
['', '3', '5']

That empty string is what your program is trying (and failing) to parse as an integer. Calling strip() on it doesn't help, because stripping an empty string still gives an empty string. (strip() isn't necessary for int() anyway: int() ignores leading and trailing whitespace.) Fixed-size blocks can also cut a number in half at a block boundary, so I recommend reading units that are guaranteed to be complete and valid, such as lines. If the file is just one big line, you'll have to do some extra work: hold back the characters after the last delimiter in each block and prepend them to the next block before processing it. Don't forget to process the remaining characters after the loop.

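leftover = ''  # set once, before the loop

chunk = inf.read(1000)
buffer = leftover + chunk
# the text after the last delimiter may be a cut-off number, so hold it back
current, delimiter, leftover = buffer.rpartition(', ')
# process current
# continue loop to add more content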
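The carry-over idea above can be sketched as a small generator. This is only an illustration under a few assumptions: the function name read_int_chunks and its chunk_size and delimiter parameters are made up for this example, the file is opened in text mode, and tokens are split on ',' alone (as in the question), since int() tolerates surrounding whitespace. It also tests the raw chunk for end of file rather than the split result, because read() returns '' at end of file and ''.split(',') is [''], which is a non-empty (truthy) list, so a check like `if not a` never fires.

```python
import io


def read_int_chunks(f, chunk_size=1000, delimiter=','):
    """Yield one list of ints per block read from the text file object f.

    Hypothetical helper for illustration; not part of any library.
    """
    leftover = ''  # tail of the previous block; may be a cut-off number
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # read() returns '' at end of file
            break
        # Everything before the last delimiter is complete; hold back the rest.
        current, _, leftover = (leftover + chunk).rpartition(delimiter)
        # Skip empty or whitespace-only tokens instead of passing them to int().
        numbers = [int(tok) for tok in current.split(delimiter) if tok.strip()]
        if numbers:
            yield numbers
    # Whatever remains after the loop is the final number, if there is one.
    if leftover.strip():
        yield [int(leftover)]
```

A possible way to try it without touching a real file, using a tiny chunk size so that numbers are deliberately cut at block boundaries:

```python
sample = io.StringIO(u'1, 22, 333, 4444,\n55, 6')
for numbers in read_int_chunks(sample, chunk_size=5):
    print(sorted(numbers))
```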

If the file can comfortably fit in your system's memory, you could just read the entire file and split it in one go:

numbers = map(int, inf.read().split(', '))
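If the file might end with a trailing delimiter or newline, a slightly more forgiving variant of the one-liner can skip empty tokens before converting. The sketch below assumes the whole text is already in memory; parse_ints is a hypothetical helper name, and it splits on ',' alone because int() ignores surrounding whitespace. A list comprehension is used so the result is a real list on both Python 2 and Python 3 (on Python 3, map() returns a lazy iterator instead).

```python
def parse_ints(text, delimiter=','):
    """Parse delimiter-separated integers, skipping empty tokens.

    Hypothetical helper for illustration only.
    """
    return [int(tok) for tok in text.split(delimiter) if tok.strip()]


numbers = parse_ints('3, 5, 8,\n13,\n')
print(numbers)  # [3, 5, 8, 13]
```

With a real file, the call would look like parse_ints(inf.read()).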
