How to make lists of integers from a portion of a file with Python?

I have a file which looks like the following:

@ junk
...
@ junk
    1.0  -100.102487081243
    1.1  -100.102497023421
    ...   ...
    3.0  -100.102473082342
&
@ junk
...

I am interested only in the two columns of numbers given between the @ and & characters. These characters may appear anywhere else in the file but never inside the number block.

I want to create two lists, one with the first column and one with the second column:

List1 = [1.0, 1.1,..., 3.0]
List2 = [-100.102487081243, -100.102497023421,..., -100.102473082342]

I've been using shell scripting to prep these files for a simpler Python script which makes the lists. However, I'm trying to migrate these processes over to Python for a more consistent application. Any ideas? I have limited experience with Python and file handling.

Edit: I should mention, this number block appears in two places in the file. Both number blocks are identical.

Edit2: A general function would be most satisfactory for this as I will put it into a custom library.

Current Efforts

I currently use a shell script to trim out everything but the number block and split it into two separate columns. From there it is trivial for me to use the following function:

def ReadLL(infile):
    # Read one value per line; the file is closed when the block exits.
    with open(infile) as f:
        List = f.read().splitlines()
    intL = [int(i) for i in List]
    return intL

by calling it from my main script:

import sys
import eLIBc
infile = sys.argv[1]
sList = eLIBc.ReadLL(infile)

The problem is knowing how to extract the number block from the original file with Python rather than using shell scripting.

You want to loop over the file itself and set a flag when you find the first line without an @ character; from that point on you can start collecting numbers. Stop reading when you find the & character on a line.

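def readll(infile):
    with open(infile) as data:
        floatlist1, floatlist2 = [], []
        reading = False

        for line in data:
            if not reading:
                # Skip the leading junk lines, which all contain '@'.
                if '@' not in line:
                    reading = True
                else:
                    continue

            if '&' in line:
                # End of the first number block.
                return floatlist1, floatlist2

            # Build a list rather than using map(), so that the values
            # can be indexed on Python 3 as well as Python 2.
            numbers = [float(x) for x in line.split()]
            floatlist1.append(numbers[0])
            floatlist2.append(numbers[1])

        # No '&' was found; return whatever was collected.
        return floatlist1, floatlist2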

So the above:

  • sets 'reading' to False, and sets it to True only when a line without '@' is found.
  • when 'reading' is True:
    • returns the data read if the line contains &
    • otherwise it's assumed the line contains two float values separated by whitespace, which are added to their respective lists

By returning, the function ends, with the file closed automatically. Only the first block is read, the rest of the file is simply ignored.
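If the header can also contain junk lines without an '@', a more forgiving variant may be useful. The following is only a sketch under an assumption that is not stated in the question: a data line is any line consisting of exactly two values that parse as floats, and the first block ends at the first line after it that does not. The name read_first_block is hypothetical.

```python
def read_first_block(path):
    """Return two lists of floats from the first run of two-number lines."""
    col1, col2 = [], []
    with open(path) as data:
        for line in data:
            try:
                # Assumption: a data line holds exactly two floats.
                # Unpacking fails with ValueError for any other field count.
                x, y = (float(field) for field in line.split())
            except ValueError:
                # Junk line: '@ ...', '&', blank, or otherwise non-numeric.
                if col1:
                    break      # the block has already started, so it ends here
                continue       # still in the header, keep looking
            col1.append(x)
            col2.append(y)
    return col1, col2
```

Because it does not look for '@' or '&' explicitly, this version would also treat a stray line of two numbers in the header as data, so it only fits files where that cannot happen.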

Try this out:

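with open("i.txt") as fp:
    lines = fp.readlines()
    data = False   # becomes True once the number block has started
    List1 = []
    List2 = []
    for line in lines:
        if line[0] not in ['&', '@']:
            # The values are kept as strings; wrap them in float() if needed.
            fields = line.split()
            List1.append(fields[0])
            List2.append(fields[1])
            data = True
        elif data:
            # First '&' or '@' line after the block: stop reading.
            break

print(List1)
print(List2)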
This should give you the first block of numbers.

Input:

@ junk
@ junk
1.0  -100.102487081243
1.1  -100.102497023421
3.0  -100.102473082342
&
@ junk
1.0  -100.102487081243
1.1  -100.102497023421

Output:

['1.0', '1.1', '3.0']
['-100.102487081243', '-100.102497023421', '-100.102473082342']
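The lists in the output above hold strings, while the question asks for numbers. A minimal sketch of the conversion, using the sample values shown above (the names floats1 and floats2 are only illustrative):

```python
# Sample values copied from the output shown above.
List1 = ['1.0', '1.1', '3.0']
List2 = ['-100.102487081243', '-100.102497023421', '-100.102473082342']

# float() parses each string; a malformed entry would raise ValueError.
floats1 = [float(s) for s in List1]
floats2 = [float(s) for s in List2]

print(floats1)
print(floats2)
```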

Update

If you need both blocks, then use this:

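with open("i.txt") as fp:
    lines = fp.readlines()
    List1 = []
    List2 = []
    for line in lines:
        # Every line that does not start with '&' or '@' is treated as data,
        # so the values from both blocks end up in the same two lists.
        if line[0] not in ['&', '@']:
            fields = line.split()
            List1.append(fields[0])
            List2.append(fields[1])

print(List1)
print(List2)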
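Since the question says the block appears twice, it may be more convenient to keep the blocks apart instead of appending them to the same lists. The sketch below is one possible approach, not part of the answer above: it groups consecutive lines with itertools.groupby, assuming that a data line is exactly two float values. The names parse_pair and read_all_blocks are hypothetical.

```python
from itertools import groupby


def parse_pair(line):
    """Return (x, y) if the line holds exactly two floats, else None."""
    fields = line.split()
    if len(fields) != 2:
        return None
    try:
        return float(fields[0]), float(fields[1])
    except ValueError:
        return None


def read_all_blocks(path):
    """Return a list of (column1, column2) pairs, one per number block."""
    with open(path) as data:
        parsed = [parse_pair(line) for line in data]

    blocks = []
    # groupby yields runs of consecutive items sharing the same key, so each
    # run of parsed pairs is one number block.
    for is_data, group in groupby(parsed, key=lambda p: p is not None):
        if is_data:
            col1, col2 = zip(*group)
            blocks.append((list(col1), list(col2)))
    return blocks
```

With the blocks kept separate, read_all_blocks("i.txt")[0] gives the first block, and the two blocks can be compared to check that they really are identical.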
