
How to make lists of integers from a portion of a file with Python?

I have a file which looks like the following:

@ junk
...
@ junk
    1.0  -100.102487081243
    1.1  -100.102497023421
    ...   ...
    3.0  -100.102473082342
&
@ junk
...

I am interested only in the two columns of numbers given between the @ and & characters. These characters may appear anywhere else in the file, but never inside the number block.
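Because the block is exactly a run of lines containing neither marker, closed by a line that starts with &, a regular expression can capture it. A minimal sketch, assuming the whole file fits in memory, every line just before the block contains @, and the & sits at the very start of its line; `text`, `BLOCK_RE` and the sample data are illustrative:

```python
import re

# Illustrative sample mirroring the layout above (not real data)
text = (
    "@ junk\n"
    "@ junk\n"
    "    1.0  -100.102487081243\n"
    "    1.1  -100.102497023421\n"
    "    3.0  -100.102473082342\n"
    "&\n"
    "@ junk\n"
)

# One or more whole lines free of '@' and '&' (captured as group 1),
# followed by a line that begins with '&'. MULTILINE makes ^ match at
# the start of every line, so a block can only start at a line boundary.
BLOCK_RE = re.compile(r"((?:^[^@&\n]*\n)+)&", re.MULTILINE)

blocks = BLOCK_RE.findall(text)  # one string per number block
rows = [line.split() for line in blocks[0].splitlines() if line.strip()]
List1 = [float(first) for first, second in rows]
List2 = [float(second) for first, second in rows]
print(List1)  # [1.0, 1.1, 3.0]
```

`findall` returns one string per block, so on a file with two number blocks `blocks` would hold both.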

I want to create two lists, one with the first column and one with the second column:

List1 = [1.0, 1.1,..., 3.0]
List2 = [-100.102487081243, -100.102497023421,..., -100.102473082342]
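The two target lists are simply the columns of the number block. If the block has already been parsed into (first, second) rows, `zip(*rows)` transposes them into columns; a minimal sketch with hypothetical, already-parsed rows:

```python
# Hypothetical rows, as they might come out of parsing the number block
rows = [
    (1.0, -100.102487081243),
    (1.1, -100.102497023421),
    (3.0, -100.102473082342),
]

# zip(*rows) regroups the row tuples column by column;
# list() turns each resulting column tuple into a list.
List1, List2 = (list(col) for col in zip(*rows))

print(List1)  # [1.0, 1.1, 3.0]
```

With an empty block there are no columns to unpack, so the two-name assignment would raise `ValueError`; check for that case first if it can occur.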

I've been using shell scripting to prep these files for a simpler Python script which makes the lists; however, I'm trying to migrate these processes over to Python for a more consistent application. Any ideas? I have limited experience with Python and file handling.

Edit: I should mention that this number block appears in two places in the file. Both number blocks are identical.

Edit2: A general function would be most satisfactory, as I will put it into a custom library.

Current Efforts

I currently use a shell script to strip out everything but the number block and split it into two separate columns. From there, it is trivial for me to use the following function

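def ReadLL(infile):
    # Read one number per line from a pre-trimmed, single-column file.
    # float() rather than int(): int("1.0") raises ValueError.
    with open(infile) as f:
        lines = f.read().splitlines()
    return [float(i) for i in lines if i.strip()]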

by calling it from my main script:

import sys
import eLIBc
infile = sys.argv[1]
sList = eLIBc.ReadLL(infile)

The problem is knowing how to extract the number block from the original file with Python rather than with shell scripting.
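One way to do this without a flag variable is to let `itertools` do the skipping: `dropwhile` discards the leading @ lines and `takewhile` stops at the & line. A minimal sketch, assuming (as in the example file) that every line before the block contains @; `extract_first_block` is a hypothetical name:

```python
from itertools import dropwhile, takewhile


def extract_first_block(lines):
    """Return the two columns of the first number block as float lists.

    Assumes every line before the block contains '@' and that the block
    ends at the first line containing '&'.
    """
    # Skip the header lines, i.e. those containing '@'
    after_header = dropwhile(lambda line: "@" in line, lines)
    # Keep lines up to (not including) the first one containing '&'
    block = takewhile(lambda line: "&" not in line, after_header)
    col1, col2 = [], []
    for line in block:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines inside the block
        col1.append(float(fields[0]))
        col2.append(float(fields[1]))
    return col1, col2
```

Because it accepts any iterable of lines, you can pass an open file directly: `with open(infile) as f: List1, List2 = extract_first_block(f)`. Lines after the & are never read.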

You want to loop over the file itself and set a flag when you find the first line without an @ character, after which you can start collecting numbers. Stop reading when you find the & character on a line.

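def readll(infile):
    with open(infile) as data:
        floatlist1, floatlist2 = [], []
        reading = False

        for line in data:
            if not reading:
                if '@' not in line:
                    reading = True
                else:
                    continue

            if '&' in line:
                return floatlist1, floatlist2

            # Unpack both columns directly; in Python 3, map() returns an
            # iterator, so indexing it (numbers[0]) would raise TypeError.
            first, second = map(float, line.split())
            floatlist1.append(first)
            floatlist2.append(second)

    # No '&' line found: return whatever was collected
    return floatlist1, floatlist2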

So the above:

  • sets 'reading' to False, and sets it to True only once a line without '@' is found.
  • when 'reading' is True:
    • returns the data read so far if the line contains &
    • otherwise assumes the line contains two float values separated by whitespace and appends them to their respective lists

Returning ends the function, and the with block closes the file automatically. Only the first block is read; the rest of the file is simply ignored.
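For the general library function mentioned in the question's Edit2, the same flag logic can accept any iterable of lines instead of a path, which also makes it easy to test with `io.StringIO`. A sketch under that assumption; `read_block` and its `header_marker`/`end_marker` parameters are hypothetical additions, not part of the answer:

```python
import io


def read_block(lines, header_marker="@", end_marker="&"):
    """Flag-based reader: collect two float columns from the first block.

    `lines` may be an open file or any iterable of strings.
    """
    col1, col2 = [], []
    reading = False
    for line in lines:
        if not reading:
            if header_marker in line:
                continue  # still in the header
            reading = True
        if end_marker in line:
            break  # end of the first block
        fields = line.split()
        if fields:  # skip blank lines
            col1.append(float(fields[0]))
            col2.append(float(fields[1]))
    return col1, col2


# Exercise the function without touching the disk
sample = io.StringIO(
    "@ junk\n"
    "@ junk\n"
    "    1.0  -100.102487081243\n"
    "    3.0  -100.102473082342\n"
    "&\n"
    "@ junk\n"
    "    9.9  -1.0\n"
)
List1, List2 = read_block(sample)
print(List1)  # [1.0, 3.0]
```

With a real file, `with open(infile) as f: List1, List2 = read_block(f)` behaves like `readll(infile)`.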

Try this out:

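with open("i.txt") as fp:
    List1 = []
    List2 = []
    data = False  # becomes True once we are inside the first number block
    for line in fp:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines, which would otherwise raise IndexError
        if stripped[0] not in "&@":
            # A data line: keep both columns (as strings)
            columns = stripped.split()
            List1.append(columns[0])
            List2.append(columns[1])
            data = True
        elif data:
            # First '@' or '&' line after the block: stop reading
            break

print(List1)
print(List2)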

This should give you the first block of numbers.

Input:

@ junk
@ junk
1.0  -100.102487081243
1.1  -100.102497023421
3.0  -100.102473082342
&
@ junk
1.0  -100.102487081243
1.1  -100.102497023421

Output:

['1.0', '1.1', '3.0']
['-100.102487081243', '-100.102497023421', '-100.102473082342']
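Note that these lists contain strings, whereas the question asks for floats. A small sketch of the conversion, using the output above as input:

```python
# The answer's lists hold strings, exactly as split() returned them
List1 = ['1.0', '1.1', '3.0']
List2 = ['-100.102487081243', '-100.102497023421', '-100.102473082342']

# Convert each entry to float to get the numeric lists the question asks for
List1 = [float(x) for x in List1]
List2 = [float(x) for x in List2]

print(List1)  # [1.0, 1.1, 3.0]
```

Alternatively, wrap the appended values in `float()` inside the loop so the lists are numeric from the start.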

Update

If you need both blocks, then use this:

with open("i.txt") as fp:
    List1 = []
    List2 = []
    for line in fp:
        stripped = line.strip()
        # Keep every non-blank line that doesn't start with '&' or '@'
        if stripped and stripped[0] not in "&@":
            columns = stripped.split()
            List1.append(columns[0])
            List2.append(columns[1])

print(List1)
print(List2)
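The loop above appends both blocks to the same two lists, so with the two identical blocks described in the question every value appears twice. If you want each block separately (for example, to confirm they really are identical), a generator can yield one pair of lists per block. A sketch; `iter_blocks` is a hypothetical helper, and it assumes data lines never start with @ or &:

```python
def iter_blocks(lines):
    """Yield (col1, col2) float lists for every number block in `lines`.

    Data lines are non-blank lines that don't start with '@' or '&';
    a block ends at the next line starting with either marker.
    """
    col1, col2 = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines neither start nor end a block
        if stripped[0] in "@&":
            if col1:  # a block just ended
                yield col1, col2
                col1, col2 = [], []
            continue
        first, second = stripped.split()[:2]
        col1.append(float(first))
        col2.append(float(second))
    if col1:  # a block that runs to end of file without a closing marker
        yield col1, col2


sample = [
    "@ junk\n",
    "1.0  -100.102487081243\n",
    "1.1  -100.102497023421\n",
    "&\n",
    "@ junk\n",
    "1.0  -100.102487081243\n",
    "1.1  -100.102497023421\n",
]
blocks = list(iter_blocks(sample))
print(len(blocks))             # 2
print(blocks[0] == blocks[1])  # True: the two blocks are identical
```

With a real file, `blocks = list(iter_blocks(fp))` inside the `with open(...)` block gives one `(List1, List2)` pair per block.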
