How to extract numbers from a text file with appropriate labels in Python

boundary
        layer 2
        datatype 0
        xy  15   525270 8663518   525400 8663518   525400 8664818   525660 8664818
                 525660 8663518   525790 8663518   525790 8664818   526050 8664818
                 526050 8663518   526180 8663518   526180 8665398   525980 8665598
                 525470 8665598   525270 8665398   525270 8663518
        endel

I have coordinates of polygons in the format shown above. Each polygon starts with "boundary" and ends with "endel". I am having trouble extracting the layer number, the number of points, and the coordinates into either a numpy array or a pandas dataframe.

To be specific to this example, I need the layer number (2), the number of points (15), and the xy coordinate pairs.

with open('source1.txt', encoding="utf-8") as f:
    for line in f:
        line = f.readline()
        srs= line.split("\t")
        print(srs)

Doing this doesn't split the numbers, even though they are separated by tabs.

['        layer 255\n']
['        xy   5   0 0   22800000 0   22800000 22800000   0 22800000\n']
['        endel\n']

This is the result I got with that.

with open('source1.txt', encoding="utf-8") as f:
    for line in f:
        line = f.readline()
        srs= line.split(" ")
        print(srs)

This isn't what I wanted, but I tried it too and still got a bad split.

['', '', '', '', '', '', '', '', 'layer', '255\n']
['', '', '', '', '', '', '', '', 'xy', '', '', '5', '', '', '0', '0', '', '', '22800000', '0', '', '', '22800000', '22800000', '', '', '0', '22800000\n']
['', '', '', '', '', '', '', '', 'endel\n']

I couldn't get to the numpy part, as I'm stuck on processing the strings from the file.
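For comparison, here is a minimal sketch of why both attempts behave this way, using the `xy` line from the printed output as a literal string. The output shows only spaces, so it is assumed here that the file contains no tab characters. The attempts also call `f.readline()` inside `for line in f`, which appears to consume every other line (the "boundary" and "datatype" lines are missing from the output). The names `tokens`, `label`, `npoints` and `numbers` are illustrative only.

```python
# The xy line exactly as it appears in the printed output (spaces, no tabs).
line = "        xy   5   0 0   22800000 0   22800000 22800000   0 22800000\n"

# No tab in the line, so splitting on "\t" returns the whole line as one item.
print(line.split("\t"))

# Splitting on a single space yields an empty string for every extra space.
print(line.split(" ")[:10])

# split() with no argument splits on any run of whitespace
# and also drops the trailing newline.
tokens = line.split()
print(tokens)

# The label, the point count and the numbers can then be picked apart.
label = tokens[0]
npoints = int(tokens[1])
numbers = [int(t) for t in tokens[2:]]
print(label, npoints, numbers)
```

Calling `split()` with no argument is what both answers below rely on.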

Edited as per request

You could use some trivial code such as:

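res = []
coords = []
xy = False  # True while reading the continuation lines of an xy block
with open('data.txt') as f:
    for line in f:
        if 'layer' in line:
            # "layer 2" -> 2
            layer = int(line.split()[-1])
        elif 'xy' in line:
            arr = line.split()
            npoints = int(arr[1])  # number of (x, y) pairs
            coords = arr[2:]
            xy = True
        elif 'endel' in line:
            # each point has two values, so keep 2 * npoints strings
            res.append([layer, npoints, coords[:2 * npoints]])
            xy = False
            coords = []
        elif xy:
            # continuation line: more coordinates of the current polygon
            coords.extend(line.split())
print(res)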

Then you can convert the resulting list to a numpy array, or whatever you like, but note that the coords are still strings in the code above.
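As a rough sketch of that conversion step, the flat list of strings can be turned into integer (x, y) pairs with plain Python. The `record` below is made-up sample data in the `[layer, npoints, coords]` shape, assuming `coords` holds two strings per point, and `to_points` is a hypothetical helper name.

```python
# Sample record: [layer, npoints, flat list of coordinate strings].
record = [2, 3, ["525270", "8663518", "525400", "8663518", "525400", "8664818"]]

def to_points(coords):
    """Turn a flat list of numeric strings into a list of (x, y) int tuples."""
    values = [int(c) for c in coords]
    if len(values) % 2:
        raise ValueError("odd number of coordinate values")
    # Even positions are x values, odd positions are y values.
    return list(zip(values[0::2], values[1::2]))

layer, npoints, coords = record
points = to_points(coords)
print(layer, npoints, points)
```

If numpy is installed, passing `points` to `numpy.array` should give an array with one row per point and two columns.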

You can use a regex to parse the file into blocks of the relevant data and then parse each block:

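import re

with open('source1.txt') as f:
    # re.M lets ^ match at the start of every line, so every block is found
    for block in re.findall(r'^boundary([\s\S]+?)endel', f.read(), re.M):
        m1 = re.search(r'^\s*layer\s+(\d+)', block, re.M)
        m2 = re.search(r'^\s*datatype\s+(\d+)', block, re.M)
        m3 = re.search(r'^\s*xy\s+(\d+)\s+([\s\d]+)', block, re.M)
        if m1 and m2 and m3:
            layer = int(m1.group(1))
            datatype = int(m2.group(1))
            xy = int(m3.group(1))  # number of points announced after "xy"
            # pair up consecutive values as (x, y)
            coordinates = [(int(x), int(y)) for x, y in zip(*[iter(m3.group(2).split())]*2)]
        else:
            print("can't parse {}".format(block))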

A variable number of coordinates is supported after the xy, and it is trivial to test whether the number of coordinate pairs parsed is the number expected with len(coordinates)==xy.
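The pairing idiom and the count check can be sketched on their own as follows. The coordinate text is a made-up five-point square (the closing `0 0` point is an assumption), and `pair_up` and `expected` are hypothetical names.

```python
def pair_up(text):
    """Group whitespace-separated integers into (x, y) tuples."""
    it = iter(text.split())
    # zip pulls from the same iterator twice per tuple,
    # so consecutive items end up paired as (x, y).
    return [(int(x), int(y)) for x, y in zip(it, it)]

expected = 5  # the count that follows "xy" in the file
coordinates = pair_up("0 0   22800000 0   22800000 22800000\n   0 22800000   0 0")

# Fail loudly if the block does not hold the announced number of points.
if len(coordinates) != expected:
    raise ValueError("expected {} points, got {}".format(expected, len(coordinates)))
print(coordinates)
```

Note that `zip` silently drops a trailing unpaired value, so the length check is also what catches a block with an odd number of values.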

As written, this requires reading the entire file into memory. If size is an issue (and it usually is not for small to moderate files), you can use mmap to make the file appear to be in memory.
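A minimal sketch of the mmap variant might look like this. It writes a small stand-in file to a temporary path so that it runs on its own. The sample contents and the names `block_re`, `layer_re` and `layers` are assumptions for illustration. Because a memory-mapped file is bytes-like, the patterns are bytes patterns.

```python
import mmap
import os
import re
import tempfile

# Made-up sample data in the same layout as the question's file.
sample = (
    b"boundary\n        layer 2\n        datatype 0\n"
    b"        xy  3   0 0   10 0\n                 10 10\n        endel\n"
    b"boundary\n        layer 7\n        datatype 0\n"
    b"        xy  2   1 1   2 2\n        endel\n"
)

# Write a stand-in file so the sketch does not depend on an existing one.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(sample)

# Bytes patterns, since the mmap object exposes bytes rather than str.
block_re = re.compile(rb"^boundary([\s\S]+?)endel", re.M)
layer_re = re.compile(rb"^\s*layer\s+(\d+)", re.M)

layers = []
with open(path, "rb") as f:
    # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for block in block_re.finditer(mm):
            m = layer_re.search(block.group(1))
            if m:
                layers.append(int(m.group(1)))

os.remove(path)
print(layers)
```

The other fields can be pulled out of `block.group(1)` in the same way, with the matched bytes converted by `int` as above.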
