
Extract df from text file

I have a text file containing the x, y, z coordinates of different points. The format of the input file is as follows:

Input file:
Unnecessary lines...
Unnecessary lines...
...........
...........
ARC
ID 17
ARCELEVATION 0.000000
NODES        1        2
ARCVERTICES 5
234069.60 351451.69 0.0
234067.47 351450.01 0.0
234065.42 351448.26 0.0
234063.32 351446.56 0.0
END 
ARC
ID 13
ARCELEVATION 0.000000
NODES        3        4
ARCVERTICES 2
233786.34 351300.85 0.0
233788.61 351301.44 0.0
233790.04 351301.56 0.0
233792.60 351301.99 0.0

Each line that starts with a numerical value after the string 'ARCVERTICES ' contains the x, y, z values of a point in a specific ARC. Here I've set all z values to 0.0. The ID of the specific ARC is given in the line starting with "ID ". The digit after 'ARCVERTICES ' indicates the number of points in that specific ARC.
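Since the number after 'ARCVERTICES ' is described as the point count, it can be compared with the number of coordinate lines actually present for each ARC. The following is only a minimal sketch under some assumptions: `count_points` and `sample` are hypothetical names, the sample text is the excerpt shown above, and coordinate lines are assumed to be three whitespace-separated fields starting with a digit (so negative coordinates would need a different test).

```python
from collections import Counter

# Excerpt of the input format shown above (preamble lines omitted).
sample = """ARC
ID 17
ARCELEVATION 0.000000
NODES        1        2
ARCVERTICES 5
234069.60 351451.69 0.0
234067.47 351450.01 0.0
234065.42 351448.26 0.0
234063.32 351446.56 0.0
END
ARC
ID 13
ARCELEVATION 0.000000
NODES        3        4
ARCVERTICES 2
233786.34 351300.85 0.0
233788.61 351301.44 0.0
233790.04 351301.56 0.0
233792.60 351301.99 0.0
"""


def count_points(text):
    """Return (declared, actual) point counts keyed by ARC ID."""
    declared = {}
    actual = Counter()
    arc_id = None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "ID":
            arc_id = int(parts[1])
        elif parts[0] == "ARCVERTICES":
            declared[arc_id] = int(parts[1])
        elif arc_id is not None and len(parts) == 3 and parts[0][0].isdigit():
            # Assumed to be an "x y z" coordinate line of the current ARC.
            actual[arc_id] += 1
    return declared, dict(actual)


declared, actual = count_points(sample)
for arc_id, n_declared in declared.items():
    print(arc_id, "declared:", n_declared, "found:", actual.get(arc_id, 0))
```

For the excerpt above, the declared and found counts differ for both ARCs, so the declared number is probably not safe to use as a loop bound when reading points.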

I need to get a df like the following:

               X          Y    Z  ARC_Vertice ARC
index                                           
1     234,069.60 351,451.69 0.00         5   17
2     234,067.48 351,450.02 0.00         5   17
3     234,065.42 351,448.27 0.00         5   17
4     234,063.32 351,446.56 0.00         5   17
5     233,786.34 351,300.85 0.00         4   13
6     233,788.61 351,301.44 0.00         4   13

Here is the code I came up with, which serves my purpose:

import io
import pandas as pd

# The full option name is required; the bare 'precision' is ambiguous in recent pandas.
pd.set_option('display.precision', 9)


def parse_chunk(arc_id, arc_vertice, index, line):
    # Parse one "x y z" line into a single-row DataFrame tagged with its ARC metadata.
    buf = io.StringIO(line)
    chunk = pd.read_csv(buf, sep=r"\s+", names=['X', 'Y', 'Z'], header=None)
    chunk['ARC_Vertice'] = arc_vertice
    chunk['ARC'] = arc_id
    chunk['index'] = index
    chunk.set_index('index', inplace=True)
    return chunk


with open(r'T:/Active Projects/US Army Corps of Engineers/St Louis District/LD25/Model/Calibration/2020/LDcal20_220127/transect.map', 'rt') as f:
    # Skip the unnecessary lines before the first ARC block.
    for line in f:
        if line.startswith("ARC"):
            break
    chunk_list = []
    index = 1
    for line in f:
        if line.startswith("ID"):
            arc_id = line.split()[1]
        elif line.startswith("ARCVERTICES "):
            arc_vertice = line.split()[1]
        elif line and line[0].isdigit():  # these lines are xyz values
            chunk = parse_chunk(arc_id, arc_vertice, index, line)
            chunk_list.append(chunk)
            index += 1

# Combine the single-row frames into one DataFrame.
df = pd.concat(chunk_list)

Let me know if there is a smarter, more concise way to do it; I believe there is. Also, in Spyder the df shows the X, Y, Z values as rounded integers instead of floats. However, calling df.head() shows the X, Y, Z values to 2 decimal places.
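One way to tell whether the rounding is only a display matter is to check the column dtypes and format the values explicitly. This is a small sketch on a hypothetical two-row frame rather than the real file; it assumes the columns were parsed as floats, and that whatever Spyder's viewer shows comes from its own display format, which the question does not confirm.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the parsed data.
df = pd.DataFrame({
    "X": [234069.60, 234067.47],
    "Y": [351451.69, 351450.01],
    "Z": [0.0, 0.0],
})

# If these are float64, the decimals are stored and only the view rounds them.
print(df.dtypes)

# Render with an explicit two-decimal format, independent of any viewer setting.
text = df.to_string(float_format="{:.2f}".format)
print(text)
```

If the dtypes come back as float64, the stored values keep their decimals and only the viewer's formatting needs adjusting.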

Here is a link to a file on Google Drive: https://drive.google.com/file/d/1lZHWS-ecxeHWBZt2uxZzTwvMFawp-2JK/view?usp=sharing I would highly appreciate any new techniques.

A shorter version:

import pandas as pd

data = []
with open('data.txt') as fp:
    is_values = False
    for line in fp:
        line = line.strip()
        if line.startswith('ID'):
            ARC = int(line.split()[1])
        elif line.startswith('ARCVERTICES'):
            ARC_Vertice = int(line.split()[1])
            is_values = True
        elif line == 'END':
            is_values = False
        elif is_values:
            X, Y, Z = line.split()
            data.append({'X': float(X), 'Y': float(Y), 'Z': float(Z),
                         'ARC_Vertice': ARC_Vertice, 'ARC': ARC})

df = pd.DataFrame(data)

Output:

>>> df
           X          Y    Z  ARC_Vertice  ARC
0  234069.60  351451.69  0.0            5   17
1  234067.47  351450.01  0.0            5   17
2  234065.42  351448.26  0.0            5   17
3  234063.32  351446.56  0.0            5   17
4  233786.34  351300.85  0.0            2   13
5  233788.61  351301.44  0.0            2   13
6  233790.04  351301.56  0.0            2   13
7  233792.60  351301.99  0.0            2   13

>>> data
[{'X': 234069.6, 'Y': 351451.69, 'Z': 0.0, 'ARC_Vertice': 5, 'ARC': 17},
 {'X': 234067.47, 'Y': 351450.01, 'Z': 0.0, 'ARC_Vertice': 5, 'ARC': 17},
 {'X': 234065.42, 'Y': 351448.26, 'Z': 0.0, 'ARC_Vertice': 5, 'ARC': 17},
 {'X': 234063.32, 'Y': 351446.56, 'Z': 0.0, 'ARC_Vertice': 5, 'ARC': 17},
 {'X': 233786.34, 'Y': 351300.85, 'Z': 0.0, 'ARC_Vertice': 2, 'ARC': 13},
 {'X': 233788.61, 'Y': 351301.44, 'Z': 0.0, 'ARC_Vertice': 2, 'ARC': 13},
 {'X': 233790.04, 'Y': 351301.56, 'Z': 0.0, 'ARC_Vertice': 2, 'ARC': 13},
 {'X': 233792.6, 'Y': 351301.99, 'Z': 0.0, 'ARC_Vertice': 2, 'ARC': 13}]
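Another possible approach, offered as a sketch rather than as part of the answer above, is to let pandas do the line classification with `str.extract` and forward-fill the ARC metadata onto the coordinate rows. It assumes the text is already in memory (`sample` is a hypothetical stand-in for the file contents), that no line before the first ID consists of three numbers, and that a 1-based index is wanted as in the question.

```python
import pandas as pd

# Hypothetical stand-in for the file contents, using the excerpt from the question.
sample = """Unnecessary lines...
ARC
ID 17
ARCELEVATION 0.000000
NODES        1        2
ARCVERTICES 5
234069.60 351451.69 0.0
234067.47 351450.01 0.0
234065.42 351448.26 0.0
234063.32 351446.56 0.0
END
ARC
ID 13
ARCELEVATION 0.000000
NODES        3        4
ARCVERTICES 2
233786.34 351300.85 0.0
233788.61 351301.44 0.0
233790.04 351301.56 0.0
233792.60 351301.99 0.0
"""

lines = pd.Series(sample.splitlines()).str.strip()

# Pull the metadata from its own lines, then carry it forward to the rows below.
arc = lines.str.extract(r'^ID\s+(\d+)$')[0].ffill()
n_vertices = lines.str.extract(r'^ARCVERTICES\s+(\d+)$')[0].ffill()

# A coordinate line is exactly three numbers separated by whitespace.
num = r'(-?\d+(?:\.\d+)?)'
coords = lines.str.extract(rf'^{num}\s+{num}\s+{num}$')
is_point = coords[0].notna()

df = coords[is_point].astype(float)
df.columns = ['X', 'Y', 'Z']
df['ARC_Vertice'] = n_vertices[is_point].astype(int)
df['ARC'] = arc[is_point].astype(int)

# Renumber the rows from 1, as in the desired output of the question.
df = df.reset_index(drop=True)
df.index += 1
print(df)
```

This version does not rely on the END marker, so an ARC block without one (like the last block in the excerpt) is still read; the trade-off is that the whole file is held in memory.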
