Extract df from text file
I have a text file containing the x, y, z coordinates of different points. The input file has the following format:
Input file:
Unnecessary lines...
Unnecessary lines...
...........
...........
ARC
ID 17
ARCELEVATION 0.000000
NODES 1 2
ARCVERTICES 5
234069.60 351451.69 0.0
234067.47 351450.01 0.0
234065.42 351448.26 0.0
234063.32 351446.56 0.0
END
ARC
ID 13
ARCELEVATION 0.000000
NODES 3 4
ARCVERTICES 2
233786.34 351300.85 0.0
233788.61 351301.44 0.0
233790.04 351301.56 0.0
233792.60 351301.99 0.0
Each line that starts with numerical values after the string 'ARCVERTICES ' contains the x, y, z values of a specific ARC. Here I've set all z values to 0.0. The ID of the ARC is given in the line starting with 'ID ', and the number after 'ARCVERTICES ' indicates the number of points in that ARC.
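Because every block follows the same layout (an `ARC` line, an `ID` line, an `ARCVERTICES` line, then the coordinate rows), the text can also be split into blocks with regular expressions. A minimal sketch, assuming each block starts with a line containing only `ARC` and that coordinate rows are exactly three whitespace-separated numbers; `parse_arcs` and `SAMPLE` are hypothetical names, and `SAMPLE` is a trimmed copy of the input format:

```python
import re

import pandas as pd

# Trimmed copy of the input format (hypothetical sample for illustration).
SAMPLE = """Unnecessary lines...
ARC
ID 17
ARCELEVATION 0.000000
NODES 1 2
ARCVERTICES 5
234069.60 351451.69 0.0
234067.47 351450.01 0.0
END
ARC
ID 13
ARCELEVATION 0.000000
NODES 3 4
ARCVERTICES 2
233786.34 351300.85 0.0
233788.61 351301.44 0.0
END
"""

# A number such as 234069.60 or 0.0, optionally negative.
NUM = r"(-?\d+(?:\.\d*)?)"
# A coordinate row: exactly three numbers on one line.
ROW = re.compile(rf"^[ \t]*{NUM}[ \t]+{NUM}[ \t]+{NUM}[ \t]*$", re.MULTILINE)


def parse_arcs(text):
    """Hypothetical helper: one row per coordinate line, tagged with its ARC id."""
    rows = []
    # Split on lines containing only "ARC"; the first piece is the preamble and is dropped.
    for block in re.split(r"^ARC[ \t]*$", text, flags=re.MULTILINE)[1:]:
        arc = int(re.search(r"^ID[ \t]+(\d+)", block, re.MULTILINE).group(1))
        n_vert = int(re.search(r"^ARCVERTICES[ \t]+(\d+)", block, re.MULTILINE).group(1))
        for x, y, z in ROW.findall(block):
            rows.append((float(x), float(y), float(z), n_vert, arc))
    return pd.DataFrame(rows, columns=["X", "Y", "Z", "ARC_Vertice", "ARC"])


df = parse_arcs(SAMPLE)
print(df)
```

For a real file, passing `f.read()` to `parse_arcs` replaces the line-by-line loop. Opening the file in text mode also normalises Windows line endings to `\n`, which the `^`/`$` anchors rely on. A block without an `ID` or `ARCVERTICES` line would make `re.search` return `None` here, so the sketch assumes every block has both.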
I need to get a df like the following:
       X           Y           Z     ARC_Vertice  ARC
index
1      234,069.60  351,451.69  0.00  5            17
2      234,067.48  351,450.02  0.00  5            17
3      234,065.42  351,448.27  0.00  5            17
4      234,063.32  351,446.56  0.00  5            17
5      233,786.34  351,300.85  0.00  4            13
6      233,788.61  351,301.44  0.00  4            13
Here is the code I came up with, which serves my purpose:
import io
import pandas as pd

# 'precision' alone is ambiguous in recent pandas; use the full option name
pd.set_option('display.precision', 9)

def parse_chunk(arc_id, arc_vertice, index, line):
    # Parse one "x y z" line into a one-row DataFrame
    buf = io.StringIO(line)
    chunk = pd.read_csv(buf, sep=r"\s+", names=['X', 'Y', 'Z'], header=None, engine='python')
    chunk['ARC_Vertice'] = arc_vertice
    chunk['ARC'] = arc_id
    chunk['index'] = index
    chunk.set_index('index', inplace=True)
    return chunk

with open(r'T:/Active Projects/US Army Corps of Engineers/St Louis District/LD25/Model/Calibration/2020/LDcal20_220127/transect.map', 'rt') as f:
    # Skip the unnecessary lines before the first ARC block
    for line in f:
        if line.startswith("ARC"):
            break
    chunk_list = []
    index = 1
    for line in f:
        if line.startswith("ID"):
            arc_id = int(line.split()[1])
        elif line.startswith("ARCVERTICES "):
            arc_vertice = int(line.split()[1])
        elif line and line[0].isdigit():  # these lines are xyz values
            chunk = parse_chunk(arc_id, arc_vertice, index, line)
            chunk_list.append(chunk)
            index += 1

df = pd.concat(chunk_list)
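The loop above builds one small DataFrame per coordinate line with `pd.read_csv` and then concatenates them, which is likely to be slow on large files. As an alternative pandas-only sketch (not the author's method), all lines can go into one Series: `str.extract` pulls out the ID, the vertex count and the coordinates, and `ffill` copies each block's ID and count down onto its coordinate rows. `parse_with_ffill` is a hypothetical name, and the sketch assumes no line before the first `ID` consists of three numbers.

```python
import pandas as pd


def parse_with_ffill(lines):
    """Hypothetical helper: build the df column-wise instead of one DataFrame per line."""
    s = pd.Series([ln.strip() for ln in lines], dtype=object)
    # Header values appear once per block; forward-fill copies them onto the rows below.
    arc = s.str.extract(r"^ID\s+(\d+)$", expand=False).ffill()
    n_vert = s.str.extract(r"^ARCVERTICES\s+(\d+)$", expand=False).ffill()
    # Coordinate rows: exactly three whitespace-separated numbers.
    xyz = s.str.extract(r"^(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)$")
    mask = xyz[0].notna()
    coords = xyz.loc[mask].astype(float)
    df = pd.DataFrame({
        "X": coords[0],
        "Y": coords[1],
        "Z": coords[2],
        "ARC_Vertice": n_vert[mask].astype(int),
        "ARC": arc[mask].astype(int),
    })
    return df.reset_index(drop=True)
```

Any iterable of lines works as input, so inside `with open(path) as f:` the call is simply `df = parse_with_ffill(f)`.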
Let me know if there is any other smart, concise way to do it; I believe there is. Also, in Spyder the df shows the X, Y, Z values as rounded integers instead of floats, whereas calling df.head() shows the X, Y, Z values with 2 decimal places.
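On the display question: the X, Y, Z columns are float64, and only the way they are printed differs. A minimal sketch using pandas display options; this affects pandas' own printing only, not Spyder's Variable Explorer, which has its own formatting settings. The `"{:,.2f}"` format gives the thousands separators and two decimals used in the desired output:

```python
import pandas as pd

df = pd.DataFrame({"X": [234069.60, 234067.47],
                   "Y": [351451.69, 351450.01],
                   "Z": [0.0, 0.0]})

# The stored values are float64; display options only change how they are printed.
print(df.dtypes)

# Print floats with thousands separators and 2 decimals (printing only, data unchanged).
pd.set_option("display.float_format", "{:,.2f}".format)
print(df)

# Restore the default display.
pd.reset_option("display.float_format")
```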
I'd highly appreciate any new techniques. Here is a link to the file on Google Drive: https://drive.google.com/file/d/1lZHWS-ecxeHWBZt2uxZzTwvMFawp-2JK/view?usp=sharing
A shorter version:
import pandas as pd

data = []
with open('data.txt') as fp:
    is_values = False  # True while reading the coordinate lines of an ARC
    for line in fp:
        line = line.strip()
        if line.startswith('ID'):
            ARC = int(line.split()[1])
        elif line.startswith('ARCVERTICES'):
            ARC_Vertice = int(line.split()[1])
            is_values = True
        elif line == 'END':
            is_values = False
        elif is_values and line:  # skip blank lines
            X, Y, Z = line.split()
            data.append({'X': float(X), 'Y': float(Y), 'Z': float(Z),
                         'ARC_Vertice': ARC_Vertice, 'ARC': ARC})

df = pd.DataFrame(data)
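The desired output in the question numbers rows from 1 and names the index `index`, whereas `pd.DataFrame(data)` numbers them from 0. A small sketch of relabelling the index afterwards; the two `data` rows are copied from the answer's output only to keep the block self-contained:

```python
import pandas as pd

# Two rows in the same shape as the `data` list built by the loop above.
data = [
    {"X": 234069.6, "Y": 351451.69, "Z": 0.0, "ARC_Vertice": 5, "ARC": 17},
    {"X": 234067.47, "Y": 351450.01, "Z": 0.0, "ARC_Vertice": 5, "ARC": 17},
]
df = pd.DataFrame(data)

# Number rows from 1 and name the index "index", as in the desired output.
df.index = pd.RangeIndex(start=1, stop=len(df) + 1, name="index")
print(df)
```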
Output:
>>> df
X Y Z ARC_Vertice ARC
0 234069.60 351451.69 0.0 5 17
1 234067.47 351450.01 0.0 5 17
2 234065.42 351448.26 0.0 5 17
3 234063.32 351446.56 0.0 5 17
4 233786.34 351300.85 0.0 2 13
5 233788.61 351301.44 0.0 2 13
6 233790.04 351301.56 0.0 2 13
7 233792.60 351301.99 0.0 2 13
>>> data
[{'X': 234069.6, 'Y': 351451.69, 'Z': 0.0, 'ARC_Vertice': 5, 'ARC': 17},
{'X': 234067.47, 'Y': 351450.01, 'Z': 0.0, 'ARC_Vertice': 5, 'ARC': 17},
{'X': 234065.42, 'Y': 351448.26, 'Z': 0.0, 'ARC_Vertice': 5, 'ARC': 17},
{'X': 234063.32, 'Y': 351446.56, 'Z': 0.0, 'ARC_Vertice': 5, 'ARC': 17},
{'X': 233786.34, 'Y': 351300.85, 'Z': 0.0, 'ARC_Vertice': 2, 'ARC': 13},
{'X': 233788.61, 'Y': 351301.44, 'Z': 0.0, 'ARC_Vertice': 2, 'ARC': 13},
{'X': 233790.04, 'Y': 351301.56, 'Z': 0.0, 'ARC_Vertice': 2, 'ARC': 13},
{'X': 233792.6, 'Y': 351301.99, 'Z': 0.0, 'ARC_Vertice': 2, 'ARC': 13}]