
Python - read lat/long coordinates from text-file containing separation sign

Originally, the text file was meant for use with GMT (The Generic Mapping Tools). The format is similar to the following:

122   55
122   56
122.5 57
>
123   25.25
123   25.27

where '>' is recognized as a separator between different line segments.

Now, I'm using Basemap from mpl_toolkits to plot lines on a map. All I need is a 2D numpy array so that I can pass the coordinates to a function that works more or less like plt.plot or plt.scatter.

Here's a simple solution that I came up with:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
from io import StringIO

# Read the whole file; the with-block closes it afterwards
with open("latlon.txt", "r") as f:
    data = f.read()

# Replace each '>' with "nan<TAB>nan" so that plt.plot breaks the line there.
# np.genfromtxt cannot read the file directly, because the '>' lines
# have a different number of columns from the coordinate lines,
# hence the replacement.
data = data.replace('>', 'nan\tnan')
# On Python 3, str is already unicode, so no unicode() call is needed
line_xy = np.genfromtxt(StringIO(data))

# (plotting stuff...)

I find this approach rather hacky and don't really like it. Is there a more common, explicit, or standard solution for this case? Any advice is welcome.

If you have access to the pandas library, the pandas.read_csv() function can parse your data and turn it into an array of floats:

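import pandas as pd
# Split columns on runs of whitespace; the file has no header row
df = pd.read_csv('data.txt', sep=r'\s+', header=None)
# Convert every column to floats; unparseable entries such as '>' become NaN
array = df.apply(pd.to_numeric, errors='coerce').values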

Here, read_csv() treats the columns in your data file as separated by whitespace and returns a DataFrame in which the '>' entries are still strings. The apply() line then converts everything to floating-point numbers, coercing invalid entries (like your '>') into NaNs. The .values attribute then extracts the contents of the DataFrame as an ordinary numpy.ndarray. This gives:

array([[ 122.  ,   55.  ],
       [ 122.  ,   56.  ],
       [ 122.5 ,   57.  ],
       [    nan,     nan],
       [ 123.  ,   25.25],
       [ 123.  ,   25.27]])
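
If one array per line segment is more convenient than a single NaN-delimited array, the rows can be split at the NaN markers. The following is only a minimal sketch: split_segments is a hypothetical helper name, not part of pandas or numpy, and the array is typed in by hand to match the output above.

```python
import numpy as np


def split_segments(xy):
    """Split an (N, 2) array into a list of arrays at rows that contain NaN.

    Hypothetical helper for illustration; assumes separators are NaN rows.
    """
    xy = np.asarray(xy, dtype=float)
    # True for every row that holds at least one NaN (the separator rows)
    is_sep = np.isnan(xy).any(axis=1)
    # np.split cuts just before each separator row
    pieces = np.split(xy, np.flatnonzero(is_sep))
    # Drop the separator rows themselves, then discard empty pieces
    cleaned = [p[~np.isnan(p).any(axis=1)] for p in pieces]
    return [p for p in cleaned if len(p) > 0]


# Same values as the array shown above
array = np.array([
    [122.0, 55.0],
    [122.0, 56.0],
    [122.5, 57.0],
    [np.nan, np.nan],
    [123.0, 25.25],
    [123.0, 25.27],
])

segments = split_segments(array)
for seg in segments:
    print(seg.shape)
```

Each element of segments can then be passed to a plotting call on its own, which avoids relying on NaN rows to break the line.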
