
Python Numpy: How to convert a text file with pairs of comma-separated floats to a multidimensional 'ndarray'

I am new to NumPy and I need to convert a text file with the data

219062.60893,395935.54879 219332.52719,395961.82402 219301.47465,395688.32278 219036.33371,395677.57382
218761.63814,395494.84155 219164.12686,395438.70811 219086.49551,395244.03255 218758.05515,395308.52630

to a NumPy ndarray of

[[[219062.60893,395935.54879],[219332.52719,395961.82402],[219301.47465,395688.32278],[219036.33371,395677.57382]],
 [[218761.63814,395494.84155],[219164.12686,395438.70811],[219086.49551,395244.03255],[218758.05515,395308.52630]]]

What I tried is this:

 textLineArray = np.loadtxt(filePath, str, None, None, None, 0, None, False,0,'bytes',None)

which gives me

[['219062.60893,395935.54879' '219332.52719,395961.82402'
 '219301.47465,395688.32278' '219036.33371,395677.57382'],
['218761.63814,395494.84155' '219164.12686,395438.70811'
 '219086.49551,395244.03255' '218758.05515,395308.52630']]

and after further splitting on spaces

spaceTextLineArray = np.char.split(textLineArray, ' ', maxsplit=None)

I get this:

[[list(['219062.60893,395935.54879']) list(['219332.52719,395961.82402'])
 list(['219301.47465,395688.32278']) list(['219036.33371,395677.57382'])],
[list(['218761.63814,395494.84155']) list(['219164.12686,395438.70811'])
 list(['219086.49551,395244.03255']) list(['218758.05515,395308.52630'])]]

This is close, but not quite there. I don't know how to get rid of the single quotes.

First solution

Try this code:

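import numpy as np

data = []
with open('data.txt') as my_file:
    for line in my_file:
        # Each line holds whitespace-separated "x,y" pairs; split() with no
        # argument also discards the trailing newline.
        data.append([list(map(float, x.split(','))) for x in line.split()])
arr_data = np.array(data)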

and arr_data will contain your NumPy array:

array([[[219062.60893, 395935.54879],
        [219332.52719, 395961.82402],
        [219301.47465, 395688.32278],
        [219036.33371, 395677.57382]],

       [[218761.63814, 395494.84155],
        [219164.12686, 395438.70811],
        [219086.49551, 395244.03255],
        [218758.05515, 395308.5263 ]]])

Brief explanation:

  1. Read your file line by line
  2. Parse each line and store its data in a list
  3. Convert the list to a NumPy array
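
The three steps above can be sketched as a small reusable function. This is only an illustration: the name parse_pairs and the in-memory sample are assumptions made for the example, and the blank-line check is an addition that the code in the answer does not have.

```python
import io
import numpy as np

def parse_pairs(lines):
    """Parse lines of whitespace-separated "x,y" pairs into a 3-D float array.

    parse_pairs is a hypothetical helper name, not part of NumPy.
    """
    data = []
    for line in lines:                       # step 1: go through the input line by line
        if not line.strip():
            continue                         # added safeguard: ignore blank lines
        # step 2: "x,y x,y ..." -> [[x, y], [x, y], ...]
        pairs = [[float(v) for v in pair.split(',')] for pair in line.split()]
        data.append(pairs)
    return np.array(data)                    # step 3: nested list -> ndarray

# In-memory stand-in for the text file (two lines, two pairs per line).
sample = io.StringIO(
    "219062.60893,395935.54879 219332.52719,395961.82402\n"
    "218761.63814,395494.84155 219164.12686,395438.70811\n"
)
arr = parse_pairs(sample)
print(arr.shape)  # (2, 2, 2)
```

The result is a regular float array only if every line contains the same number of pairs.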

Second solution

Another solution, without an explicit outer for loop, which produces the same result:

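# np.loadtxt with dtype=str yields one "x,y" string per field; wrap the parsed
# nested list in np.array so the result is an ndarray rather than a plain list.
arr_data = np.array([[list(map(float, a.split(','))) for a in s] for s in np.loadtxt('myData.csv', dtype=str)])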
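
A further variant, not part of the original answer, may be worth trying: since commas and spaces both just separate numbers here, one could replace the commas with spaces, let np.loadtxt parse each line as a flat row of floats, and then reshape. The sketch below assumes every line holds the same number of pairs and uses an in-memory string in place of the file.

```python
import io
import numpy as np

# In-memory stand-in for the file contents (two lines, two pairs per line).
text = (
    "219062.60893,395935.54879 219332.52719,395961.82402\n"
    "218761.63814,395494.84155 219164.12686,395438.70811\n"
)

# Turn every comma into a space so each line becomes a flat row of floats.
# ndmin=2 keeps the result two-dimensional even for a single-line input.
flat = np.loadtxt(io.StringIO(text.replace(',', ' ')), ndmin=2)

# Regroup each row into (x, y) pairs: (lines, pairs per line, 2).
arr_pairs = flat.reshape(flat.shape[0], -1, 2)
print(arr_pairs.shape)  # (2, 2, 2)
```

For a real file, the same idea would read the file's text first and apply the replacement before parsing.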

Comparison of Execution Times

I used a file in the same format as yours, with 5000 lines, and obtained the following results:

  • First solution:

    # 41.4 ms ± 4.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

  • Second solution:

    # 84.6 ms ± 6.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The first solution seems to be about twice as fast.
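
To reproduce a comparison like this on your own machine, something along these lines could be used. It is a rough sketch: the function names, the generated 5000-line test file, and the use of timeit with three repetitions are all assumptions for illustration, and the timings will differ from the figures quoted above.

```python
import os
import tempfile
import timeit
import numpy as np

def first_solution(path):
    # Line-by-line parsing with plain Python, then one conversion to ndarray.
    data = []
    with open(path) as f:
        for line in f:
            data.append([list(map(float, x.split(','))) for x in line.split()])
    return np.array(data)

def second_solution(path):
    # Let np.loadtxt split the fields as strings, then parse each "x,y" field.
    rows = np.loadtxt(path, dtype=str, ndmin=2)
    return np.array([[list(map(float, a.split(','))) for a in s] for s in rows])

# Hypothetical test file: 5000 identical lines of four "x,y" pairs.
line = ("219062.60893,395935.54879 219332.52719,395961.82402 "
        "219301.47465,395688.32278 219036.33371,395677.57382\n")
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(line * 5000)
    path = tmp.name

# Check that both approaches agree before timing them.
same = np.array_equal(first_solution(path), second_solution(path))
t1 = timeit.timeit(lambda: first_solution(path), number=3)
t2 = timeit.timeit(lambda: second_solution(path), number=3)
print(same, t1, t2)

os.remove(path)  # clean up the temporary file
```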


Extra

If instead you have a file in standard CSV format and you want to load it directly into a NumPy array, you can do this:

from numpy import genfromtxt
arr_data = genfromtxt('file_data.csv', delimiter=',')

and arr_data will contain your NumPy array.
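
As a quick check of what genfromtxt returns for a standard CSV layout, here is a small sketch that uses an in-memory string instead of a file. The sample rows are taken from the question's data, and the variable names are made up for the example.

```python
import io
import numpy as np

# Standard CSV: one record per line, fields separated by commas.
csv_text = (
    "219062.60893,395935.54879\n"
    "219332.52719,395961.82402\n"
    "219301.47465,395688.32278\n"
)
arr_csv = np.genfromtxt(io.StringIO(csv_text), delimiter=',')
print(arr_csv.shape)  # (3, 2)
```

Each CSV line becomes one row of a 2-D float array, so this approach fits files with one pair per line rather than the space-separated layout in the question.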
