Splitting data file columns into separate arrays in Python

I'm new to Python and have been trying to figure this out all day. I have a data file laid out as below:

time    I(R_stkb)

Step Information: Temp=0  (Run: 1/11)

0.000000000000000e+000  0.000000e+000

9.999999960041972e-012  8.924141e-012

1.999999992008394e-011  9.623148e-012

3.999999984016789e-011  6.154220e-012

(Note: There are no empty lines between the data lines.)

I want to plot the data using matplotlib functions, so I'll need the two columns in separate arrays.

I currently have:

def plotdata():

Xvals=[], Yvals=[]
i = open(file,'r')

for line in i:
    Xvals,Yvals = line.split(' ', 1)

print Xvals,Yvals

But obviously it's completely wrong. Can anyone give me a simple answer to this? An explanation of what exactly the lines mean would be helpful. Cheers.

Edit: The first two lines repeat throughout the file.

This is a job for the * (argument unpacking) operator combined with the zip function, which together transpose a list of rows into columns.

>>> asdf
[[1, 2], [3, 4], [5, 6]]
>>> list(zip(*asdf))  # on Python 3 zip returns an iterator, hence list()
[(1, 3, 5), (2, 4, 6)]

So in the context of your data it might be something like:

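# 'file' is the path to the data file, as in the question
with open(file, 'r') as handle:
    # keep only data rows: skip the repeating 'time' and 'Step Information' header lines
    lines = [line.split() for line in handle if line[:4] not in ('time', 'Step')]
# transpose the rows into two tuples of strings (apply float() to the items before plotting)
Xvals, Yvals = zip(*lines)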

Or, if you really need to be able to mutate the data afterwards, you could just call the list constructor on each tuple:

Xvals, Yvals = [list(block) for block in zip(*lines)]
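
Putting the pieces above together, here is a minimal sketch for Python 3 that also converts the strings to floats so they can be plotted. The `read_columns` name and the inline sample text are made up for illustration, and the filter assumes that header lines always start with `time` or `Step`, as in the question.

```python
import io

def read_columns(handle):
    """Return (xs, ys) as lists of floats from an iterable of text lines."""
    rows = [
        [float(value) for value in line.split()]
        for line in handle
        # skip blank lines and the repeating header lines
        if line.strip() and not line.startswith(('time', 'Step'))
    ]
    xs, ys = zip(*rows)  # transpose: list of rows -> two columns
    return list(xs), list(ys)

# Stand-in for the real file: the first lines of the sample in the question
sample = io.StringIO(
    "time    I(R_stkb)\n"
    "Step Information: Temp=0  (Run: 1/11)\n"
    "0.000000000000000e+000  0.000000e+000\n"
    "9.999999960041972e-012  8.924141e-012\n"
)
xs, ys = read_columns(sample)
print(xs, ys)
```

With a real file, the call would be wrapped as `with open(path) as handle: xs, ys = read_columns(handle)`, and the two lists can then be passed straight to `plt.plot`.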

One way to do it is:

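Xvals = []; Yvals = []
with open(file, 'r') as i:
    for line in i:
        # skip blank lines and the repeating 'time' / 'Step Information' header lines
        if not line.strip() or line.startswith(('time', 'Step')):
            continue
        x, y = line.split()  # split on any run of whitespace
        Xvals.append(float(x))
        Yvals.append(float(y))

print(Xvals, Yvals)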

Note the call to the float function, which converts the string you get from the file into a number.

This is what numpy.loadtxt is designed for. Try:

import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt(file, skiprows = 2) # assuming you have time and step information on 2 separate lines 
                                      # and you do not want to read them
plt.plot(data[:,0], data[:,1])
plt.show()

EDIT: If the time and step header lines are scattered throughout the file and you want to plot the data for every step, you can read the whole file into memory (assuming it's small enough) and then split it on the 'time' strings:

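with open(fname, 'r') as f:  # text mode, so the content can be split on str
    l = f.read()
for chunk in l.split('time'):
    # drop the rest of the 'time' line and the 'Step Information' line, then any blank lines
    rows = [s.split() for s in chunk.split('\n')[2:] if s.strip()]
    if not rows:  # the text before the first 'time' yields an empty chunk
        continue
    data = np.array(rows, dtype=float)  # np.float was removed in NumPy 1.24
    plt.plot(data[:,0], data[:,1])
    plt.show()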
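
The same per-step grouping can also be sketched without NumPy by walking the file line by line. This is only an illustration: the `read_steps` name is hypothetical, the second block in the sample (its header and values) is invented to show the grouping, and the code assumes every block of data is preceded by a `Step Information` line.

```python
import io

def read_steps(handle):
    """Return a list of (step_label, xs, ys), one entry per 'Step Information' block."""
    steps = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith('time'):
            continue                      # blank line or column-name header
        if line.startswith('Step'):
            steps.append((line, [], []))  # a new block starts here
            continue
        x, y = line.split()
        steps[-1][1].append(float(x))     # append to the most recent block
        steps[-1][2].append(float(y))
    return steps

# First block: values from the question. Second block: made-up illustration data.
sample = io.StringIO(
    "time    I(R_stkb)\n"
    "Step Information: Temp=0  (Run: 1/11)\n"
    "0.000000000000000e+000  0.000000e+000\n"
    "9.999999960041972e-012  8.924141e-012\n"
    "time    I(R_stkb)\n"
    "Step Information: Temp=10  (Run: 2/11)\n"
    "0.000000000000000e+000  0.000000e+000\n"
)
steps = read_steps(sample)
for label, xs, ys in steps:
    print(label, len(xs), len(ys))
```

Each `(label, xs, ys)` entry can then be drawn with `plt.plot(xs, ys, label=label)`, which puts all steps on one figure instead of opening a window per step.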

Or else you could add the # comment sign to the header lines and use np.loadtxt, which skips anything after a # by default.
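
If editing the file is not an option, the `comments` parameter of `np.loadtxt` also accepts a sequence of strings, so the header prefixes themselves can serve as comment markers. The following is a rough sketch under the assumption that neither `time` nor `Step` ever appears inside a data line; the inline sample stands in for the real file.

```python
import io
import numpy as np

# Stand-in for the real file: the first lines of the sample in the question
sample = io.StringIO(
    "time    I(R_stkb)\n"
    "Step Information: Temp=0  (Run: 1/11)\n"
    "0.000000000000000e+000  0.000000e+000\n"
    "9.999999960041972e-012  8.924141e-012\n"
)

# Everything from 'time' or 'Step' to the end of a line is treated as a comment,
# which leaves the header lines empty, and empty lines are skipped.
x, y = np.loadtxt(sample, comments=('time', 'Step'), unpack=True)
print(x.shape, y.shape)
```

Because `unpack=True` returns the columns rather than the rows, `x` and `y` can be handed directly to `plt.plot(x, y)`. Note that this merges all steps into one pair of arrays.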

If you want to plot this file with matplotlib, you might want to check out its plotfile function (note that plotfile was deprecated in matplotlib 3.1 and removed in later releases). See the official matplotlib documentation.
