Save data from separate columns in a file into a variable in Python 2.7

I have some sample data in a file, arranged as follows:

  u   v   w   p
 100 200 300 400 
 101 201 301 401
 102 202 302 402
 103 203 303 403 
 104 204 304 404
 105 205 305 405
 106 206 306 406
 107 207 307 407

Now I want to read the 1st column and save it into a list 'u', the 2nd column into a list 'v', and so on for every column up to 'p'. This is what I have so far:

import numpy as np
u  = []
v  = []
w  = []
p  = []

with open('testdata.dat') as f:
   for line in f:
       for x in line.split():
           u.append([int(x)])
           v.append([int(x)+1])
           w.append([int(x)+2])
           p.append([int(x)+3]) 

print 'u is'
print(u)
print 'v is'
print(v)
print 'w is'
print(w)
print 'p is'
print(p)

I have tried varying the indices, but this is obviously wrong, since I get the output:

u is
[[100], [200], [300], [400], [101], [201], [301], [401], [102], [202], [302], 
 [402], [103], [203], [303], [403], [104], [204], [304], [404], [105], [205], 
 [305], [405], [106], [206], [306], [406], [107], [207], [307], [407]]

v is
[[101], [201], [301], [401], [102], [202], [302], [402], [103], [203], [303], 
 [403], [104], [204], [304], [404], [105], [205], [305], [405], [106], [206], 
 [306], [406], [107], [207], [307], [407], [108], [208], [308], [408]]

w is
[[102], [202], [302], [402], [103], [203], [303], [403], [104], [204], [304], 
 [404], [105], [205], [305], [405], [106], [206], [306], [406], [107], [207], 
 [307], [407], [108], [208], [308], [408], [109], [209], [309], [409]]

p is
[[103], [203], [303], [403], [104], [204], [304], [404], [105], [205], [305], 
 [405], [106], [206], [306], [406], [107], [207], [307], [407], [108], [208], 
 [308], [408], [109], [209], [309], [409], [110], [210], [310], [410]]

It just adds the index to each value and reads the entire row, whereas I want the data from each column written to a separate variable corresponding to the names given in the sample data, i.e. u = 100 --> 107, v = 200 --> 207, etc.

Any ideas on how to do this in Python? (I have to perform this operation on really large datasets in an iterative manner, so fast and efficient code would be of great benefit.)

Please change the inner loop:

   for x in line.split():
       u.append([int(x)])
       v.append([int(x)+1])
       w.append([int(x)+2])
       p.append([int(x)+3]) 

to

   x = line.split()
   u.append([int(x[0])])
   v.append([int(x[1])])
   w.append([int(x[2])])
   p.append([int(x[3])])

In your original implementation, the statements inside the loop "for x in line.split():" are executed four times per line (once for each column), so every value in the row ends up in all four lists.

x.append([int(y)+c]) appends a list of one element, [int(y)+c], rather than the number itself.

You need x.append(int(y)+c) to get a list of numbers instead of a list of singletons.
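Putting the two fixes together (index each column once per line, and append the plain integer rather than a one-element list), a minimal sketch could look like the following. The helper name read_columns is made up for illustration, and it assumes every non-blank line holds exactly four whitespace-separated integers with no header row.

```python
def read_columns(lines):
    """Split an iterable of text lines into four lists, one per column.

    Hypothetical helper: assumes four whitespace-separated integers per line.
    """
    u, v, w, p = [], [], [], []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        u.append(int(fields[0]))
        v.append(int(fields[1]))
        w.append(int(fields[2]))
        p.append(int(fields[3]))
    return u, v, w, p


sample = ["100 200 300 400", "101 201 301 401", "102 202 302 402"]
u, v, w, p = read_columns(sample)
print(u)  # [100, 101, 102]
print(p)  # [400, 401, 402]
```

Because the helper accepts any iterable of lines, an open file object can be passed in directly, for example inside "with open('testdata.dat') as f:".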

Also, here is a pretty nice solution:

from itertools import izip

a="""1 2 3 4
10 20 30 40"""

lines= ([int(y) for y in x.split()] for x in a.split("\n"))
cols = izip(*lines)

print list(cols)

prints

[(1, 10), (2, 20), (3, 30), (4, 40)]

In your case, a.split("\n") would be replaced by something like open("data").readlines().

This parses the file one line at a time instead of loading all of its raw text at once (provided you iterate over the file object rather than calling readlines()). Note, however, that the * in izip(*lines) has to consume the whole generator before izip is called, so all parsed rows are still held in memory at once and the saving is smaller than it may first appear.
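Applied to a file rather than an in-memory string, the same transposition might look like the sketch below. It uses the built-in zip so that it runs on both Python 2.7 and Python 3 (on 2.7, izip can be swapped in). The temporary file only stands in for the real data file, and the file object is iterated directly instead of going through readlines().

```python
import os
import tempfile

# Write a small stand-in data file (the real file would already exist).
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write("100 200 300 400\n101 201 301 401\n102 202 302 402\n")
    path = tmp.name

with open(path) as f:
    # One list of ints per non-blank line, parsed lazily as the file is read.
    rows = ([int(field) for field in line.split()] for line in f if line.strip())
    # zip(*rows) transposes rows into columns; it must run while f is still open.
    cols = list(zip(*rows))

os.remove(path)  # clean up the stand-in file

print(cols)  # [(100, 101, 102), (200, 201, 202), (300, 301, 302), (400, 401, 402)]
```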

However, I don't know how it will perform CPU-wise; my guesstimate is that it might be a bit better than, or about the same as, your original code.

If you are going to benchmark this, it would also be interesting to use plain lists instead of generators and try it on PyPy (see the generators headline at https://bitbucket.org/pypy/pypy/wiki/JitFriendliness), provided the data fits into memory.

Considering your data set 考虑您的数据集

  (10**4 * 8 * 12)/1024.0

Assuming your numbers are relatively small and take 12 bytes each (see "Python: How much space does each element of a list take?"), that gives me a little under 1 MB of memory to hold all the data at once, which is a pretty tiny data set in terms of memory consumption.
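The arithmetic behind that estimate can be reproduced directly. In this sketch the variable names are mine and the factors are simply those of the expression above. The 12 bytes per number is the assumption stated in the text, and the real size of an int object depends on the Python version and platform, which sys.getsizeof reports for the interpreter at hand.

```python
import sys

BYTES_PER_NUMBER = 12      # assumption taken from the estimate above
number_count = 10**4 * 8   # the count used in the expression above

estimate_kb = number_count * BYTES_PER_NUMBER / 1024.0
print(estimate_kb)         # 937.5, i.e. a little under 1 MB

# Actual size in bytes of one small int object on the running interpreter
# (typically larger than 12 on 64-bit builds).
print(sys.getsizeof(100))
```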

If I understand correctly, by using the Python built-in functions zip and map, you only need one line to do that:

from __future__ import print_function  # makes print() behave as below on Python 2.7
from itertools import izip

u,v,w,p = izip(*(map(int,line.split()) for line in open('data.txt')))

# Usage (Python3 syntax)
print("u is", list(u))
print("v is", list(v))
print("w is", list(w))
print("p is", list(p))

Producing the following result:

u is [100, 101, 102, 103, 104, 105, 106, 107]
v is [200, 201, 202, 203, 204, 205, 206, 207]
w is [300, 301, 302, 303, 304, 305, 306, 307]
p is [400, 401, 402, 403, 404, 405, 406, 407]

Since performance is your concern: implicit looping with zip and map should perform better than an explicit loop written in Python (even if loops are really fast). I'm not sure this solution has a better memory footprint, though...

EDIT: replaced zip with izip to use a generator even on Python 2.x.
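Whether the implicit loop is actually faster is easy to check with timeit. The sketch below is only a rough harness under assumptions of mine: it uses generated in-memory lines instead of a real file, the function names are invented, and it relies on the built-in zip so that it runs on Python 3 as well as 2.7. No timings are quoted here because they depend on the machine and the interpreter.

```python
import timeit

# Synthetic stand-in for the data file: 10,000 lines of four integers.
LINES = ["%d %d %d %d" % (i, i + 1, i + 2, i + 3) for i in range(10000)]


def explicit_loop(lines):
    """Append each column's value inside a Python-level for loop."""
    u, v, w, p = [], [], [], []
    for line in lines:
        a, b, c, d = line.split()
        u.append(int(a))
        v.append(int(b))
        w.append(int(c))
        p.append(int(d))
    return u, v, w, p


def zip_map(lines):
    """Transpose rows into columns with zip and map."""
    u, v, w, p = zip(*(map(int, line.split()) for line in lines))
    return list(u), list(v), list(w), list(p)


# Both approaches must agree before their timings are worth comparing.
assert explicit_loop(LINES) == zip_map(LINES)

for func in (explicit_loop, zip_map):
    seconds = timeit.timeit(lambda: func(LINES), number=20)
    print("%s: %.3f s for 20 runs" % (func.__name__, seconds))
```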
