Importing data and variable names from a text file in Python

I have a text file containing simulation data (60 columns, 100k rows):

a  b   c  
1  11 111
2  22 222
3  33 333
4  44 444

... where the first row contains the variable names, and the columns beneath hold the corresponding data (float type).

I need to use all these variables with their data in Python for further calculations. For example, when I type:

print(b)

I need to receive the values from the second column.

I know how to import data:

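import numpy as np

# genfromtxt skips leading lines with skip_header (skiprows is the loadtxt spelling)
data = np.genfromtxt("1.txt", unpack=True, skip_header=1)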

And how to assign variables "manually":

a, b, c = np.genfromtxt("1.txt", unpack=True, skip_header=1)

But I'm having trouble with getting variable names:

import csv

rows = []
reader = csv.reader(open("1.txt", "rt"))
for row in reader:
    rows.append(row)
variables = rows[0]

How can I change this code to get all variable names from the first row and assign them to the imported arrays?

The answer is: you don't want to do that.

Dictionaries are designed for exactly this purpose: the data structure you actually want is going to be something like:

data = {
    "a": [1, 2, 3, 4],
    "b": [11, 22, 33, 44],
    "c": [111, 222, 333, 444],
}

... which you can then easily access using e.g. data["a"].
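As a minimal sketch of how such a dict could be built from the file: read the header line for the names, load the remaining rows with NumPy, and zip the two together. The inline sample text stands in for 1.txt and is an assumption for illustration; on the real file you would open the file instead.

```python
import io

import numpy as np

# Stand-in for open("1.txt"); the sample text is an assumption for illustration.
sample = io.StringIO("a  b   c\n1  11 111\n2  22 222\n3  33 333\n4  44 444\n")

names = sample.readline().split()          # ['a', 'b', 'c']
columns = np.loadtxt(sample, unpack=True)  # one row per column of the file
data = dict(zip(names, columns))           # map each name to its column

print(data["b"])  # [11. 22. 33. 44.]
```

With 60 columns this still gives one lookup per variable, e.g. data["b"] * data["c"], without any name having to be hard-coded.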

It's possible to do what you want, but the usual way is a hack which relies on the fact that Python uses (drumroll) a dict internally to store variables - and since your code won't know the names of those variables, you'll be stuck using dictionary access to get at them as well ... so you might as well just use a dictionary in the first place.
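To make that concrete, here is a rough sketch of the kind of hack meant here, assuming it runs at module level. It is shown only to illustrate the point, not as a recommendation: the names end up as globals, but any code that does not already know them still has to look them up through a dict.

```python
data = {"a": [1, 2, 3, 4], "b": [11, 22, 33, 44], "c": [111, 222, 333, 444]}

# The hack: copy every key into the module's global namespace.
globals().update(data)

# This works, but only because the name "b" is typed into the source.
print(b)

# Code that only learns the names at run time still needs a dict lookup.
for name in data:
    print(name, globals()[name])
```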

It's worth pointing out that this is deliberately made difficult in Python, because if your code doesn't know the names of your variables, they are by definition data rather than logic, and should be treated as such.

In case you aren't convinced yet, here's a good article on this subject:

Stupid Python Ideas: Why you don't want to dynamically create variables

Instead of trying to assign names, you might think about using an associative array, which is known in Python as a dict, to store your variables and their values. The code could then look something like this (borrowing liberally from the csv docs):

import csv
with open('1.txt', 'rt') as f:
  reader = csv.reader(f, delimiter=' ', skipinitialspace=True)

  lineData = list()

  cols = next(reader)
  print(cols)

  for col in cols:
    # Create a list in lineData for each column of data.
    lineData.append(list())


  for line in reader:
    for i in range(len(lineData)):
      # Copy the data from the line into the correct columns.
      lineData[i].append(line[i])

  data = dict()

  for i in range(len(cols)):
    # Create each key in the dict with the data in its column.
    data[cols[i]] = lineData[i]

print(data)

data then contains each of your variables, which can be accessed via data['varname'].

So, for example, you could do data['a'] to get the list ['1', '2', '3', '4'] given the input provided in your question.
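Note that the csv module hands back strings. Since the question asks for float data, one possible variation is sketched below: it transposes the rows with zip and converts each value with float(). The inline sample text is an assumption standing in for 1.txt.

```python
import csv
import io

# Stand-in for open("1.txt", "rt"); the sample text is an assumption.
sample = io.StringIO("a  b   c\n1  11 111\n2  22 222\n3  33 333\n4  44 444\n")

reader = csv.reader(sample, delimiter=" ", skipinitialspace=True)
cols = next(reader)                  # header row: ['a', 'b', 'c']
rows = [row for row in reader if row]  # skip any blank lines

# zip(*rows) turns the list of rows into a sequence of columns.
data = {
    name: [float(value) for value in column]
    for name, column in zip(cols, zip(*rows))
}

print(data["a"])  # [1.0, 2.0, 3.0, 4.0]
```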

I think trying to create names based on data in your document might be a rather awkward way to do this, compared to the dict-based method shown above. If you really want to do that, though, you might look into reflection in Python (a subject I don't really know anything about).

Thanks to @andyg0808 and @Zero Piraeus I have found another solution. For me, the most appropriate one is the pandas data analysis library.

import pandas as pd

# sep=r"\s+" splits on any run of whitespace
# (delim_whitespace=True did the same but is deprecated in recent pandas)
data = pd.read_csv("1.txt", sep=r"\s+")

result = data["a"] * data["b"] * 3
print(result)

  0     33
  1    132
  2    297
  3    528

...where 0, 1, 2, 3 are the row index.
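If plain NumPy arrays are needed for the later calculations, a column can be pulled out of the DataFrame by name. A small sketch, assuming pandas is installed and using inline sample text in place of 1.txt:

```python
import io

import pandas as pd

# Stand-in for "1.txt"; the sample text is an assumption for illustration.
sample = io.StringIO("a  b   c\n1  11 111\n2  22 222\n3  33 333\n4  44 444\n")

data = pd.read_csv(sample, sep=r"\s+")

print(data.columns.tolist())  # ['a', 'b', 'c']

b = data["b"].to_numpy()      # plain NumPy array for the second column
print(b)                      # [11 22 33 44]
```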

Here is a simple way to convert a .txt file of variable names and data to NumPy arrays.

import numpy as np

D = np.genfromtxt('1.txt', dtype='str')      # load everything as strings
D_data = np.asarray(D[1:, :], dtype=float)   # convert the data rows to floats
D_names = D[0, :]                            # variable names from the first row

for i in range(len(D_names)):
    key = D_names[i]                         # name of this variable
    val = D_data[:, i]                       # column of values for this variable
    # exec runs text from the file as code, so only use it on trusted files
    exec(key + '=val')                       # create the variable

I like this method because it is easy to follow and simple to maintain. We can compact this code as follows:

D = np.genfromtxt('1.txt',dtype='str')     # load the data in as strings
for i in range(D.shape[1]):
    val = np.asarray(D[1::,i],dtype=float) # set the value for this variable 
    exec(D[0,i] + '=val')                  # build the variable 

Both snippets do the same thing: they create NumPy arrays named a, b, and c holding the associated data.
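For comparison, a sketch of a variant that avoids exec: np.genfromtxt can take the field names from the first row itself via names=True, which yields a structured array indexed by column name. This is a different technique from the snippets above, and the inline sample text is an assumption standing in for 1.txt.

```python
import io

import numpy as np

# Stand-in for '1.txt'; the sample text is an assumption for illustration.
sample = io.StringIO("a  b   c\n1  11 111\n2  22 222\n3  33 333\n4  44 444\n")

D = np.genfromtxt(sample, names=True)  # first row becomes the field names

print(D.dtype.names)  # ('a', 'b', 'c')
print(D["b"])         # [11. 22. 33. 44.]
```

The columns are then reached as D["a"], D["b"], and so on, rather than as separate variables.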
