Constructing a structured numpy array from a file?

Question

I've been given the task of writing a simple MD simulation program in Python which does not use Python built-in types (dict, list) but only numpy arrays. From what I understood, this allows the code to be compiled so that it runs faster. At one point in my code, I want to get the mass of an atom from a dictionary-like object that can be indexed by the element name, like MassDict['N'] = 14.0067.

From what I've read, I'd need to use a structured numpy array. What I want to do right now is open my file, which has the following form:

H 1.008
He 4.003
Li 6.941

and then construct a structured numpy array, which can be indexed using the element names in the first column.

I tried making two numpy arrays and then concatenating them, but that doesn't seem to be what I need. My code doesn't look that great to begin with. So what is the best way to create, from a text file, a numpy object that can be indexed by string? Here's my code:

import numpy as np
import re

def mass_el(file):

    with open(file) as inf:

        for i, line in enumerate(inf):
            pass

        elements = np.empty((i+1),dtype='S2')
        masses = np.empty((i+1),dtype=np.float32)


    with open(file) as inf:

        for i, line in enumerate(inf):
            elements[i] = re.search('[a-zA-Z]+',line).group()
            masses[i] = re.search(r'\d+[.]\d+',line).group()

    #???


mass_el('elements.txt')

Answer 1

Using numpy

You could use the numpy loadtxt function, which reads and formats your data directly from a file, provided the data fields in the file are regularly separated (by whitespace, which is the default, or by another delimiter such as the comma in a CSV file).

m_els = np.loadtxt('elements.txt', dtype={'names':('element', 'mass'), 'formats':('U2', 'f')})

Using your 3-line file, you get the following m_els array:

array([('H', 1.008), ('He', 4.003), ('Li', 6.941)],
    dtype=[('element', '<U2'), ('mass', '<f4')])

Which is what you want, I guess. To get an element, say hydrogen, use m_els[0] to get the first record, and m_els[0][1] (or m_els[0]['mass']) to get the hydrogen mass.
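Since the question asks for lookup by element name rather than by position, one option is a boolean mask on the element field. The following is only a minimal sketch: it reads the three sample lines from an in-memory io.StringIO buffer instead of elements.txt, it uses 'f8' (float64) instead of 'f' so the masses keep full precision, and the helper name mass_of is hypothetical, not part of numpy.

```python
import io
import numpy as np

# Stand-in for elements.txt (assumption: same whitespace-separated layout).
sample = io.StringIO("H 1.008\nHe 4.003\nLi 6.941\n")

# Same structured dtype as above, but with float64 masses.
m_els = np.loadtxt(
    sample,
    dtype={'names': ('element', 'mass'), 'formats': ('U2', 'f8')},
)

def mass_of(table, symbol):
    """Hypothetical helper: return the mass stored for one element symbol."""
    # Compare the whole 'element' column at once, then pick matching masses.
    matches = table['mass'][table['element'] == symbol]
    if matches.size == 0:
        raise KeyError(symbol)
    return float(matches[0])

print(mass_of(m_els, 'He'))
```

The mask scans the whole array on every lookup, which is fine for a table the size of the periodic table; for much larger tables a sorted array with np.searchsorted may be worth considering.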

Using pandas

It's even easier using pandas, which is also fast since it's built on top of numpy.

import pandas as pd
m_els = pd.read_csv('elements.txt', sep=r'\s+', header=None, names=['element', 'mass'])

In this case m_els is a DataFrame with a default integer index and two columns, element and mass:

  element   mass
0       H  1.008
1      He  4.003
2      Li  6.941

To get the row corresponding to an element, say hydrogen, use m_els.iloc[0]. To get the hydrogen mass, use m_els.loc[0, 'mass'] or, using the element name, m_els['mass'].loc[m_els['element'] == 'H'] (which returns a one-element Series rather than a scalar).
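If lookup by element name is the main use, it may be simpler to make the element column the index when reading the file. This is a sketch under the same assumptions as the file layout above, reading from an in-memory io.StringIO buffer instead of elements.txt; index_col is a standard read_csv parameter, and the variable name masses is only illustrative.

```python
import io
import pandas as pd

# Stand-in for elements.txt (assumption: same whitespace-separated layout).
sample = io.StringIO("H 1.008\nHe 4.003\nLi 6.941\n")

m_els = pd.read_csv(
    sample,
    sep=r'\s+',
    header=None,
    names=['element', 'mass'],
    index_col='element',  # use the element symbols as row labels
)

# Label-based lookup now returns a scalar directly.
print(m_els.loc['He', 'mass'])

# The masses are still available as a plain numpy array when needed.
masses = m_els['mass'].to_numpy()
```

With the symbols as the index, m_els.loc['H', 'mass'] reads much like the MassDict['N'] access in the question, while the underlying values remain a numpy array.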

Answer 1
0 2019-09-04 13:07:39
