CSV data to NumPy structured array?

Name Class Species
a     1      3
b     2      4
c     3      2
a     1      3
b     2      1
c     3      2

The data above comes from a CSV file. I need to convert it to a structured array using numpy, with the header row of the CSV becoming the column labels of the array.

I then need to print the mean occurrences of each name in each class (that is, the mean of Species for class 1, class 2, and class 3).

I used numpy.genfromtxt().

This is one way to create a numpy structured array from a CSV file:

import pandas as pd

arr = pd.read_csv('file.csv').to_records(index=False)

# rec.array([('a', 1, 3), ('b', 2, 4), ('c', 3, 2), ('a', 1, 3), ('b', 2, 1),
#            ('c', 3, 2)], 
#           dtype=[('Name', 'O'), ('Class', '<i8'), ('Numbers', '<i8')])

You can then work with numpy or (easier) pandas to perform your calculations.
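As a rough sketch of the pandas route, the per-class mean can be computed with a groupby before (or instead of) converting to a record array. The inline CSV text below is a stand-in for 'file.csv', and the value column is assumed to be called "Numbers", as in the dtype shown above; if the file's header says "Species", use that name instead.

```python
import io

import pandas as pd

# Stand-in for 'file.csv' (assumed contents, mirroring the sample in the question).
csv_text = """Name,Class,Numbers
a,1,3
b,2,4
c,3,2
a,1,3
b,2,1
c,3,2
"""

df = pd.read_csv(io.StringIO(csv_text))

# Mean of the value column within each class.
class_means = df.groupby("Class")["Numbers"].mean()
print(class_means)

# The same data as a record array, with the CSV header as field names.
arr = df.to_records(index=False)
print(arr.dtype.names)
```

For this sample the means come out as 3.0 for class 1, 2.5 for class 2, and 2.0 for class 3.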

Using the latest numpy (1.14) on Py3.

Your sample, cleaned up:

In [92]: import numpy as np
In [93]: txt = """Name --- Class --- Numbers
    ...: a    ---------- 1    -------- 3
    ...: b    ---------- 2    -------- 4
    ...: c    ---------- 3    -------- 2
    ...: a    ---------- 1    -------- 3
    ...: b    ---------- 2     ------- 1
    ...: c    ---------- 3   --------- 2"""
In [94]: data = np.genfromtxt(txt.splitlines(), dtype=None, names=True, encoding=None)
In [95]: data
Out[95]: 
array([('a', '----------', 1, '--------', 3),
       ('b', '----------', 2, '--------', 4),
       ('c', '----------', 3, '--------', 2),
       ('a', '----------', 1, '--------', 3),
       ('b', '----------', 2, '-------', 1),
       ('c', '----------', 3, '---------', 2)],
      dtype=[('Name', '<U1'), ('f0', '<U10'), ('Class', '<i8'), ('f1', '<U9'), ('Numbers', '<i8')])

Or skipping the dashed columns:

In [96]: data = np.genfromtxt(txt.splitlines(), dtype=None, names=True, encoding=None, usecols=[0,2,4])
In [97]: data
Out[97]: 
array([('a', 1, 3), 
       ('b', 2, 4), 
       ('c', 3, 2), 
       ('a', 1, 3), 
       ('b', 2, 1),
       ('c', 3, 2)],
      dtype=[('Name', '<U1'), ('Class', '<i8'), ('Numbers', '<i8')])
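
With a structured array like this, the per-class mean asked for in the question could be computed with boolean masks on the named fields. This is only a sketch: it uses a whitespace-separated copy of the sample without the dashes, and it assumes the "mean for each class" means the mean of the Numbers field per Class value.

```python
import numpy as np

# Whitespace-separated copy of the sample (dashes removed for simplicity).
txt = """Name Class Numbers
a 1 3
b 2 4
c 3 2
a 1 3
b 2 1
c 3 2"""

# names=True takes the field names from the header line.
data = np.genfromtxt(txt.splitlines(), dtype=None, names=True, encoding=None)

# For each distinct class, average the Numbers values of the matching rows.
means = {}
for cls in np.unique(data["Class"]):
    mask = data["Class"] == cls
    means[int(cls)] = float(data["Numbers"][mask].mean())

print(means)  # {1: 3.0, 2: 2.5, 3: 2.0}
```

For a real comma-separated file, passing the file name and delimiter=',' to np.genfromtxt in place of txt.splitlines() should give the same kind of array.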
