
data type "country" not understood

I am getting the following error from my code: data type "country" not understood. I am relatively new to Python and am trying to learn how to work with .csv files. I'm using Python 3.4 with the Canopy editor. I was trying to set the data types of the CSV columns to strings and floats, but as soon as I try to assign a string type to the first column (the one headed by the word "country"), I get the error. I am trying to assign country the "a200" type, which I believe is a string type. What am I doing wrong here? Please be clear, as I am new.

The code is this:

import csv
import numpy 

def open_with_csv(filename):

    data = []

    with open(filename) as csvin:
        file_reader = csv.reader(csvin, delimiter = ',')
        for line in file_reader:
            data.append(line)

    return data

# Raw string: in Python 3 a plain '\U' would be read as a Unicode escape.
data_from_csv = open_with_csv(r'C:\Users\user\Desktop\MDR-TB_burden_estimates_2015-05-07.csv')

print (data_from_csv)

FIELDNAMES = ['country', 'iso2', 'iso3', 'iso_numeric', 'year', 'source_mdr_new', 'source_drs_coverage_new', 'source_drs_year_new', 'e_new_mdr_pcnt', 'e_new_mdr_pcnt_lo', 'e_new_mdr_pcnt_hi', 'e_new_mdr_num', 'e_new_mdr_num_lo', 'e_new_mdr_num_hi', 'source_mdr_ret', 'source_drs_coverage_ret', 'source_drs_year_ret', 'e_ret_mdr_pcnt', 'e_ret_mdr_pcnt_lo', 'e_ret_mdr_pcnt_hi', 'e_ret_mdr_num', 'e_ret_mdr_num_lo', 'e_ret_mdr_num_hi', 'e_mdr_num', 'e_mdr_num_lo', 'e_mdr_num_hi']


print (FIELDNAMES)

DATATYPES = [('country','a200'), ('iso2'), ('iso3'), ('iso_numeric'), ('year'), ('source_mdr_new'), ('source_drs_coverage_new'), ('source_drs_year_new'), ('e_new_mdr_pcnt'), ('e_new_mdr_pcnt_lo'), ('e_new_mdr_pcnt_hi'), ('e_new_mdr_num'), ('e_new_mdr_num_lo'), ('e_new_mdr_num_hi'), ('source_mdr_ret'), ('source_drs_coverage_ret'), ('source_drs_year_ret'), ('e_ret_mdr_pcnt'), ('e_ret_mdr_pcnt_lo'), ('e_ret_mdr_pcnt_hi'), ('e_ret_mdr_num'), ('e_ret_mdr_num_lo'), ('e_ret_mdr_num_hi'), ('e_mdr_num'), ('e_mdr_num_lo'), ('e_mdr_num_hi')]

def load_data(filename, d=','):
    my_csv = numpy.genfromtxt(filename, delimiter=d, skip_header=1, invalid_raise=False, names= FIELDNAMES, dtype = DATATYPES)
    return my_csv

# Raw string: in Python 3 a plain '\U' would be read as a Unicode escape.
my_csv = load_data(r'C:\Users\user\Desktop\MDR-TB_burden_estimates_2015-05-07.csv')

It looks like the arguments you are passing to numpy.genfromtxt are incorrectly formatted.

If you want to pass a value to both the names and dtype arguments, then you need to specify dtype as a comma-separated string: "a200, i4, etc..."

Alternatively, you can pass a list of ("name", "type") tuples as dtype and not specify the names argument.
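As a minimal sketch of the two options just described, the following uses a few in-memory lines rather than the real file. The sample rows, the column names, and the 'S20' width are assumptions chosen for illustration; 'S20' is used in place of 'a200' because 'S' is the usual spelling of the byte-string type.

```python
import numpy as np

# Made-up sample rows standing in for the CSV file (illustrative only).
lines = [b"USA,123,1.24",
         b"Canada,434,3.34"]

# Option 1: names passed separately, dtype as a comma-separated string.
by_string = np.genfromtxt(lines, delimiter=',',
                          names=['country', 'id', 'value'],
                          dtype='S20,i4,f8')

# Option 2: dtype as a list of (name, type) tuples, names omitted.
by_tuples = np.genfromtxt(lines, delimiter=',',
                          dtype=[('country', 'S20'),
                                 ('id', 'i4'),
                                 ('value', 'f8')])

print(by_string.dtype.names)
print(by_tuples['id'])
```

Both calls should produce a structured array with the same field names, so either style can be used as long as every column gets a type.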

You can look here for examples: http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html

I believe this recreates your problem:

In [156]: txt=b"""USA, 123, ux345, 1.24
Canada, 434, xz3444, 3.34
France, 443, 2x453, 4.34
"""    
In [157]: FIELDNAMES=['country','id','code','value']
In [158]: DATATYPES =[('country','a100'),('id'),('code'),('value')]
In [159]: np.genfromtxt(txt.splitlines(), delimiter=',',dtype=DATATYPES, names=FIELDNAMES)

...
--> 847         ndtype = np.dtype(dict(formats=ndtype, names=names))
    848     else:
    849         nbtypes = len(ndtype)

TypeError: data type "country" not understood

So that's where your subject line is coming from. When parsing DATATYPES, genfromtxt treats 'country' as a dtype (on a par with 'S100', 'int', etc.), but you mean it to be a field name. The other entries, such as ('id'), are bare strings rather than (name, type) tuples, so the list as a whole is not a valid dtype specification.
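Two small checks may make this clearer. This is only a sketch of the underlying behaviour, not a trace of what genfromtxt does internally: parentheses around a single string do not create a tuple, and NumPy rejects a string that is not a known type code.

```python
import numpy as np

# Parentheses alone do not make a tuple: ('iso2') is just the string 'iso2'.
entry = ('iso2')
is_plain_string = isinstance(entry, str)

# A real one-element tuple needs a trailing comma.
one_tuple = ('iso2',)

# Asking NumPy for a dtype called 'country' fails, because 'country'
# is a field name and not a type code such as 'S100' or 'i4'.
try:
    np.dtype('country')
    message = None
except TypeError as exc:
    message = str(exc)

print(is_plain_string, type(one_tuple).__name__)
print(message)
```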

Let's correct DATATYPES and supply a type for each of the fields, not just the first:

In [165]: DATATYPES =[('country','a100'),('id',int),('code','a5'),('value',float)]

In [166]: np.genfromtxt(txt.splitlines(), delimiter=',',dtype=DATATYPES, names=FIELDNAMES)
Out[166]: 
array([(b'USA', 123, b' ux34', 1.24), (b'Canada', 434, b' xz34', 3.34),
       (b'France', 443, b' 2x45', 4.34)], 
      dtype=[('country', 'S100'), ('id', '<i4'), ('code', 'S5'), ('value', '<f8')])
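Note that the code column in this output keeps the leading space from the text and is cut off at five bytes (b' ux34'). One possible way to avoid that, assuming the whitespace is not wanted, is genfromtxt's autostrip option combined with a slightly wider string type. The width 'S6' is an assumption sized to this sample.

```python
import numpy as np

txt = b"""USA, 123, ux345, 1.24
Canada, 434, xz3444, 3.34
France, 443, 2x453, 4.34
"""

# Assumption: 'S6' is wide enough for the longest code in this sample.
DATATYPES = [('country', 'S100'), ('id', int), ('code', 'S6'), ('value', float)]

# autostrip=True removes the whitespace around each field before conversion.
stripped = np.genfromtxt(txt.splitlines(), delimiter=',',
                         dtype=DATATYPES, autostrip=True)

print(stripped['code'])
```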

As Serguei writes, there are several ways of specifying the names and dtypes. So yes, reread the genfromtxt docs if this isn't clear. There are also a lot of genfromtxt questions and examples on SO.
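For a file with many columns, like the one in the question, writing each (name, type) tuple by hand is error-prone. A rough sketch of building the list from the field names instead is shown below. The shortened FIELDNAMES, the STRING_FIELDS set, and the sample rows are all hypothetical; which columns really hold text depends on the actual file.

```python
import numpy as np

# Hypothetical, shortened stand-in for the question's FIELDNAMES.
FIELDNAMES = ['country', 'iso2', 'year', 'e_new_mdr_pcnt']

# Assumption: these columns hold text; every other column is read as float.
STRING_FIELDS = {'country', 'iso2'}

# Every entry is a complete (name, type) tuple; no bare names are left.
DATATYPES = [(name, 'S200' if name in STRING_FIELDS else float)
             for name in FIELDNAMES]

# Made-up rows for illustration only, with a header line to skip.
lines = [b"country,iso2,year,e_new_mdr_pcnt",
         b"Aland,AA,2000,1.5",
         b"Borduria,BB,2001,2.5"]

table = np.genfromtxt(lines, delimiter=',', skip_header=1, dtype=DATATYPES)

print(table.dtype.names)
print(table['year'])
```

Because the names are already inside DATATYPES, the names argument is left out here.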
