
Numpy genfromtxt iterate over columns

I am using NumPy's genfromtxt to get columns from a CSV file.

Each column needs to be split, assigned to a separate SQLAlchemy SystemRecord, combined with some other columns and attributes, and then added to the DB.

What's the best practice for iterating over the columns f1 to f9 and adding them to the session object?

So far I have used the following code, but I don't want to repeat the same thing for each f column:

import numpy as np

# 11 string columns of up to 20 characters each; genfromtxt names them f0..f10
t = np.genfromtxt(FILE_NAME, dtype=[(np.str_, 20)] * 11,
                  delimiter=',', filling_values="None", skip_header=0,
                  usecols=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

for i, r in enumerate(t):          # enumerate yields (index, row) pairs
    _acol = r['f1'].split('-')
    _bcol = r['f2'].split('-')
    # .... the same again for every remaining f column
    arec = t_SystemRecords(first=_acol[0], second=_acol[1], third=_acol[2])  # , ...
    db.session.add(arec)
    db.session.commit()

Look at t.dtype, or at r.dtype.

Make a sample structured array (which is what genfromtxt returns):

t = np.ones((5,), dtype='i4,i4,f8,S3')

which looks like:

array([(1, 1, 1.0, b'1'), (1, 1, 1.0, b'1'), (1, 1, 1.0, b'1'),
       (1, 1, 1.0, b'1'), (1, 1, 1.0, b'1')], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', 'S3')])

The dtype and dtype.names are:

In [135]: t.dtype
Out[135]: dtype([('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', 'S3')])

In [138]: t.dtype.names
Out[138]: ('f0', 'f1', 'f2', 'f3')
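Because dtype.names is an ordinary tuple, it can be sliced to pick just the fields you need; for the question's file, something like t.dtype.names[1:10] would give f1 to f9. A small sketch on the sample array, assuming NumPy 1.16 or later, where indexing with a list of field names selects several fields at once:

```python
import numpy as np

# Same sample structured array as above: fields f0..f3
t = np.ones((5,), dtype='i4,i4,f8,S3')

# dtype.names is a plain tuple, so it can be sliced like any other tuple
wanted = t.dtype.names[1:3]

# Indexing with a *list* of field names selects a subset of the fields
sub = t[list(wanted)]
print(wanted, sub.dtype.names)
```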

Iterate over the names to see the individual columns:

In [139]: for n in t.dtype.names:
   .....:     print(t[n])
   .....:     
[1 1 1 1 1]
[1 1 1 1 1]
[ 1.  1.  1.  1.  1.]
[b'1' b'1' b'1' b'1' b'1']

Or, in your case, iterate over the 'rows', and then over the names:

In [140]: for i,r in enumerate(t):
   .....:     print(r)
   .....:     for n in r.dtype.names:
   .....:         print(r[n])
   .....:         
(1, 1, 1.0, b'1')
1
1
1.0
b'1'
(1, 1, 1.0, b'1')
...
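Applied to the question, the inner loop over names replaces the per-column code. This is a sketch under assumptions: SystemRecord below is a hypothetical dataclass standing in for the SQLAlchemy model (only first/second/third come from the question, key is an assumed extra column taken from f0), the sample data is made up, and every value is assumed to split into exactly three parts:

```python
import numpy as np
from dataclasses import dataclass

# Hypothetical stand-in for the question's SQLAlchemy model t_SystemRecords
@dataclass
class SystemRecord:
    key: str      # assumed "other column", taken from f0
    first: str
    second: str
    third: str

# Assumed sample data: f0 is an extra column, f1.. hold 'a-b-c' style values
t = np.array([('id1', 'a-b-c', 'd-e-f'),
              ('id2', 'g-h-i', 'j-k-l')],
             dtype='U20,U20,U20')

records = []
for r in t:                              # iterate over rows (records)
    for name in r.dtype.names[1:]:       # f1, f2, ... with no per-column code
        # np.str_ subclasses str, so .split works; assumes exactly 3 parts
        first, second, third = r[name].split('-')
        records.append(SystemRecord(str(r['f0']), first, second, third))

print(len(records))
```

With SQLAlchemy, records.append(...) would become db.session.add(...), and a single db.session.commit() after the loop is usually enough instead of one commit per record.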

For r, which is 0d (check r.shape), you can select items by number or iterate over them:

r[1]  # same as r[r.dtype.names[1]]
for i in r: print(i)
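A self-contained version of the record access above, on the sample array; it also uses r.item(), which turns one record into a plain Python tuple if that is easier to pass on:

```python
import numpy as np

t = np.ones((5,), dtype='i4,i4,f8,S3')
r = t[0]                       # one record (a 0d np.void), r.shape == ()

# Positional access and access by field name give the same value
assert r[1] == r[r.dtype.names[1]]

# Iterating over a record yields its field values in order
values = [v for v in r]

# .item() converts the record to a plain Python tuple
as_tuple = r.item()
print(values, as_tuple)
```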

For t, which is 1d, this does not work; t[1] references an item (a whole record).

1d structured arrays behave a bit like 2d arrays, but not quite. The usual talk of rows and columns has to be replaced with rows (or items) and fields.
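The item/field distinction in code, plus one way to get a genuine 2d array when you need it: numpy.lib.recfunctions.structured_to_unstructured. A sketch, assuming NumPy 1.16 or later; only the numeric fields are converted here, since a 2d array needs a single dtype:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

t = np.ones((5,), dtype='i4,i4,f8,S3')

row = t[1]        # an item: one record holding every field
col = t['f1']     # a field: one value per record, shape (5,)

# A true 2d array needs one common type, so only the numeric
# fields f0..f2 are selected and cast to float64
num = rfn.structured_to_unstructured(t[['f0', 'f1', 'f2']], dtype=np.float64)
print(num.shape)
```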


To make a t that might be closer to your case:

In [175]: txt=[b'one-1, two-23, three-12',b'four-ab, five-ss, six-ss']

In [176]: t=np.genfromtxt(txt,dtype=[(np.str_,20),(np.str_,20),(np.str_,20)])

In [177]: t
Out[177]: 
array([('one-1,', 'two-23,', 'three-12'),
       ('four-ab,', 'five-ss,', 'six-ss')], 
      dtype=[('f0', '<U20'), ('f1', '<U20'), ('f2', '<U20')])
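Without a delimiter argument genfromtxt splits on whitespace, which is why the commas stay attached ('one-1,'). Passing delimiter=',' together with autostrip=True should give clean fields. A sketch with the same two sample lines fed through io.StringIO, using the comma-separated dtype string 'U20,U20,U20' (fields still named f0..f2):

```python
import io
import numpy as np

txt = io.StringIO("one-1, two-23, three-12\nfour-ab, five-ss, six-ss")

# delimiter=',' splits on commas; autostrip=True trims the spaces after them
t = np.genfromtxt(txt, dtype='U20,U20,U20', delimiter=',',
                  autostrip=True, encoding='utf-8')
print(t['f1'])
```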

np.char has string functions that can be applied to an array:

In [178]: np.char.split(t['f0'],'-')
Out[178]: array([['one', '1,'], ['four', 'ab,']], dtype=object)

It doesn't work on the structured array, but it does work on the individual fields. That output can be indexed as a list of lists (it's not 2d).
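Combining np.char.split with a loop over dtype.names splits every field column at once and then regroups the pieces per record. A sketch on assumed sample data (already comma-free); the per-row dicts are just one convenient shape to hand to a model constructor:

```python
import numpy as np

t = np.array([('one-1', 'two-23', 'three-12'),
              ('four-ab', 'five-ss', 'six-ss')],
             dtype='U20,U20,U20')

# Split each field column in one call; each result is a 1d object
# array whose elements are Python lists
split_cols = {name: np.char.split(t[name], '-') for name in t.dtype.names}

# Regroup by record: one dict per row, mapping field name -> parts
rows = [{name: split_cols[name][i] for name in t.dtype.names}
        for i in range(len(t))]
print(rows[0])
```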
