numpy - how to add a value to every element in the first column of an array?
I have an array like this:
array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
('6601', 2.2452745388799898e-27, 0.99999999995270605),
('21801', 1.9849650921836601e-31, 0.99999999997999001), ...,
('45164194', 1.0413482803123399e-24, 0.99999999997453404),
('45164198', 1.09470356446595e-24, 0.99999999997635303),
('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
And I want to turn it into this (adding a prefix '2R:' onto each value in the first column):
array([('2R:6506', 4.6725971801473496e-25, 0.99999999995088695),
('2R:6601', 2.2452745388799898e-27, 0.99999999995270605),
('2R:21801', 1.9849650921836601e-31, 0.99999999997999001), ...,
('2R:45164194', 1.0413482803123399e-24, 0.99999999997453404),
('2R:45164198', 1.09470356446595e-24, 0.99999999997635303),
('2R:45164519', 3.7521365799080699e-24, 0.99999999997453404)],
dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
I looked into nditer, but I want to support earlier versions of numpy. I have also read that one should avoid iteration.
Using numpy.core.defchararray.add:
>>> from numpy import array
>>> from numpy.core.defchararray import add
>>>
>>> xs = array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
... ('6601', 2.2452745388799898e-27, 0.99999999995270605),
... ('21801', 1.9849650921836601e-31, 0.99999999997999001),
... ('45164194', 1.0413482803123399e-24, 0.99999999997453404),
... ('45164198', 1.09470356446595e-24, 0.99999999997635303),
... ('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
... dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
>>> xs['pos'] = add('2R:', xs['pos'])
>>> xs
array([('2R:6506', 4.67259718014735e-25, 0.999999999950887),
('2R:6601', 2.24527453887999e-27, 0.999999999952706),
('2R:21801', 1.98496509218366e-31, 0.99999999997999),
('2R:45164194', 1.04134828031234e-24, 0.999999999974534),
('2R:45164198', 1.09470356446595e-24, 0.999999999976353),
('2R:45164519', 3.75213657990807e-24, 0.999999999974534)],
dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])
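The session above appears to come from Python 2, where an 'S100' field holds plain str values. The following is a minimal sketch of the same idea for Python 3, assuming a reasonably recent NumPy: elements of an 'S' field are bytes there, so the prefix is written as a bytes literal, and the function is reached through the public numpy.char namespace instead of numpy.core.defchararray. The shortened sample array is only for illustration.

```python
import numpy as np

# Shortened copy of the array from the question; bytes literals match 'S100'.
xs = np.array(
    [(b'6506', 4.6725971801473496e-25, 0.99999999995088695),
     (b'6601', 2.2452745388799898e-27, 0.99999999995270605),
     (b'21801', 1.9849650921836601e-31, 0.99999999997999001)],
    dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

# Element-wise concatenation; the prefix must be bytes to match the field.
xs['pos'] = np.char.add(b'2R:', xs['pos'])

print(xs['pos'].tolist())  # [b'2R:6506', b'2R:6601', b'2R:21801']
```

The other two fields are untouched, because only the 'pos' field is reassigned.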
A simple (albeit perhaps not optimal) solution is just:
import numpy as np
a = np.array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
('6601', 2.2452745388799898e-27, 0.99999999995270605),
('21801', 1.9849650921836601e-31, 0.99999999997999001),
('45164194', 1.0413482803123399e-24, 0.99999999997453404),
('45164198', 1.09470356446595e-24, 0.99999999997635303),
('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
In [11]: a
Out[11]:
array([('2R:6506', 4.67259718014735e-25, 0.999999999950887),
('2R:6601', 2.24527453887999e-27, 0.999999999952706),
('2R:21801', 1.98496509218366e-31, 0.99999999997999),
('2R:45164194', 1.04134828031234e-24, 0.999999999974534),
('2R:45164198', 1.09470356446595e-24, 0.999999999976353),
('2R:45164519', 3.75213657990807e-24, 0.999999999974534)],
dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])
While I like @falsetru's answer for using core numpy routines, a list comprehension surprisingly seems a bit faster:
In [19]: a = np.empty(20000, dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])
In [20]: %timeit a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
100 loops, best of 3: 11.1 ms per loop
In [21]: %timeit a['pos'] = add('2R:', a['pos'])
100 loops, best of 3: 15.7 ms per loop
Definitely benchmark your own use case and hardware to see which makes more sense for your actual application, though. One of the things I've learned is that in certain situations, basic Python constructs can outperform numpy built-ins, depending on the task at hand.
Another slightly faster solution is to use a list comprehension with the + operator. I do not understand why it is faster, but it is definitely very elegant and basic.
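For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The 'U20' field width, the generated sample values, and the helper function names are assumptions made for this example, not part of the answers above, and the numbers it prints will differ from machine to machine.

```python
import timeit

import numpy as np

# Assumed test data: 20000 rows with a unicode 'pos' field wide enough
# to hold the prefixed values.
a = np.zeros(20000, dtype=[('pos', 'U20'), ('par1', '<f8'), ('par2', '<f8')])
a['pos'] = [str(i) for i in range(a.size)]
original = a['pos'].copy()


def with_list_comprehension():
    # Plain Python: build each prefixed string one at a time.
    return [''.join(('2R:', x)) for x in original]


def with_char_add():
    # NumPy: element-wise string concatenation.
    return np.char.add('2R:', original)


# Both approaches should produce the same values before timing them.
assert with_list_comprehension() == with_char_add().tolist()

for func in (with_list_comprehension, with_char_add):
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1e3:.2f} ms per call")
```

Timing the functions on a fixed copy of the column keeps each run working on the same input, instead of prefixing an already prefixed array again on every loop.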
a['pos'] = ["2R:" + x for x in a['pos']]
Timings:
%timeit a['pos'] = ["2R:" + x for x in a['pos']]
8.07 ms ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
9.53 ms ± 391 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit a['pos'] = add('2R:', a['pos'])
14.2 ms ± 337 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
PS: I created the array a using a slightly different definition:
a = np.empty(20000, dtype=[('pos', 'U5'), ('par1', '<f8'), ('par2', '<f8')])
because if I use type Sxxx for pos, concatenation produces a type error for me.
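The type error is most likely what Python 3 raises when a str prefix is added to bytes, since elements of an 'Sxxx' field are bytes objects there. The sketch below, which assumes Python 3 and uses made-up sample values, shows two things: a bytes prefix works with an 'S' field, and a narrow field such as 'U5' silently truncates the prefixed strings on assignment, so the field should be wide enough for the result.

```python
import numpy as np

fields = [('par1', '<f8'), ('par2', '<f8')]

# Byte-string field: the elements are bytes, so the prefix must be bytes too.
s = np.zeros(2, dtype=[('pos', 'S100')] + fields)
s['pos'] = [b'6506', b'21801']
try:
    s['pos'] = ['2R:' + x for x in s['pos']]  # str + bytes
except TypeError as exc:
    print('str prefix fails:', exc)
s['pos'] = [b'2R:' + x for x in s['pos']]     # bytes + bytes works
print(s['pos'].tolist())  # [b'2R:6506', b'2R:21801']

# Unicode field that is too narrow: values are cut to 5 characters
# without any warning.
u = np.zeros(2, dtype=[('pos', 'U5')] + fields)
u['pos'] = ['6506', '21801']
u['pos'] = ['2R:' + x for x in u['pos']]
print(u['pos'].tolist())  # ['2R:65', '2R:21']
```

Choosing a wider unicode field, for example 'U20', avoids the truncation in the second case.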