I am working with some datasets that I have imported from Excel and converted into lists:
import pandas as pd
import numpy as np
xls = pd.ExcelFile("data.xlsx")  # hypothetical path; the workbook is not shown in the post
datfrms = []
for i in xls.sheet_names:
    df = pd.read_excel(xls, i)
    datfrms.append(df)
data_a = []
data_b = []
data_c = []
for dfs in datfrms:
    data_a.append(dfs.loc[:,'data_a'])
    data_b.append(dfs.loc[:,'data_b'])
    data_c.append(dfs.loc[:,'data_c'])
Then I wanted to do some calculations on the data, so I decided to convert the lists into NumPy arrays while performing those calculations:
a = np.asarray([2 * (x + y) for x, y in zip(data_a, data_b)])
b = np.asarray([c / 1000 for c in data_c])
Thus, a and b are now of type <class 'numpy.ndarray'>, with shape (13,), corresponding to the 13 sheets I imported above. Whenever I want to access the data from the first sheet, I write, for instance, data_a[0].
However, I get the error AttributeError: 'Series' object has no attribute 'sqrt' when I try something like:
d = np.sqrt(a / b)
No error occurs if I write out each sheet by hand:
d0 = np.sqrt(a[0] / b[0])
...
d12 = np.sqrt(a[12] / b[12])
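Rather than writing d0 through d12 by hand, the same per-sheet computation can be expressed as one loop. A minimal sketch, using two short hypothetical Series per variable as stand-ins for the sheets (the values are made up, not the real data); with the real object arrays, zip(a, b) yields the per-sheet Series in the same way:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the per-sheet Series; the "sheets" have different lengths.
a_list = [pd.Series([0.88, 0.89, 0.87]), pd.Series([0.86, 0.87])]
b_list = [pd.Series([3.70, 3.70, 3.69]), pd.Series([3.68, 3.71])]

# Apply the same operation sheet by sheet; each result is a pandas Series.
d = [np.sqrt(x / y) for x, y in zip(a_list, b_list)]

for i, s in enumerate(d):
    print(i, type(s).__name__, len(s))
```

Here d[0] plays the role of d0, and each element is still a pandas Series, matching what type reports for the hand-written version.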
But if I use the type function, d0 ... d12 are <class 'pandas.core.series.Series'>, whereas a[0] and b[0] are both <class 'numpy.ndarray'>.
I would like to share the data, but I am unable to recreate its format with synthetic data in Python, which I suspect may be the core of the problem (i.e., I am doing something wrong in terms of the data format).
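One way to reproduce the format without the workbook is to build a few DataFrames with the same column names but different row counts, which is what sheets of unequal length produce. A sketch under that assumption (the random values and the row counts 17, 15 and 20 are arbitrary). NumPy 1.24 and later refuse to build an array from ragged sequences unless dtype=object is passed, so the sketch passes it explicitly; older versions built the same object array with at most a deprecation warning. Depending on the NumPy version, np.sqrt may report the failure as AttributeError or TypeError, so both are caught:

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic "sheets": same columns, different numbers of rows.
rng = np.random.default_rng(0)
datfrms = [
    pd.DataFrame(rng.random((n, 3)), columns=["data_a", "data_b", "data_c"])
    for n in (17, 15, 20)
]

data_a = [df["data_a"] for df in datfrms]
data_b = [df["data_b"] for df in datfrms]
data_c = [df["data_c"] for df in datfrms]

# Ragged input: the result is a 1d object array whose elements are Series.
a = np.asarray([2 * (x + y) for x, y in zip(data_a, data_b)], dtype=object)
b = np.asarray([c / 1000 for c in data_c], dtype=object)
print(a.dtype, a.shape, type(a[0]))

# Reproduces the failure from the question.
raised = False
try:
    np.sqrt(a / b)
except (AttributeError, TypeError) as exc:
    raised = True
    print(type(exc).__name__ + ":", exc)
```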
At user32185's request, here is the output of a[0] and b[0], respectively:
0 0.883871
1 0.885714
2 0.879378
3 0.865668
4 0.866014
5 0.860657
6 0.866071
7 0.884389
8 0.892339
9 0.892512
10 0.841590
11 0.841014
12 0.882200
13 0.857546
14 0.850576
15 0.853975
16 0.838710
dtype: float64
and
0 3.701151
1 3.701938
2 3.700758
3 3.690926
4 3.685027
5 3.688959
6 3.712556
7 3.786099
8 3.888745
9 3.956389
10 3.799078
11 3.799078
12 3.778627
13 3.669295
14 3.638620
15 3.606371
16 3.547379
Name: b, dtype: float64
Your a and b are object dtype arrays. You say they have "shape (13,), corresponding to the 13 sheets I imported above", and the error indicates that the elements of the arrays are Series.
type(a[0]) # what is it?
Math on object dtype arrays is hit-or-miss:
In [195]: x = np.array([1.2, 2.3], object)
In [196]: np.sqrt(x)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-196-0b43c7e80401> in <module>()
----> 1 np.sqrt(x)
AttributeError: 'float' object has no attribute 'sqrt'
In [197]: (x+x)/2
Out[197]: array([1.2, 2.3], dtype=object)
NumPy delegates the math to methods of the elements. + and / work because the corresponding methods are defined (for floats in my example, for Series in yours). But most classes don't define a sqrt method, hence the failure.
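The delegation can be seen with a small class that does define a sqrt method. The class Wrapped below is hypothetical, written only for this illustration; np.sqrt on an object array calls each element's sqrt() and collects the results in a new object array:

```python
import math

import numpy as np


class Wrapped:
    """Hypothetical toy class: holds a float and defines a sqrt() method."""

    def __init__(self, value):
        self.value = value

    def sqrt(self):
        # np.sqrt on an object array calls this method on each element.
        return Wrapped(math.sqrt(self.value))

    def __repr__(self):
        return f"Wrapped({self.value})"


x = np.array([Wrapped(4.0), Wrapped(9.0)], dtype=object)
y = np.sqrt(x)
print(y)
```

Remove the sqrt method and the same call fails just like the float and Series cases.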
If your initial dataframes all had the same number of rows, the array a made from them would be a 2d array of numeric dtype, and you could do all NumPy math on it. But because the dataframes differ in length, the array made from the Series is an object dtype array of Series.
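If the sheets differ only in length, one possible alternative (not from the answer above) is to pad them into a single 2d numeric table with pd.concat(..., axis=1), which aligns the Series on their row index and fills missing rows with NaN. A sketch assuming rows line up by position, as they do with the default RangeIndex from read_excel; the short Series are hypothetical stand-ins:

```python
import numpy as np
import pandas as pd

# Hypothetical per-sheet Series of different lengths (stand-ins for a[i], b[i]).
a_list = [pd.Series([0.88, 0.89, 0.87]), pd.Series([0.86, 0.87])]
b_list = [pd.Series([3.70, 3.70, 3.69]), pd.Series([3.68, 3.71])]

# One column per sheet; shorter sheets are padded with NaN.
# ignore_index=True labels the columns 0..n-1.
A = pd.concat(a_list, axis=1, ignore_index=True)
B = pd.concat(b_list, axis=1, ignore_index=True)

# A and B are float64 DataFrames, so the whole computation is vectorized.
D = np.sqrt(A / B)
print(D)
print(D.to_numpy().dtype)
```

NaN propagates through the arithmetic, so padded cells simply stay NaN in the result.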
In [201]: df1 = pd.DataFrame(np.arange(12).reshape(4,3))
A 2d numeric array from Series of the same size:
In [204]: x=np.array([df1.loc[:,0], df1.loc[:,1]])
In [205]: x
Out[205]:
array([[ 0, 3, 6, 9],
[ 1, 4, 7, 10]])
In [206]: x.dtype
Out[206]: dtype('int64')
An object array made from Series of different sizes:
In [207]: df2 = pd.DataFrame(np.arange(15).reshape(5,3))
In [208]: x=np.array([df1.loc[:,0], df2.loc[:,0]])
In [210]: type(x[0])
Out[210]: pandas.core.series.Series
Addition on the object array works, but note the dtype:
In [212]: x+x
Out[212]:
array([0 0
1 6
2 12
3 18
Name: 0, dtype: int64,
0 0
1 6
2 12
3 18
4 24
Name: 0, dtype: int64], dtype=object)
In [213]: np.sqrt(x)
...
AttributeError: 'Series' object has no attribute 'sqrt'
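If you want to keep the object array of Series, a workaround is to apply np.sqrt to each element yourself, since a single Series does support np.sqrt. A sketch using np.frompyfunc, which wraps a Python callable as a ufunc that returns an object array; this is one possible approach, not the only one:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(4, 3))
df2 = pd.DataFrame(np.arange(15).reshape(5, 3))

# On NumPy >= 1.24, ragged input needs dtype=object explicitly; without it
# np.array raises ValueError instead of building the object array.
x = np.array([df1.loc[:, 0], df2.loc[:, 0]], dtype=object)

# Each element (a Series) is passed to np.sqrt, which Series supports;
# the results are collected in a new object array.
elementwise_sqrt = np.frompyfunc(np.sqrt, 1, 1)
r = elementwise_sqrt(x)

print(r.dtype, type(r[0]))
print(r[1])
```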