For a current project, I am planning to run scikit-learn's Stochastic Gradient Boosting algorithm over a CSV dataset that contains numerical data.
When calling X = Germany.drop('Status', axis='columns') in the script, however, I receive an AttributeError: 'numpy.ndarray' object has no attribute 'drop'.
I assume this error is related to the fact that I convert the CSV data with pd.to_numeric, which possibly also converts the string headers. Is there a smart tweak that can make this run?
The CSV data has the following structure:
And the corresponding code looks like this:
Germany = pd.read_csv('./Germany_filtered.csv', index_col=0)
Germany = Germany.fillna("")
Germany = pd.to_numeric(Germany.columns.str, errors='coerce')
Germany.head()
X = Germany.drop('Status', axis='columns')
y = Germany['Status']
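A minimal sketch of a corrected version of the code above, assuming the goal is to turn every column into numbers while keeping the DataFrame. The real file's columns are not shown, so the inline CSV text, the column names 'Value1' and 'Value2', and all values are hypothetical stand-ins for Germany_filtered.csv. It also skips the fillna("") step, since filling gaps with empty strings turns numeric columns into object columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for Germany_filtered.csv: the real columns are not
# shown in the question, so 'Value1', 'Value2' and the values are invented.
csv_text = """id,Value1,Value2,Status
1,3.5,10,1
2,x,12,0
3,4.1,,1
"""

Germany = pd.read_csv(io.StringIO(csv_text), index_col=0)

# pd.to_numeric only accepts 1-d input, so DataFrame.apply calls it once per
# column and reassembles the converted columns into a DataFrame.
Germany = Germany.apply(pd.to_numeric, errors="coerce")

# Germany is still a DataFrame, so .head() and .drop() keep working.
print(Germany.head())
X = Germany.drop("Status", axis="columns")
y = Germany["Status"]
```

The unparseable 'x' becomes NaN; how to deal with such gaps (drop the rows, impute them) is a separate decision.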
In [167]: df = pd.DataFrame(np.arange(12).reshape(3,4),columns=['a','b','c','d'])
drop works fine on a dataframe:
In [168]: df.drop('c',axis='columns')
Out[168]:
a b d
0 0 1 3
1 4 5 7
2 8 9 11
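A short check of what drop actually returns, using the same example frame: it builds a new DataFrame and leaves the original untouched, and columns= is an equivalent spelling of axis='columns'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["a", "b", "c", "d"])

# drop returns a new DataFrame; df itself keeps all four columns.
dropped = df.drop("c", axis="columns")
# The columns= keyword is an equivalent way to say axis="columns".
same = df.drop(columns="c")

print(list(dropped.columns))  # ['a', 'b', 'd']
print(dropped.equals(same))   # True
print(list(df.columns))       # ['a', 'b', 'c', 'd']
```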
Given that df.columns.str object, to_numeric produces a numpy array:
In [169]: x = pd.to_numeric(df.columns.str,errors='coerce')
In [170]: x
Out[170]:
array(<pandas.core.strings.StringMethods object at 0x7fef602862b0>,
dtype=object)
In [171]: type(x)
Out[171]: numpy.ndarray
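Even when the headers themselves (the Index, not its .str accessor) are passed in, converting them is the wrong target: the labels are not the data, and letter labels simply coerce to NaN. A quick sketch with the same example frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["a", "b", "c", "d"])

# Passing the column labels converts the *headers*, not the values;
# letters cannot be parsed as numbers, so each label becomes NaN.
converted_headers = pd.to_numeric(df.columns, errors="coerce")
print(converted_headers)

# The data underneath was numeric all along.
print(all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes))  # True
```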
It should have complained about head before getting to drop:
In [172]: x.head()
Traceback (most recent call last):
File "<ipython-input-172-830ed5e65d76>", line 1, in <module>
x.head()
AttributeError: 'numpy.ndarray' object has no attribute 'head'
In [173]: x.drop()
Traceback (most recent call last):
File "<ipython-input-173-6d3a33341569>", line 1, in <module>
x.drop()
AttributeError: 'numpy.ndarray' object has no attribute 'drop'
What do the to_numeric docs say? I haven't worked with this function, but clearly you don't want to pass it that df.columns.str object. Let's try passing it the whole dataframe:
In [176]: x = pd.to_numeric(df,errors='coerce')
Traceback (most recent call last):
File "<ipython-input-176-d095b0166b8f>", line 1, in <module>
x = pd.to_numeric(df,errors='coerce')
File "/usr/local/lib/python3.6/dist-packages/pandas/core/tools/numeric.py", line 139, in to_numeric
raise TypeError("arg must be a list, tuple, 1-d array, or Series")
TypeError: arg must be a list, tuple, 1-d array, or Series
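Since to_numeric rejects a 2-d DataFrame, one common workaround (a sketch, not the only option) is DataFrame.apply, which hands each column to to_numeric as a 1-d Series. The '00', '04', 'LS' column mirrors the string example used further down in this answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["a", "b", "c", "d"])
df["a"] = ["00", "04", "LS"]  # a messy string column

# apply passes each column (a 1-d Series) to to_numeric, forwarding
# errors="coerce", and collects the results back into a DataFrame.
numeric_df = df.apply(pd.to_numeric, errors="coerce")

print(type(numeric_df).__name__)  # DataFrame
print(numeric_df["a"].tolist())   # [0.0, 4.0, nan]
```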
So let's pass a column/Series:
In [177]: x = pd.to_numeric(df['a'],errors='coerce')
In [178]: x
Out[178]:
0 0
1 4
2 8
Name: a, dtype: int64
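This also explains the original error: to_numeric's return type follows its input. A Series comes back as a Series, while anything else (a list, or the odd StringMethods object above) comes back as a plain NumPy array, which has no head or drop.

```python
import pandas as pd

# A Series in gives a Series out; a plain list in gives an ndarray out.
from_series = pd.to_numeric(pd.Series(["1", "2", "x"]), errors="coerce")
from_list = pd.to_numeric(["1", "2", "x"], errors="coerce")

print(type(from_series).__name__)  # Series
print(type(from_list).__name__)    # ndarray
```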
The resulting Series can be assigned back to the dataframe, in the same column or a new one:
In [179]: df['a'] = x
In [180]: df
Out[180]:
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
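The same assignment works for several columns at once, or into a new column when the original strings should be kept. A small sketch; the frame and the choice of columns to convert are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": ["00", "04", "LS"], "b": ["1", "5", "9"], "Status": ["1", "0", "1"]})

# Hypothetical selection of columns to convert in place.
cols = ["a", "b"]
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

# Keep the original strings and store the converted values in a new column.
df["Status_num"] = pd.to_numeric(df["Status"], errors="coerce")
print(df)
```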
Now in my example frame there's no need to do this conversion, but it should give you something to work with.
Let's try a real string conversion:
In [195]: df['a'] = ['00','04','LS']
In [196]: df
Out[196]:
a b c d
0 00 1 2 3
1 04 5 6 7
2 LS 9 10 11
The linked answer doesn't help:
In [197]: pd.to_numeric(df.columns.str, errors='coerce')
Out[197]:
array(<pandas.core.strings.StringMethods object at 0x7fef602862b0>,
dtype=object)
But my version does produce a numeric Series:
In [198]: pd.to_numeric(df['a'], errors='coerce')
Out[198]:
0 0.0
1 4.0
2 NaN
Name: a, dtype: float64
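Putting it together for the question's actual goal: a rough sketch, on invented random data, of converting columns, dropping the rows that coercion turned into NaN (scikit-learn's GradientBoostingClassifier does not accept NaN inputs), and fitting with subsample < 1.0, which scikit-learn documents as stochastic gradient boosting. The column names, subsample=0.8, and the dropna strategy are assumptions, not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical data standing in for the Germany CSV; all values are random.
rng = np.random.default_rng(0)
Germany = pd.DataFrame({
    "Value1": rng.normal(size=200).astype(str),  # numbers stored as strings
    "Value2": rng.normal(size=200),
    "Status": rng.integers(0, 2, size=200),
})
Germany.loc[:4, "Value1"] = "LS"  # five unparseable entries (rows 0 to 4)

# Convert column by column; unparseable strings become NaN.
Germany = Germany.apply(pd.to_numeric, errors="coerce")
# Drop the rows that coercion turned into NaN.
Germany = Germany.dropna()

X = Germany.drop("Status", axis="columns")
y = Germany["Status"]

# subsample < 1.0 fits each tree on a random fraction of the rows.
model = GradientBoostingClassifier(subsample=0.8, random_state=0)
model.fit(X, y)
print(model.score(X, y))
```

Imputing instead of dropping, or switching to HistGradientBoostingClassifier (which handles NaN natively), are alternatives if too many rows would be lost.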