
Pandas DataFrame to Numpy Array ValueError

I am trying to convert a single column of a dataframe to a numpy array. Converting the entire dataframe has no issues.

df

  viz  a1_count  a1_mean     a1_std
0   0         3        2   0.816497
1   1         0      NaN        NaN 
2   0         2       51  50.000000

Both of these calls work fine:

X = df.as_matrix()
X = df.as_matrix(columns=df.columns[1:])

However, when I try:

y = df.as_matrix(columns=df.columns[0])

I get:

TypeError: Index(...) must be called with a collection of some kind, 'viz' was passed

The problem is that you're passing a single element, which in this case is just the string label of that column. If you wrap it in a list with a single element, it works:

In [97]:
y = df.as_matrix(columns=[df.columns[0]])
y

Out[97]:
array([[0],
       [1],
       [0]], dtype=int64)

Here is what you're passing:

In [101]:
df.columns[0]

Out[101]:
'viz'

So it's equivalent to this:

y = df.as_matrix(columns='viz')

which results in the same error.
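As a side note, as_matrix was deprecated in pandas 0.23 and removed in 1.0, so the calls above only run on older installs. The same string-versus-list distinction can be sketched on a current pandas with column selection plus to_numpy(). This is a minimal sketch, not the original poster's code: the frame is rebuilt by hand from the sample in the question, and the names first, y_1d and y_2d are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the question (an assumption about its dtypes).
df = pd.DataFrame({
    "viz": [0, 1, 0],
    "a1_count": [3, 0, 2],
    "a1_mean": [2, np.nan, 51],
    "a1_std": [0.816497, np.nan, 50.0],
})

first = df.columns[0]  # a plain string label, 'viz'

# Selecting with the bare label gives a Series, hence a 1-D array.
y_1d = df[first].to_numpy()

# Selecting with a one-element list gives a DataFrame, hence a 2-D column
# array, the same shape as the as_matrix(columns=[...]) result above.
y_2d = df[[first]].to_numpy()

print(y_1d.shape)  # (3,)
print(y_2d.shape)  # (3, 1)
```

If a flat vector is what you want for y, the bare-label form is usually enough; the list form is the one that mirrors the 2-D output shown in Out[97].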

The docs show the expected params:

DataFrame.as_matrix(columns=None)
    Convert the frame to its Numpy-array representation.

    Parameters:
        columns : list, optional, default: None
            If None, return all columns, otherwise, returns specified columns

as_matrix expects a list for the columns keyword and df.columns[0] isn't a list. Try df.as_matrix(columns=[df.columns[0]]) instead.
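The wording of the error suggests that the columns argument ends up being turned into a pandas Index. That is an inference from the message, not something the quoted docs state. Under that assumption, the behaviour can be reproduced without as_matrix at all: building an Index from a bare string is rejected, while a one-element list is accepted. The variable names here are illustrative only.

```python
import pandas as pd

# A bare string is a scalar, not a collection, so Index rejects it.
try:
    pd.Index("viz")
    message = None
except TypeError as exc:
    message = str(exc)

print(message)

# A one-element list is a collection and is accepted.
idx = pd.Index(["viz"])
print(idx.tolist())  # ['viz']
```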

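Using the Index tolist method works as well, provided it is called on a slice of the columns (df.columns[0] is a plain string and has no tolist method):

df.as_matrix(columns=df.columns[:1].tolist())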

When passing multiple columns, for example the first ten, the command

df.as_matrix(columns=[df.columns[0:10]])

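does not work, because wrapping the slice in a list produces a list containing a single Index object rather than a list of column labels. However, using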

df.as_matrix(columns=df.columns[0:10].tolist())

works well.
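The difference between the two forms can be checked directly. The sketch below uses a small hypothetical two-column frame and a [0:2] slice instead of [0:10], and it selects with to_numpy() rather than as_matrix, since the latter no longer exists in pandas 1.0 and later.

```python
import pandas as pd

# Hypothetical frame, reduced to two of the question's columns.
df = pd.DataFrame({"viz": [0, 1, 0], "a1_count": [3, 0, 2]})

sliced = df.columns[0:2]   # slicing an Index gives another Index
wrapped = [sliced]         # a list with ONE element: the Index itself
labels = sliced.tolist()   # a flat list of plain column labels

print(len(wrapped))  # 1
print(labels)        # ['viz', 'a1_count']

# The flat list selects the columns as intended.
X = df[labels].to_numpy()
print(X.shape)       # (3, 2)
```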
