
How to convert grouped/binned dataframe to numpy array?

I was wondering how I would be able to convert my binned dataframe to a binned numpy array that I can use in sklearn's PCA.

Here's my code so far (x is my original unbinned dataframe):

import pandas as pd

bins = (2, 6, 10, 14, 20, 26, 32, 38, 44, 50, 56, 62, 68, 74, 80, 86, 92, 98)
binned_data = x.groupby(pd.cut(x.Weight, bins))

I want to convert binned_data to a numpy array. Thanks in advance.

EDIT:

When I try binned_data.values, I receive this error:

AttributeError: Cannot access attribute 'values' of 'DataFrameGroupBy' objects, try using the 'apply' method

You need to apply some kind of aggregation to the GroupBy object to get a DataFrame back. Once you have that, you can use .values to extract the numpy array.

For example, if you wanted the sum or count of the data in each bin you could do:

binned_data.sum().values
binned_data.size().values
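
To make the two calls above concrete, here is a minimal sketch on made-up data. The sample dataframe, the Height column and the shortened bin edges are assumptions for illustration only, not the asker's data. observed=False is passed explicitly because the default for categorical groupers has differed between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe x
x = pd.DataFrame({
    "Weight": [3.0, 5.0, 7.0, 12.0, 15.0],
    "Height": [10.0, 20.0, 30.0, 40.0, 50.0],
})
bins = (2, 6, 10, 14, 20)  # shortened bin edges, for illustration

# Group rows by the weight bin each one falls into
binned_data = x.groupby(pd.cut(x.Weight, bins), observed=False)

# Aggregating turns the GroupBy into a DataFrame / Series,
# and .values then gives a plain numpy array
sums = binned_data.sum().values     # 2-D: one row per bin, one column per original column
counts = binned_data.size().values  # 1-D: number of rows in each bin

print(sums)
print(counts)
```

Note that sum() gives a 2-D array while size() gives a 1-D array of counts, so only the former has the samples-by-features layout that something like PCA expects.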

Edit: My code wasn't exactly right. After aggregating, the column (Weight) and the index have the same name, so resetting the index runs into a name conflict. It can be fixed by renaming the index first, as below:

binned_data = x.groupby(pd.cut(x.Weight, bins)).sum()
binned_data.index.name = 'Weight_Bin'
binned_data.reset_index().values
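
One thing to watch for if the array is headed into PCA: after reset_index() the bin labels become a column of Interval objects, so the resulting array has dtype object. The sketch below, again on made-up data (the sample dataframe and bin edges are assumptions), contrasts that with taking only the aggregated numeric columns.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe x
x = pd.DataFrame({
    "Weight": [3.0, 5.0, 7.0, 12.0, 15.0],
    "Height": [10.0, 20.0, 30.0, 40.0, 50.0],
})
bins = (2, 6, 10, 14, 20)  # shortened bin edges, for illustration

binned_data = x.groupby(pd.cut(x.Weight, bins), observed=False).sum()
binned_data.index.name = "Weight_Bin"  # avoid the clash with the Weight column

# Includes the bin labels as the first column, so the dtype is object
with_bins = binned_data.reset_index().values

# Only the aggregated numeric columns, as a float array
numeric_only = binned_data.to_numpy(dtype=float)

print(with_bins.dtype, with_bins.shape)
print(numeric_only.dtype, numeric_only.shape)
```

If the bin itself is not needed as a feature, the numeric-only array is probably the one to hand to a numerical routine.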
