
How to convert the digits dataset of scikit-learn to a pandas DataFrame?

I have seen a lot of people convert the classic "iris" dataset of scikit-learn to a pandas DataFrame, since exploratory data analysis is easier with pandas. I would like to know if there is a way to convert the "digits" dataset in a similar manner. Working with the scikit-learn dataset as it is feels a little harder for me. Can someone please help me with this?

Try:

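import numpy as np
import pandas as pd
from sklearn.datasets import load_digits

digits = load_digits()
# Stack the 64 pixel columns and the target side by side, then name the columns
df = pd.DataFrame(np.column_stack([digits['data'], digits['target']]), columns=digits['feature_names'] + ['target'])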
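
As an alternative to stacking the arrays by hand, recent scikit-learn releases accept an as_frame=True argument in load_digits. The sketch below assumes a version that supports it (0.23 or later, as far as I know) and that pandas is installed.

```python
from sklearn.datasets import load_digits

# as_frame=True asks scikit-learn to return pandas objects instead of NumPy arrays
digits = load_digits(as_frame=True)

# .frame holds the 64 pixel columns plus a 'target' column
df = digits.frame
print(df.shape)
print(df.columns[:3].tolist(), df.columns[-1])
```

With this approach the target column keeps its integer dtype, whereas np.column_stack converts everything to float.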

We can load the dataset as shown below. The dataset's own description, printed at the end of the output (the DESCR entry), lists 1797 instances with 64 attributes each: every sample is an 8x8 image of integer pixels in the range 0..16.

from sklearn.datasets import load_digits
digits = load_digits()
digits

{'data': array([[ 0.,  0.,  5., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ..., 10.,  0.,  0.],
        [ 0.,  0.,  0., ..., 16.,  9.,  0.],
        ...,
        [ 0.,  0.,  1., ...,  6.,  0.,  0.],
        [ 0.,  0.,  2., ..., 12.,  0.,  0.],
        [ 0.,  0., 10., ..., 12.,  1.,  0.]]),
 'target': array([0, 1, 2, ..., 8, 9, 8]),
...
'images': array([[[ 0.,  0.,  5., ...,  1.,  0.,  0.],
         [ 0.,  0., 13., ..., 15.,  5.,  0.],
         [ 0.,  3., 15., ..., 11.,  8.,  0.],
         ...,
         [ 0.,  4., 11., ..., 12.,  7.,  0.],
         [ 0.,  2., 14., ..., 12.,  0.,  0.],
         [ 0.,  0.,  6., ...,  0.,  0.,  0.]],
 ...
}
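
Printing the whole object is hard to read, so a quick way to get oriented is to look at the keys and the array shapes. This is only a small sketch; the exact set of keys may vary between scikit-learn versions.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# The returned object behaves like a dict, so we can list its keys
print(list(digits.keys()))

# 'data' is the flattened version (one row of 64 pixels per image),
# 'images' keeps the 8x8 layout, 'target' holds the digit labels
data_shape = digits['data'].shape
images_shape = digits['images'].shape
target_shape = digits['target'].shape
print(data_shape, images_shape, target_shape)
```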

We can create a pandas DataFrame with the 64 features of each image plus its label like this:

import pandas as pd
df = pd.DataFrame(digits['data'])
df['label'] = digits['target']
print(df)

        0    1     2     3     4     5    6    7    8    9  ...   55   56  \
0     0.0  0.0   5.0  13.0   9.0   1.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   
1     0.0  0.0   0.0  12.0  13.0   5.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   
2     0.0  0.0   0.0   4.0  15.0  12.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   
3     0.0  0.0   7.0  15.0  13.0   1.0  0.0  0.0  0.0  8.0  ...  0.0  0.0   
4     0.0  0.0   0.0   1.0  11.0   0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   
...   ...  ...   ...   ...   ...   ...  ...  ...  ...  ...  ...  ...  ...   
1792  0.0  0.0   4.0  10.0  13.0   6.0  0.0  0.0  0.0  1.0  ...  0.0  0.0   
1793  0.0  0.0   6.0  16.0  13.0  11.0  1.0  0.0  0.0  0.0  ...  0.0  0.0   
1794  0.0  0.0   1.0  11.0  15.0   1.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   
1795  0.0  0.0   2.0  10.0   7.0   0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   
1796  0.0  0.0  10.0  14.0   8.0   1.0  0.0  0.0  0.0  2.0  ...  0.0  0.0   

       57   58    59    60    61   62   63  label  
0     0.0  6.0  13.0  10.0   0.0  0.0  0.0      0  
1     0.0  0.0  11.0  16.0  10.0  0.0  0.0      1  
2     0.0  0.0   3.0  11.0  16.0  9.0  0.0      2  
3     0.0  7.0  13.0  13.0   9.0  0.0  0.0      3  
4     0.0  0.0   2.0  16.0   4.0  0.0  0.0      4  
...   ...  ...   ...   ...   ...  ...  ...    ...  
1792  0.0  2.0  14.0  15.0   9.0  0.0  0.0      9  
1793  0.0  6.0  16.0  14.0   6.0  0.0  0.0      0  
1794  0.0  2.0   9.0  13.0   6.0  0.0  0.0      8  
1795  0.0  5.0  12.0  16.0  12.0  0.0  0.0      9  
1796  1.0  8.0  12.0  14.0  12.0  1.0  0.0      8  

[1797 rows x 65 columns]
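
Once the data is in a DataFrame, the usual pandas tools for exploratory analysis apply. As a small illustration (the variable names counts and mean_intensity are my own), the sketch below counts the samples per digit and computes the average pixel intensity for each digit.

```python
import pandas as pd
from sklearn.datasets import load_digits

digits = load_digits()
df = pd.DataFrame(digits['data'])
df['label'] = digits['target']

# How many samples there are of each digit
counts = df['label'].value_counts().sort_index()
print(counts)

# Average pixel intensity per digit:
# mean over the 64 pixel columns of each row, then mean over rows with the same label
pixel_cols = df.columns[:64]
mean_intensity = df[pixel_cols].mean(axis=1).groupby(df['label']).mean()
print(mean_intensity)
```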

We can show multiple images of a specific digit like this:

import matplotlib.pyplot as plt
num_for_show = 6
for row in df[df['label'].eq(num_for_show)].values:
    plt.imshow(row[:64].reshape(8,8))
    plt.show()
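
The loop above opens one figure per image, which can mean well over a hundred windows for a single digit. A possible variation is to draw a handful of samples in one grid. In this sketch plot_digit_grid is a hypothetical helper of my own, and the choice of 10 samples and 5 columns is arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits


def plot_digit_grid(images, n_cols=5):
    """Draw the given 8x8 images in a single figure and return it."""
    n_rows = int(np.ceil(len(images) / n_cols))
    fig, axes = plt.subplots(n_rows, n_cols,
                             figsize=(n_cols * 1.5, n_rows * 1.5),
                             squeeze=False)
    # Fill the grid with images, left to right, top to bottom
    for ax, img in zip(axes.ravel(), images):
        ax.imshow(img, cmap="gray_r")
    # Hide axis ticks on every cell, including any left empty
    for ax in axes.ravel():
        ax.axis("off")
    return fig


digits = load_digits()
num_for_show = 6
# Boolean mask selects the images whose label matches; keep the first 10
samples = digits['images'][digits['target'] == num_for_show][:10]
fig = plot_digit_grid(samples)
```

Call plt.show() afterwards to display the figure, or fig.savefig(...) to write it to a file.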

We can also show a single image straight from digits['images']. Each entry there already has shape 8x8, so no reshape(8,8) is needed:

plt.imshow(digits['images'][10])
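
To convince yourself that both routes give the same picture, you can compare an entry of digits['images'] with the matching row of digits['data'] after reshaping. A minimal check, using index 10 as in the line above:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
i = 10

# Already stored as an 8x8 array
from_images = digits['images'][i]
# Stored as a flat row of 64 values, so it has to be reshaped
from_data = digits['data'][i].reshape(8, 8)

same = np.array_equal(from_images, from_data)
print(same)
```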

