
Using a subset of Pandas dataframe with Scipy Kmeans?

I have a DataFrame that I import using df = pd.read_csv('my.csv', sep=','). In that CSV file, the first row holds the column names and the first column holds the observation names.

I know how to select a subset of the pandas DataFrame using:

df.iloc[:,1::]

which gives me only the numeric values. But when I try to pass this to scipy.cluster.vq.kmeans with this command,

kmeans(df.iloc[:,1::],3)

I get the error: 'DataFrame' object has no attribute 'dtype'

Any suggestions?

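Here is an example that uses scikit-learn's KMeans. The key step is to pass the underlying NumPy array (via .values) rather than the DataFrame itself.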

from sklearn.datasets import make_blobs
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# try to simulate your data
# =====================================================
X, y = make_blobs(n_samples=1000, n_features=10, centers=3)

columns = ['feature' + str(x) for x in np.arange(1, 11, 1)]
d = {key: values for key, values in zip(columns, X.T)}
d['label'] = y
data = pd.DataFrame(d)

Out[72]: 
     feature1  feature10  feature2  ...    feature8  feature9  label
0      1.2324    -2.6588   -7.2679  ...      5.4166    8.9043      2
1      0.3569    -1.6880   -5.7671  ...     -2.2465   -1.7048      0
2      1.0177    -1.7145   -5.8591  ...     -0.5755   -0.6969      0
3      1.5735    -0.0597   -4.9009  ...      0.3235   -0.2400      0
4     -0.1042    -1.6703   -4.0541  ...      0.4456   -1.0406      0
..        ...        ...       ...  ...         ...       ...    ...
995   -0.0983    -1.4569   -3.5179  ...     -0.3164   -0.6685      0
996    1.3151    -3.3253   -7.0984  ...      3.7563    8.4052      2
997   -0.9177     0.7446   -4.8527  ...     -2.3793   -0.4038      0
998    2.0385    -3.9001   -7.7472  ...      5.2290    9.2281      2
999    3.9357    -7.2564    5.7881  ...      1.2288   -2.2305      1

[1000 rows x 11 columns]

# fit your data with KMeans
# =====================================================

kmeans = KMeans(n_clusters=3)
# select every column except the trailing 'label', as a NumPy array
kmeans.fit_predict(data.iloc[:, :-1].values)

Out[70]: array([1, 0, 0, ..., 0, 1, 2], dtype=int32)
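The same idea should carry over to the scipy.cluster.vq.kmeans call from the question. A DataFrame exposes dtypes rather than dtype, which is likely what triggers the error, so converting the numeric subset to a NumPy array first avoids it. The sketch below is only an illustration: the small DataFrame is a made-up stand-in for my.csv, and the column names name, x and y are assumptions, not taken from the original data.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical stand-in for my.csv: the first column holds observation
# names, the remaining columns are numeric.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(50, 2))
    for center in (0.0, 5.0, 10.0)
])
df = pd.DataFrame({
    "name": [f"obs{i}" for i in range(len(points))],
    "x": points[:, 0],
    "y": points[:, 1],
})

# Convert the numeric subset to a plain float NumPy array before clustering.
features = df.iloc[:, 1:].to_numpy(dtype=float)

# whiten() rescales each column to unit variance, which the SciPy
# documentation recommends doing before calling kmeans().
scaled = whiten(features)

# kmeans() returns the centroids and the mean distortion. It can return
# fewer than the requested number of centroids if a cluster ends up empty.
centroids, distortion = kmeans(scaled, 3)

# vq() assigns each observation to the index of its nearest centroid.
labels, _ = vq(scaled, centroids)
df["cluster"] = labels
```

With your own file, df.iloc[:, 1:].to_numpy(dtype=float) (or .values) would take the place of the simulated features. Reading the file with index_col=0 is another option, since it moves the observation names into the index and leaves only numeric columns.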
