I have a CSV file with 6 attributes and 1 class which I read with Pandas.
import pandas as pd

CsvFile = "/path/to/file.csv"
df = pd.read_csv(CsvFile)
First 5 rows of my CSV:
x,y,x1,y1,x2,y2,class
92,115,120,94,84,102,3
84,102,106,79,84,102,3
84,102,102,83,80,102,3
80,102,102,79,84,94,3
84,94,102,79,80,94,3
Since I have 6 attributes, I want to create a Python dictionary holding the initial centroids for k-means: one key per cluster (5 keys), each mapping to a list of 6 values, one per attribute.
import random

numberOfClusters = 5
centroids = {
    i+1: [random.uniform(0.0, 255.0), random.uniform(0.0, 255.0),
          random.uniform(0.0, 255.0), random.uniform(0.0, 255.0),
          random.uniform(0.0, 255.0), random.uniform(0.0, 255.0)]
    for i in range(numberOfClusters)
}
Question 1: as you can see, it is not very productive to copy-paste random.uniform(0.0, 255.0) once for every attribute in my CSV file. How can I generate the right number of random values without copy-pasting?
In a similar fashion, in the following code I calculate the Euclidean distance.
for i in centroids.keys():
    df['distance_from_{}'.format(i)] = (
        np.sqrt(
            (df['x'] - centroids[i][0]) ** 2
            + (df['y'] - centroids[i][1]) ** 2
            + (df['x1'] - centroids[i][2]) ** 2
            + (df['y1'] - centroids[i][3]) ** 2
            + (df['x2'] - centroids[i][4]) ** 2
            + (df['y2'] - centroids[i][5]) ** 2
        )
    )
Question 2: if I have more attributes, I have to add more (df['x'] - centroids[i][0]) ** 2 terms, and if I have fewer, I have to delete some. How can I automate this process a bit?
The reason for not using scikit's kmeans is that I want to calculate weights per cluster.
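If the number of keys is the problem, you can read it from the header line of the file:
n = 0
with open('filename.csv', 'r') as f:
    l = f.readline().strip()
    n = len(l.split(','))
where n holds the number of columns in the header. Note that this count includes the class column, so the number of attributes is n - 1.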
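Since the file is already loaded with Pandas, the same count can also be taken from the DataFrame. The following is a minimal sketch, assuming the label column is literally named class (as in the header shown in the question) and that every other column is an attribute; the names csv_text, feature_cols and n_features are made up for illustration, and the inline sample stands in for the real file.

```python
import io

import pandas as pd

# Two sample rows copied from the question; a real run would pass the file path instead.
csv_text = """x,y,x1,y1,x2,y2,class
92,115,120,94,84,102,3
84,102,106,79,84,102,3
"""

df = pd.read_csv(io.StringIO(csv_text))

# Assumption: the label column is named "class"; everything else is an attribute.
feature_cols = [c for c in df.columns if c != "class"]
n_features = len(feature_cols)

print(feature_cols)
print(n_features)
```

Keeping the list of attribute names around, rather than only the count, makes it possible to select exactly those columns later.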
First question: replace your list with
[random.uniform(0.0, 255.0) for _ in range(6)]
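Put back into the dictionary from the question, this might look like the sketch below. The variable number_of_features is a hypothetical name introduced here so that the attribute count appears in one place only, and the seed is set purely to make the example repeatable.

```python
import random

number_of_clusters = 5
number_of_features = 6  # hypothetical name; could be derived from the CSV header instead

random.seed(0)  # only so repeated runs give the same centroids

# One key per cluster (1..5), each holding one random coordinate per attribute.
centroids = {
    i + 1: [random.uniform(0.0, 255.0) for _ in range(number_of_features)]
    for i in range(number_of_clusters)
}

for key, point in centroids.items():
    print(key, len(point))
```

Changing either constant then resizes the dictionary without touching the comprehension itself.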
Second question:
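np.sqrt(np.sum(np.power(df[df.columns[:6]] - centroids[i], 2), axis=1)) should work. Here df.columns[:6] selects the 6 attribute columns, and axis=1 sums the squared differences across the columns of each row.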
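As a fuller illustration of the second question, the loop over clusters can be written so that it never names an individual column. This is only a sketch under a few assumptions: the label column is called class, all other columns are numeric attributes, and the two centroids below are fixed, made-up values chosen so the result is easy to check by hand (the question uses random ones).

```python
import io

import numpy as np
import pandas as pd

# The five sample rows from the question; a real run would read the file path instead.
csv_text = """x,y,x1,y1,x2,y2,class
92,115,120,94,84,102,3
84,102,106,79,84,102,3
84,102,102,83,80,102,3
80,102,102,79,84,94,3
84,94,102,79,80,94,3
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: every column except "class" is an attribute.
feature_cols = [c for c in df.columns if c != "class"]

# Hypothetical fixed centroids, one coordinate per attribute.
centroids = {
    1: [84.0, 102.0, 102.0, 79.0, 84.0, 102.0],
    2: [0.0] * len(feature_cols),
}

# Take the attribute values once, before any distance columns are added.
features = df[feature_cols].to_numpy(dtype=float)

for key, centroid in centroids.items():
    # Broadcasting subtracts the centroid from every row at once.
    diff = features - np.asarray(centroid)
    df["distance_from_{}".format(key)] = np.sqrt((diff ** 2).sum(axis=1))

print(df.filter(like="distance_from_"))
```

Because the columns are selected by name list rather than by position, adding or removing attributes in the CSV needs no change to the distance code, as long as each centroid has the matching length.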