
Calculate Euclidean distance between mean value and data points

I have a dataframe in which each row is a sample and each column is a feature. I would like to calculate the mean of the dataframe rows and then the Euclidean distance between each sample and that mean.

For example:

df = pd.DataFrame(np.random.randn(10, 3), columns=[1, 2, 3])

For the dataframe above, I would first like to compute the mean row, which in this example is a (1, 3) mean_array. Next, I would like to return the distance from each of the 10 samples to that mean, i.e. one distance per sample, so an output of length 10.

How can I do this in a simple way?

Use df.mean() to calculate the centroid. Then things are straightforward:

dist_to_centroid = np.sqrt((df - df.mean())**2).sum(axis=1)

Output:

0    1.614658
1    4.234299
2    0.665248
3    3.649749
4    2.828436
5    2.281306
6    3.493792
7    4.165420
8    2.299944
9    2.793936
dtype: float64

I think the answer from @Quang Hoang is incorrect. It computes the Manhattan distance, because taking the square root of each squared difference just gives its absolute value, whereas the OP asked for the Euclidean distance. Just change the placement of the brackets, since the sqrt needs to enclose the sum:

dist_to_centroid = np.sqrt(((df - df.mean())**2).sum(axis=1))
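To make the difference between the two bracket placements concrete, here is a minimal sketch that computes both distances on the same data and cross-checks the Euclidean one against np.linalg.norm. The seed, the use of np.random.default_rng, and the variable names (centroid, diff, euclidean, manhattan, norm_check) are my own choices for illustration and are not from the posts above.

```python
import numpy as np
import pandas as pd

# Illustrative data: 10 samples, 3 features (seed chosen arbitrarily).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=[1, 2, 3])

centroid = df.mean()   # column-wise mean: one value per feature
diff = df - centroid   # the centroid is aligned on columns and subtracted from every row

# Euclidean: square, sum across features, then take the root of the sum.
euclidean = np.sqrt((diff ** 2).sum(axis=1))

# Manhattan: what you get when the root is applied before the sum,
# because sqrt(x**2) is just |x|.
manhattan = diff.abs().sum(axis=1)

# Cross-check the Euclidean result against NumPy's row-wise 2-norm.
norm_check = np.linalg.norm(diff.to_numpy(), axis=1)

print(pd.DataFrame({"euclidean": euclidean, "manhattan": manhattan}))
```

Both results have one entry per sample (length 10). The Manhattan value is never smaller than the Euclidean one for the same row, which is a quick way to spot which of the two a given expression is actually computing.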

