Calculate euclidean distance between mean value and datapoints

Question

I have a dataframe in which each row is a sample and each column is a feature. I would like to calculate the mean of the dataframe rows and then the Euclidean distance from each sample to that mean.

For example:

df = pd.DataFrame(np.random.randn(10, 3), columns=[1, 2, 3])

For the dataframe above, I would first like to compute the mean row, which for this example will be a (1, 3) mean_array. Next, I would like to return the distance between each of the 10 samples and the mean value, which will be a (10, 3) output.

How can I do this in a simple way?

Answer 1

Use df.mean() to calculate the centroid. Then things are straightforward:

dist_to_centroid = np.sqrt((df - df.mean())**2).sum(axis=1)

Output:

0    1.614658
1    4.234299
2    0.665248
3    3.649749
4    2.828436
5    2.281306
6    3.493792
7    4.165420
8    2.299944
9    2.793936
dtype: float64

Answer 2

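I think the answer from @Quang Hoang is incorrect. It takes the square root of each squared difference before summing, which gives the sum of absolute differences, i.e. the Manhattan distance, whereas the OP asked for the Euclidean distance. Just change the placement of the brackets, as the sqrt needs to encapsulate the sum: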

dist_to_centroid = np.sqrt(((df - df.mean())**2).sum(axis=1))
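
To see the difference between the two bracket placements, here is a minimal, self-contained sketch. The seeded random data and the variable names (centroid, diff, euclidean, manhattan, norm_check) are assumptions made for illustration and do not come from the question. It computes both versions and cross-checks the Euclidean one against np.linalg.norm.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 samples, 3 features, seeded so runs are repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=[1, 2, 3])

centroid = df.mean()   # column-wise mean: one value per feature
diff = df - centroid   # the centroid is aligned on column labels and applied to every row

# Euclidean distance: the sqrt wraps the row-wise sum of squares.
euclidean = np.sqrt((diff ** 2).sum(axis=1))

# sqrt applied element-wise before summing: this is the sum of absolute
# differences, i.e. the Manhattan distance.
manhattan = np.sqrt(diff ** 2).sum(axis=1)

# Cross-check the Euclidean result with NumPy's row-wise 2-norm.
norm_check = np.linalg.norm(diff.to_numpy(), axis=1)

print(pd.DataFrame({"euclidean": euclidean, "manhattan": manhattan}))
```

Both results hold one value per sample (10 values here), and the Euclidean distance is never larger than the Manhattan distance for the same row.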
