Calculating and using Euclidean Distance in Python

I am trying to calculate the Euclidean distance between two datasets in Python. I can do this using the following:

np.linalg.norm(df-signal)

Here df and signal are my two datasets. This returns a single numerical value (e.g. 8258155.579535276), which is fine. My issue is that I want it to return the distance for each column of the dataset instead, something like this:

AFNLWGT     4.867376e+10
AGI         3.769233e+09
EMCONTRB    1.202935e+07
FEDTAX      8.095078e+07
PTOTVAL     2.500056e+09
STATETAX    1.007451e+07
TAXINC      2.027124e+09
POTHVAL     1.158428e+08
INTVAL      1.606913e+07
PEARNVAL    2.038357e+09
FICA        1.080950e+07
WSALVAL     1.986075e+09
ERNVAL      1.905109e+09
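Written out, let x_ij be the entry of df in row i and column j, and y_ij the matching entry of signal (this notation is introduced here for illustration, with n rows and m columns). With no axis argument, np.linalg.norm on a 2-D input computes the Frobenius norm over every entry, which is the single number above. The per-column distance being asked for is d_j, and the two quantities are related as follows:

```latex
d_j = \sqrt{\sum_{i=1}^{n} \left(x_{ij} - y_{ij}\right)^2},
\qquad
\lVert X - Y \rVert_F
  = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} \left(x_{ij} - y_{ij}\right)^2}
  = \sqrt{\sum_{j=1}^{m} d_j^2}
```

So the per-column values come from the same squared differences, summed down each column instead of over the whole table.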

I'm fairly new to Python so would really appreciate any help possible.

To get the column-wise norm with the column headers, you can use pandas.DataFrame.aggregate together with np.linalg.norm:

import pandas as pd
import numpy as np

norms = (df-signal).aggregate(np.linalg.norm)

Note that, by default, .aggregate operates along axis 0, so np.linalg.norm is applied to each column separately.
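If you want to stay in pandas but avoid one Python-level np.linalg.norm call per column, squaring, summing down each column and taking the square root gives the same per-column Euclidean distance. A minimal sketch, assuming small hypothetical DataFrames with columns A and B, checks the two against each other; whether the vectorized form is faster on your data is an assumption worth measuring.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; any two DataFrames with matching labels behave the same way
df = pd.DataFrame({"A": [1.0, 4.0, 7.0], "B": [2.0, 2.0, 2.0]})
signal = pd.DataFrame({"A": [1.0, 0.0, 4.0], "B": [0.0, 2.0, 2.0]})

diff = df - signal  # element-wise difference, aligned on index and column labels

# The approach above: np.linalg.norm is called once per column (axis 0 by default)
norms_agg = diff.aggregate(np.linalg.norm)

# Vectorized pandas-only alternative: square, sum down each column, take the root
norms_vec = diff.pow(2).sum().pow(0.5)

print(norms_agg)
print(norms_vec)
```

For this toy data the column differences are [0, 4, 3] and [2, 0, 0], so both Series should hold 5.0 for A and 2.0 for B.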

However, this will be much slower than a pure NumPy implementation:

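# Subtract the raw arrays, take the norm down each column (axis=0),
# then reattach the column names as the Series index
norms = pd.Series(np.linalg.norm(df.to_numpy()-signal.to_numpy(), axis=0),
                  index=df.columns)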

With test data of size 100x2, the latter is 20x faster.
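The exact speed-up will depend on data shape and on your pandas and NumPy versions. A minimal timing sketch, assuming two random 100x2 DataFrames with hypothetical columns A and B, lets you reproduce the comparison on your own machine; no particular numbers are implied here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Random test data of size 100x2; the column names are hypothetical
df = pd.DataFrame(rng.normal(size=(100, 2)), columns=["A", "B"])
signal = pd.DataFrame(rng.normal(size=(100, 2)), columns=["A", "B"])


def norms_pandas():
    # One np.linalg.norm call per column, dispatched by pandas
    return (df - signal).aggregate(np.linalg.norm)


def norms_numpy():
    # A single vectorized call on the raw arrays; labels reattached afterwards
    return pd.Series(
        np.linalg.norm(df.to_numpy() - signal.to_numpy(), axis=0),
        index=df.columns,
    )


# Make sure both approaches agree before timing them
assert np.allclose(norms_pandas().to_numpy(), norms_numpy().to_numpy())

runs = 1000
for fn in (norms_pandas, norms_numpy):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} microseconds per call")
```

One behavioural difference to keep in mind: df - signal aligns rows and columns by label, while subtracting the to_numpy() arrays is purely positional. The two versions only agree when both frames share the same index and column order.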
