Calculating and using Euclidean distance in Python

Question

I am trying to calculate the Euclidean distance between two datasets in Python. I can do this using the following:

np.linalg.norm(df-signal)

Here df and signal are my two datasets. This returns a single numerical value (e.g., 8258155.579535276), which is fine. My issue is that I want it to return the distance for each column in the dataset instead. Something like this:

AFNLWGT     4.867376e+10
AGI         3.769233e+09
EMCONTRB    1.202935e+07
FEDTAX      8.095078e+07
PTOTVAL     2.500056e+09
STATETAX    1.007451e+07
TAXINC      2.027124e+09
POTHVAL     1.158428e+08
INTVAL      1.606913e+07
PEARNVAL    2.038357e+09
FICA        1.080950e+07
WSALVAL     1.986075e+09
ERNVAL      1.905109e+09

I'm fairly new to Python, so I would really appreciate any help.

Answer 1

To get the column-wise norm with column headers, you can use pandas.DataFrame.aggregate together with np.linalg.norm:

import pandas as pd
import numpy as np

norms = (df-signal).aggregate(np.linalg.norm)

Note that, by default, .aggregate operates along axis 0, so the function is applied to each column.
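As a quick check of what the column-wise call returns, here is a minimal sketch on toy data. The two small frames and their column names A and B are made up for illustration and only stand in for the question's df and signal.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's df and signal.
df = pd.DataFrame({"A": [3.0, 0.0], "B": [1.0, 2.0]})
signal = pd.DataFrame({"A": [0.0, 4.0], "B": [1.0, 0.0]})

# Element-wise difference: column A is [3, -4], column B is [0, 2].
diff = df - signal

# One Euclidean norm per column, labelled by column name.
norms = diff.aggregate(np.linalg.norm)
print(norms)

# The whole-frame call from the question collapses everything into
# a single number (the norm over all entries of the difference).
total = np.linalg.norm(diff)
print(total)
```

With this data, column A gives sqrt(3^2 + 4^2) = 5 and column B gives 2, while the whole-frame call gives sqrt(29), which is why the original snippet returned only one value.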

However, this will be much slower than the pure NumPy implementation:

norms = pd.Series(np.linalg.norm(df.to_numpy()-signal.to_numpy(), axis=0), 
                  index=df.columns)

With test data of size 100x2, the latter is 20x faster.
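The comparison can be reproduced with something like the sketch below. It assumes random test data of the same 100x2 shape; the column names, the seed, the helper names with_aggregate and with_numpy, and the repeat count are all arbitrary choices, and the measured ratio will vary with machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical random test data: 100 rows, 2 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 2)), columns=["A", "B"])
signal = pd.DataFrame(rng.normal(size=(100, 2)), columns=["A", "B"])


def with_aggregate():
    # pandas applies np.linalg.norm to each column of the difference.
    return (df - signal).aggregate(np.linalg.norm)


def with_numpy():
    # Subtract the raw arrays, take the norm along axis 0, re-attach labels.
    return pd.Series(
        np.linalg.norm(df.to_numpy() - signal.to_numpy(), axis=0),
        index=df.columns,
    )


# Both approaches should give the same per-column norms.
assert np.allclose(with_aggregate(), with_numpy())

# Time each approach over repeated calls.
t_agg = timeit.timeit(with_aggregate, number=200)
t_np = timeit.timeit(with_numpy, number=200)
print(f"aggregate: {t_agg:.4f} s, numpy: {t_np:.4f} s")
```

One caveat: df - signal aligns rows and columns by label, whereas subtracting the .to_numpy() arrays is purely positional, so the two versions only agree when both frames share the same index and column order.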

Answer 1
2 · Accepted · 2020-04-11 11:47:23
