Vectorized look-up of Pandas dataframe column values in a separate list

I'm looking for a quick (vectorized) way to perform calculations using the contents of a Pandas dataframe.

My dataframe contains two labels per row. I want to look up the value corresponding to each label (from a dictionary / list), perform a calculation, and store the result in a new column of the dataframe.

My working example, which uses a loop, is below.

import numpy as np
import pandas as pd

# Two label columns, one pair of labels per row
label1s = np.array(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'], dtype=str)
label2s = np.array(['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'], dtype=str)
data = np.column_stack([label1s, label2s])

# Lookup table, external to the dataframe
label_values = {'A':1, 'B':2, 'C':3}

df = pd.DataFrame(data=data, columns=['Label1', 'Label2'])

new_col = np.zeros_like(label1s, dtype=float)

# Row-by-row lookup: works, but slow on large dataframes
for index, row in df.iterrows():
    val1 = label_values[row['Label1']]
    val2 = label_values[row['Label2']]
    new_col[index] = val1 - val2

df['result'] = new_col
df

However, for large datasets the loop is slow and highly undesirable.

Is there a way to optimize this, please?

I've explored some of the pandas functionality like "Lookup", but this seems to want arrays of equal size, whereas in my case I need to look up values from a list that is external to the dataframe and of a different size.

You can map the dictionary onto the required columns, i.e.

df['result'] = df.Label1.map(label_values) - df.Label2.map(label_values)
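
For completeness, here is a minimal self-contained sketch of that one-liner, checked against the row-by-row result from the question. The dataframe is rebuilt directly from lists instead of np.column_stack, and the names `expected` and `unknown` are only illustrative. Note that with an integer dictionary the mapped result is an integer column, so append .astype(float) if you need the float column the loop produced.

```python
import numpy as np
import pandas as pd

label_values = {'A': 1, 'B': 2, 'C': 3}

df = pd.DataFrame({
    'Label1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'Label2': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
})

# Vectorized: translate each label column through the dictionary, then subtract.
df['result'] = df['Label1'].map(label_values) - df['Label2'].map(label_values)

# Reference values computed pair by pair, mirroring the loop in the question.
expected = np.array(
    [label_values[a] - label_values[b] for a, b in zip(df['Label1'], df['Label2'])]
)
assert (df['result'].to_numpy() == expected).all()

# A label that is missing from the dictionary maps to NaN instead of raising KeyError.
unknown = pd.Series(['A', 'Z']).map(label_values)
print(unknown.isna().tolist())
```

One thing to watch for: unlike the loop, which raises KeyError on an unknown label, map silently yields NaN, so it may be worth checking df['result'].isna().any() afterwards if your labels are not guaranteed to be in the dictionary.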
