Python: Iterate an operation across different columns of one row for all rows of a graphlab.SFrame

I have an SFrame whose columns contain dict elements.

import graphlab
import numpy as np
a = graphlab.SFrame({'col1':[{'oshan':3,'modi':4},{'ravi':1,'kishan':5}],
                     'col2':[{'oshan':1,'rawat':2},{'hari':3,'kishan':4}]})

I want to calculate the cosine distance between these two columns for each row of the SFrame. Below is the operation using a for loop.

dis = np.zeros(len(a),dtype = float)
for i in range(len(a)):
    dis[i] = graphlab.distances.cosine(a['col1'][i],a['col2'][i])

a['distance12'] = dis

This is very inefficient and would take hours if the number of rows were large. Could someone please suggest a better approach?

You can usually avoid looping over an SFrame by using the apply function. In your case, it would look like this:

# apply returns an SArray with one value per row; store it as a new column
a['distance12'] = a.apply(lambda row: graphlab.distances.cosine(row['col1'], row['col2']))

That should be significantly faster than looping in Python.
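
If it helps to see what the per-row function computes, here is a minimal plain-Python sketch that runs without graphlab. It assumes cosine distance on dicts is 1 - (x·y)/(|x||y|), with keys missing from one dict treated as zero. The names cosine_distance, rows and distances are made up for illustration and are not part of the graphlab API.

```python
import math

def cosine_distance(x, y):
    """Cosine distance between two sparse vectors stored as dicts.

    Keys missing from one dict are treated as zeros (an assumption
    about how dict inputs are interpreted).
    """
    # Only keys present in both dicts contribute to the dot product.
    dot = sum(value * y[key] for key, value in x.items() if key in y)
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_x * norm_y)

# The two rows from the question, written as plain dicts.
rows = [
    {'col1': {'oshan': 3, 'modi': 4}, 'col2': {'oshan': 1, 'rawat': 2}},
    {'col1': {'ravi': 1, 'kishan': 5}, 'col2': {'hari': 3, 'kishan': 4}},
]

# Same shape as the lambda passed to apply: one row in, one float out.
distances = [cosine_distance(row['col1'], row['col2']) for row in rows]
print(distances)
```

For the sample data this gives roughly 0.73 for the first row and 0.22 for the second. On a real SFrame you would still want the built-in graphlab.distances.cosine inside apply, as above; this is only meant to show the arithmetic.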

