
How to speed up my Python apply function across a DataFrame

I have a rather large data set and I am trying to calculate the sentiment of each document. I am using VADER to calculate the sentiment with the following code, but this process takes over 6 hours to run. I am looking for any way to speed up this process.

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

%time full_trans['bsent'] = full_trans['body_text'].apply(lambda row: analyzer.polarity_scores(row))
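A self-contained version of the pattern above that runs without VADER installed. `toy_polarity_scores` is a hypothetical stand-in for `analyzer.polarity_scores` (it only counts words from two tiny word lists), and the sample DataFrame is made up. It also shows that the lambda is unnecessary, because `Series.apply` accepts the function directly. This does not make anything faster: there is still one Python call per row.

```python
import pandas as pd

POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def toy_polarity_scores(text: str) -> dict:
    """Hypothetical stand-in for analyzer.polarity_scores: returns one dict per text."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    compound = 0.0 if total == 0 else (pos - neg) / total
    return {"pos": pos, "neg": neg, "compound": compound}


# Made-up sample data standing in for the real full_trans DataFrame.
full_trans = pd.DataFrame({"body_text": ["I love it", "bad and awful", "meh"]})

# Same as the question's code: one Python function call per row.
with_lambda = full_trans["body_text"].apply(lambda row: toy_polarity_scores(row))

# The lambda adds nothing, so the function can be passed in directly.
full_trans["bsent"] = full_trans["body_text"].apply(toy_polarity_scores)
```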

Any thoughts would be great because looping through the rows like this is terribly inefficient.

As an example, I have run my code on a mini sample of 100 observations. The results from the two forms of code are below: my original code is first and the suggested change to a list comprehension is second. It seems strange that there is almost no difference in performance between the two methods.

transtest = full_transx.copy(deep=True)

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

%time transtest['bsent'] = [analyzer.polarity_scores(row) for row in transtest['body_text']]

%time full_transx['bsent'] = full_transx['body_text'].apply(lambda row: analyzer.polarity_scores(row))

Wall time: 4min 11s

Wall time: 3min 59s
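A back-of-the-envelope conversion of the two wall times above into a per-document cost, using only the numbers reported here:

```latex
\begin{aligned}
4\,\text{min}\;11\,\text{s} &= 251\,\text{s} &&\Rightarrow\; 251\,\text{s} / 100 \approx 2.5\,\text{s per document} \\
3\,\text{min}\;59\,\text{s} &= 239\,\text{s} &&\Rightarrow\; 239\,\text{s} / 100 \approx 2.4\,\text{s per document} \\
\frac{251 - 239}{251} &\approx 0.048 = 4.8\,\%
\end{aligned}
```

The per-row overhead of `Series.apply` versus a list comprehension is far below a millisecond, so when each document costs roughly 2.4 to 2.5 s, presumably almost all of that time is spent inside `polarity_scores` itself. That would explain why the two loop styles land within about 5% of each other: only making each call cheaper, or running calls in parallel, can change the total much.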

I assume that full_transx['body_text'] is a Series of strings. In that case it is often much more efficient to loop over the underlying numpy array to build a list comprehension:

full_trans['bsent'] = [analyzer.polarity_scores(row) for row in full_trans['body_text'].values]
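A runnable sketch of this suggestion, again using a hypothetical `toy_polarity_scores` stand-in and made-up data in place of VADER and the real DataFrame. It uses `Series.to_numpy()`, which the pandas documentation recommends over `.values`. For a column of strings, both return the same object array. The scores are identical to the `apply` version; only the per-row iteration overhead differs, and that overhead is small compared with the cost of each scoring call.

```python
import pandas as pd

POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def toy_polarity_scores(text: str) -> dict:
    """Hypothetical stand-in for analyzer.polarity_scores (one dict per text)."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    compound = 0.0 if total == 0 else (pos - neg) / total
    return {"pos": pos, "neg": neg, "compound": compound}


# Made-up sample standing in for full_trans.
full_trans = pd.DataFrame({"body_text": ["great day", "I hate bad news", "meh"]})

# The answer's approach: iterate over the underlying NumPy array.
via_array = [toy_polarity_scores(row) for row in full_trans["body_text"].to_numpy()]

# The question's approach, for comparison.
via_apply = full_trans["body_text"].apply(toy_polarity_scores).tolist()

# Both produce the same scores; assign the list back as a new column.
full_trans["bsent"] = via_array
```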

It is not efficient to loop through NumPy arrays in Python. I suggest finding a way to apply the function to the array itself. I am not able to test it, but perhaps you can try analyzer.polarity_scores(full_trans['body_text'].values).
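As far as I know, VADER's `polarity_scores` scores a single string per call, so passing the whole array is unlikely to work as hoped. Wrapping it in `np.vectorize` would not help either, since NumPy documents `np.vectorize` as a convenience that is essentially a for loop. Because every document is scored independently, one way to cut the wall time is a different technique: process-based parallelism with the standard library's `concurrent.futures.ProcessPoolExecutor`. This is an illustrative sketch that has not been tested against VADER. `score_in_parallel` and `toy_polarity_scores` are hypothetical names, and the toy scorer stands in for `analyzer.polarity_scores`.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Dict, Iterable, List

POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def toy_polarity_scores(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for analyzer.polarity_scores.

    It is defined at module level so worker processes can pickle it by name.
    """
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return {"pos": pos, "neg": neg,
            "compound": 0.0 if total == 0 else (pos - neg) / total}


def score_in_parallel(
    texts: Iterable[str],
    score_fn: Callable[[str], Dict[str, float]],
    workers: int = 4,
    chunksize: int = 64,
) -> List[Dict[str, float]]:
    """Score each text, spreading the calls over `workers` processes.

    Results come back in input order. With workers <= 1 it falls back to a
    plain list comprehension in the current process.
    """
    texts = list(texts)
    if workers <= 1:
        return [score_fn(t) for t in texts]
    # chunksize sends many texts per inter-process message to cut overhead.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_fn, texts, chunksize=chunksize))
```

With VADER, `score_fn` would be a module-level function, for example a hypothetical `vader_scores(text)` that returns `analyzer.polarity_scores(text)`. You would then call `full_trans['bsent'] = score_in_parallel(full_trans['body_text'], vader_scores, workers=8)`. In a notebook on Windows or macOS, where workers are started with spawn, that function has to live in a separate `.py` module that you import. In a script, the call has to sit under an `if __name__ == "__main__":` guard. At best, the speed-up is roughly proportional to the number of CPU cores, minus the cost of shipping texts between processes.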

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM