Why is converting words into singular from plural in a for loop taking so long (Python 3)?
Here is my code, which reads text from a CSV file and converts every word in one column from plural to singular:
import pandas as pd
from textblob import TextBlob as tb
data = pd.read_csv(r'path\to\data.csv')
for i in range(len(data)):
    blob = tb(data['word'][i])
    singular = blob.words.singularize() # This makes singular a list
    data['word'][i] = ''.join(singular) # Converting the list back to a string
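But this code has already been running for several minutes (and might keep running for hours if I don't stop it?). Why is that? When I check a few words one at a time, the conversion happens instantly; it takes no noticeable time at all. The file has only 1060 rows (words to convert).
Edit: It finished running in about 10-12 minutes.
Here is some sample data:
Input: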
word
development
investment
funds
slow
company
commit
pay
claim
finances
customers
claimed
insurance
comment
rapid
bureaucratic
affairs
reports
policyholders
detailed
Output:
word
development
investment
fund
slow
company
commit
pay
claim
finance
customer
claimed
insurance
comment
rapid
bureaucratic
affair
report
policyholder
detailed
How about this?
In [1]: import pandas as pd
In [2]: from textblob import Word
In [3]: s = pd.read_csv('text', squeeze=True, memory_map=True)
In [4]: type(s)
Out[4]: pandas.core.series.Series
In [5]: s = s.apply(lambda w: Word(w).singularize())
In [6]: s
Out[6]:
0 development
1 investment
2 fund
3 slow
4 company
5 commit
6 pay
7 claim
8 finance
9 customer
10 claimed
11 insurance
12 comment
13 rapid
14 bureaucratic
15 affair
16 report
17 policyholder
18 detailed
Name: word, dtype: object
I used squeeze here so that read_csv returns a Series rather than a DataFrame, since the word file has only one column. Also, if the word file is large, memory_map can be used.
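On recent pandas releases read_csv no longer accepts the squeeze keyword (it was deprecated in 1.4 and removed in 2.0), so the same single-column-to-Series step can be done by calling .squeeze("columns") on the result instead. A minimal sketch, assuming a one-column CSV with a word header; the helper name read_single_column and the in-memory sample are made up for illustration:

```python
import io

import pandas as pd


def read_single_column(source):
    """Read a one-column CSV and return it as a Series (hypothetical helper)."""
    frame = pd.read_csv(source)
    # Squeezing along the column axis turns a single-column frame into a Series.
    return frame.squeeze("columns")


# In-memory stand-in for the word file; a path or file object works the same way.
sample = io.StringIO("word\nfunds\nslow\ncustomers\n")
words = read_single_column(sample)
print(type(words).__name__, words.name, len(words))
```

The resulting Series can then be passed through s.apply(lambda w: Word(w).singularize()) exactly as in the session above.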
Could you test the performance with your data?
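One way to run that comparison is to time a row-by-row loop and Series.apply on the same column. The sketch below is only an illustration: time_call, loop_version and apply_version are hypothetical names, the loop writes through .loc rather than the chained indexing used in the question, and naive_singular is a crude stand-in so the snippet runs without TextBlob. To measure the real thing, pass lambda w: Word(w).singularize() instead.

```python
import time

import pandas as pd


def naive_singular(word):
    # Crude stand-in for Word(word).singularize(); NOT a real singularizer.
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def loop_version(frame, func):
    # Row-by-row loop, writing each result back through .loc on a copy.
    out = frame.copy()
    for i in range(len(out)):
        out.loc[i, "word"] = func(out.loc[i, "word"])
    return out["word"]


def apply_version(frame, func):
    # Series.apply, as in the answer's IPython session.
    return frame["word"].apply(func)


def time_call(fn, *args):
    # Return the call's result together with its wall-clock duration in seconds.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# 1000 made-up rows, roughly the size of the file in the question.
data = pd.DataFrame({"word": ["funds", "slow", "customers", "reports"] * 250})
looped, loop_secs = time_call(loop_version, data, naive_singular)
applied, apply_secs = time_call(apply_version, data, naive_singular)
print(f"loop: {loop_secs:.4f}s  apply: {apply_secs:.4f}s")
```

The timings depend on the machine and on the function being applied, so the numbers are only meaningful when both versions are run on the actual data.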