
Not able to parallelize pandas apply using swifter

I am trying to correct OCR-parsed words in a document by passing each word through a custom process that is time-consuming. The process is my custom business functionality, which looks at various semantics of the word.

I am trying to speed up the process using swifter. I have a 16-core processor, but not all cores are being utilized: only 1 core runs at 100% while the remaining 15 sit idle. What am I missing?

I tried the different options below, but without success. Can someone point me to what I am missing here? df is a DataFrame with each row containing a word. correct_ocr_string is a business function that takes a string as input, runs it through a custom ML model, and returns a string.

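import numpy as np
import swifter  # registers the .swifter accessor on pandas objects
# Three attempts: a lambda, the function itself, and an np.vectorize wrapper
df['Corrected'] = df.OCR.swifter.progress_bar(False).apply(lambda x: correct_ocr_string(x))
df['Corrected'] = df.OCR.swifter.progress_bar(False).apply(correct_ocr_string)
v_fnc = np.vectorize(correct_ocr_string)
df['Corrected'] = df.OCR.swifter.progress_bar(False).apply(v_fnc)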

I also tried pandarallel's parallel_apply, with no success:

import multiprocessing
from pandarallel import pandarallel
pandarallel.initialize(nb_workers=multiprocessing.cpu_count())
df['Corrected'] = df.OCR.parallel_apply(correct_ocr_string)

You have to use allow_dask_on_strings(enable=True); by default, swifter does not hand string columns to Dask:

df.OCR.swifter.allow_dask_on_strings(enable=True).apply(correct_ocr_string)

Are you perhaps using Jupyter Notebook? Multiprocessing, which both swifter and pandarallel rely on, may cause problems there.
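
If the notebook is the suspect, one way to check is to move the work into a plain .py script and use the standard library directly. Below is a minimal sketch of that idea, not what swifter or pandarallel do internally. correct_ocr_string here is a hypothetical stand-in for the real function, and correct_all and use_processes are names made up for illustration.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def correct_ocr_string(word):
    """Hypothetical stand-in for the slow business function."""
    return word.strip().lower()


def correct_all(words, workers=None, use_processes=True):
    """Apply correct_ocr_string to every word with a pool of workers.

    use_processes=False switches to threads, which is handy for checking
    the plumbing in an environment where process pools misbehave. Threads
    usually will not speed up CPU-bound pure-Python work.
    """
    workers = workers or os.cpu_count() or 1
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    # Send words to workers in batches to reduce inter-process overhead
    # (the thread pool ignores chunksize).
    chunksize = max(1, len(words) // (workers * 4))
    with executor_cls(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the rows.
        return list(pool.map(correct_ocr_string, words, chunksize=chunksize))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this section
    # on platforms that start workers by re-importing the script.
    sample = [" Hello", "WORLD ", "OcR"]
    print(correct_all(sample, workers=2))
```

With a DataFrame, something like df['Corrected'] = correct_all(df.OCR.tolist()) should fit, assuming the real correct_ocr_string is defined at module level so that worker processes can import it. If all cores are busy when run this way, the notebook environment is the likely cause.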


 