Multiple list comprehensions vs a single for loop

Question

I am trying to understand coding best practices in Python. I have a pandas DataFrame in which I need to work on columns that contain strings or floats. I am doing basic data management, and I was wondering whether a single for loop can be faster than many list comprehensions.

In my case the target DataFrame has 4 million rows or more, and I would have about 10 list comprehensions, so speed is important. I have to decide whether to put everything inside one for loop or to write many list comprehensions. Do you have any suggestions?

for i in range(dataframe.shape[0]):
    try:  # Price dummy
        if dataframe["Price"].iloc[i] == "0":
            dataframe["Price_Dummy"].iloc[i] = 0
        else:
            dataframe["Price_Dummy"].iloc[i] = 1
    except:
        pass
    try:  # Transform everything in MB (middle unit)
        unit_of_measure = dataframe["Size"].iloc[i].split(" ")[-1].lower()
        size = float(dataframe["Size"].iloc[i].split(" ")[0])
        if unit_of_measure == "kb":
            dataframe["Size"].iloc[i] = size / 1000
        elif unit_of_measure == "gb":
            dataframe["Size"].iloc[i] = size * 1000
        else:
            dataframe["Size"].iloc[i] = size
    except:
        pass

(other 10+ operations)

vs

the same written as list comprehensions

I have found this link: Single list iteration vs multiple list comprehensions

yet it doesn't say whether list comprehensions are always faster, regardless of the number of iterations.

Answer 1

I would try it without a loop, using np.where clauses for the if-elif-else combinations. That's usually pretty fast.

import numpy as np

# dataframe is a DataFrame containing the data

# Price dummy: 0 where Price is the string "0", otherwise 1
dataframe["Price_Dummy"] = np.where(dataframe["Price"] == "0", 0, 1)

# String operations work on whole string columns as well
unit_of_measure = dataframe["Size"].str.split(" ", expand=True)[1].str.lower()

size = dataframe["Size"].str.split(" ", expand=True)[0].astype("float")

# Built from the bottom up: the kb/else case first, then the gb case on top
kb_case = np.where(unit_of_measure == "kb", size / 1000, size)
dataframe["Size"] = np.where(unit_of_measure == "gb", size * 1000, kb_case)

Notice that I replaced the [-1] in the unit_of_measure = line with [1], as the expand=True option does not support -1 indexing: it returns a DataFrame whose columns are labeled 0, 1, and so on, so -1 is not a valid column label. So you would have to know at which position your unit ends up.
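If the position of the unit is not known in advance, one possible way to keep the "last token" behavior of the original [-1] is to split without expand=True and index each resulting list through the .str accessor. A minimal sketch, assuming a small made-up Size column (the sample values are hypothetical, not taken from the question):

```python
import pandas as pd

# Hypothetical sample data for illustration only
sizes = pd.Series(["512 kb", "1.5 GB", "20 MB"])

# Without expand=True, str.split returns a Series of lists;
# .str[-1] then picks the last element of each list.
tokens = sizes.str.split(" ")
unit_of_measure = tokens.str[-1].str.lower()
size = tokens.str[0].astype("float")

print(unit_of_measure.tolist())
print(size.tolist())
```

This keeps the whole operation column-wise, at the cost of building an intermediate Series of Python lists.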

Information on splitting strings in DataFrames can be found here.

In the last two lines, I reproduced the if-elif-else combination, which you kind of have to build from the bottom up: your final result dataframe["Size"] equals size*1000 if the unit is gb. If not, it equals kb_case, which covers the case where the unit is kb as well as all other cases.
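To make the bottom-up nesting concrete, here is a small self-contained sketch on made-up data (the sample values are assumptions, not from the question). It also shows np.select as a possible alternative that may read more naturally when there are many branches: it takes a list of conditions and a list of choices, and the first matching condition wins.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only
dataframe = pd.DataFrame({
    "Price": ["0", "2.99", "0"],
    "Size": ["500 kb", "2 gb", "30 mb"],
})

dataframe["Price_Dummy"] = np.where(dataframe["Price"] == "0", 0, 1)

parts = dataframe["Size"].str.split(" ", expand=True)
size = parts[0].astype("float")
unit_of_measure = parts[1].str.lower()

# Nested np.where, built from the innermost (else) case outwards
kb_case = np.where(unit_of_measure == "kb", size / 1000, size)
size_mb_nested = np.where(unit_of_measure == "gb", size * 1000, kb_case)

# Same logic with np.select: conditions are checked in order,
# and default plays the role of the final else branch
size_mb_select = np.select(
    [unit_of_measure == "kb", unit_of_measure == "gb"],
    [size / 1000, size * 1000],
    default=size,
)

dataframe["Size"] = size_mb_nested
print(dataframe)
```

Both variants evaluate every branch for the whole column and then pick per row, so they suit cheap arithmetic like this unit conversion.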

Solution 1
0 Accepted 2021-11-10 11:40:15
