Assign a dataframe to a variable outside a for loop, or use it directly inside the for loop in Python

Question

Option 1:

a = np.unique(df.values)
for i in range(len(df2)):  # loop bound assumed: one pass per row of df2 (default integer index)
  if df2.loc[i,'col1'] in a:
    df2.loc[i,'col2'] = 'Ok'
  else:
    df2.loc[i,'col2'] = 'No'

Option 2:

for i in range(len(df2)):  # loop bound assumed: one pass per row of df2 (default integer index)
  if df2.loc[i,'col1'] in np.unique(df.values):
    df2.loc[i,'col2'] = 'Ok'
  else:
    df2.loc[i,'col2'] = 'No'

Which is better in terms of memory and speed in Python?

Edited for clarity on the operation inside the for loop.

Answer 1

In terms of memory, option 2 would probably be better because you aren't creating a new variable. In terms of speed, there wouldn't be a difference because df.values and a refer to the same piece of data. You can check whether two variables refer to the same piece of data with the is keyword: var1 is var2. However, we don't know what you are doing with the data.
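As a minimal sketch of the identity check described above, the snippet below applies is to a small made-up NumPy array. The names data, alias, copied and uniques are illustrative and do not come from the question. It shows that a plain assignment gives a second name for the same object, while a copy or the result of np.unique is a separate object.

```python
import numpy as np

# Small made-up array standing in for df.values (an assumption for illustration).
data = np.array([[3, 1], [2, 3]])

alias = data               # a second name bound to the same object
copied = data.copy()       # a distinct object with equal contents
uniques = np.unique(data)  # np.unique builds and returns a new array

alias_is_same = alias is data      # identity holds for a plain assignment
copied_is_same = copied is data    # a copy is a different object
uniques_is_same = uniques is data  # so is the result of np.unique

print(alias_is_same, copied_is_same, uniques_is_same)
```

Note that is compares object identity only; two arrays with equal contents can still be different objects.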

Answer 2

Both are inefficient; the second is worse because it recomputes the unique values on every iteration.
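To make the recomputation concrete, here is a small sketch that counts how often the unique values are computed in each variant. The two dataframes and the counted_unique wrapper are hypothetical, introduced only for illustration, and the loop assumes df2 has the default integer index.

```python
import numpy as np
import pandas as pd

# Made-up data; the question does not show the real dataframes.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df2 = pd.DataFrame({"col1": [1, 5, 3, 7]})

call_count = {"n": 0}

def counted_unique(values):
    """Hypothetical wrapper around np.unique that counts its calls."""
    call_count["n"] += 1
    return np.unique(values)

# Option 2 style: the unique values are recomputed for every row.
for i in range(len(df2)):
    _ = df2.loc[i, "col1"] in counted_unique(df.values)
calls_inside = call_count["n"]

# Option 1 style: the unique values are computed once, before the loop.
call_count["n"] = 0
a = counted_unique(df.values)
for i in range(len(df2)):
    _ = df2.loc[i, "col1"] in a
calls_outside = call_count["n"]

print(calls_inside, calls_outside)
```

With one call per row, the cost of the in-loop version grows with the number of rows in df2, whereas the hoisted version pays for np.unique only once.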

Use vectorized code instead:

df2['col2'] = df2['col1'].isin(np.unique(df.values)).map({True: 'Ok', False: 'No'})
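For a self-contained illustration of the one-liner above, the sketch below runs it on two small made-up dataframes. The sample values are assumptions, since the question does not show the real data. It also shows np.where as one possible alternative to map for turning the boolean mask into labels.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df2 = pd.DataFrame({"col1": [1, 5, 3, 7]})

# Compute the unique values once, then test every row in a single call.
allowed = np.unique(df.values)
mask = df2["col1"].isin(allowed)

# Map the boolean mask to the labels used in the question.
df2["col2"] = mask.map({True: "Ok", False: "No"})

# Alternative labelling step with np.where, producing the same labels.
labels_alt = np.where(mask, "Ok", "No").tolist()

print(df2)
```

Here 1 and 3 appear in df and 5 and 7 do not, so col2 comes out as Ok, No, Ok, No.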

2 answers

Answer 1
0 2022-01-18 20:42:22

Answer 2
0 Accepted 2022-01-18 20:59:54
