
Assign dataframe to variable outside for loop or use it directly inside for loop in Python

Option 1:

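import numpy as np

# Compute the unique values once, before the loop.
a = np.unique(df.values)
# The original `range()` had no argument; iterating over df2.index is one way to make it run.
for i in df2.index:
  if df2.loc[i, 'col1'] in a:
    df2.loc[i, 'col2'] = 'Ok'
  else:
    df2.loc[i, 'col2'] = 'No'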

Option 2:

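import numpy as np

# The original `range()` had no argument; iterating over df2.index is one way to make it run.
for i in df2.index:
  # np.unique(df.values) is evaluated again on every iteration.
  if df2.loc[i, 'col1'] in np.unique(df.values):
    df2.loc[i, 'col2'] = 'Ok'
  else:
    df2.loc[i, 'col2'] = 'No'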

Which is better in terms of memory and speed in Python?

Edited for clarity on the operation inside the for loop.

In terms of memory, option 2 might look better because you aren't creating a new variable, but np.unique(df.values) still allocates a new array every time it is called, so peak memory is about the same; option 1 simply keeps that one array alive for as long as a exists. In terms of speed, option 1 does less work, because a is computed once, while option 2 calls np.unique again on every iteration. Note that a and df.values do not refer to the same piece of data. You can check whether two variables refer to the same object with the is keyword: var1 is var2. However, we don't know what you are doing with the data.
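The identity check mentioned above can be tried on a small example. This is a minimal sketch with a made-up DataFrame (the data and the names values and alias are assumptions, not from the question); it shows what is reports for a plain alias and for the result of np.unique.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"x": [1, 2, 2], "y": [3, 3, 4]})

values = df.values        # keep one reference to the array of values
alias = values            # a second name bound to the same object
a = np.unique(values)     # np.unique builds and returns a new, sorted 1-D array

same_object = alias is values                 # two names, one object
unique_is_values = a is values                # the result is a different object
shares_buffer = np.shares_memory(a, values)   # and it does not reuse the buffer

print(same_object, unique_is_values, shares_buffer)
```

In this sketch a is a separate array from values, so whether it is stored in a variable or created inline, the array itself still has to be allocated.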

Both are inefficient; the second is worse because it recalculates the unique values at every iteration.
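To make the recalculation visible, here is a rough sketch that counts how often np.unique runs under each option. The data, the counter dictionary, and the counted_unique wrapper are all hypothetical helpers introduced for illustration; the labels are collected in lists rather than written back to df2 to keep the example short.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df2 = pd.DataFrame({"col1": [1, 5, 3, 9]})

counter = {"n": 0}

def counted_unique(values):
    """Call np.unique and record that it ran (hypothetical helper)."""
    counter["n"] += 1
    return np.unique(values)

# Option 1: compute the unique values once and reuse them in the loop.
counter["n"] = 0
a = counted_unique(df.values)
labels_1 = []
for i in df2.index:
    labels_1.append("Ok" if df2.loc[i, "col1"] in a else "No")
calls_option_1 = counter["n"]

# Option 2: recompute the unique values inside the loop condition.
counter["n"] = 0
labels_2 = []
for i in df2.index:
    found = df2.loc[i, "col1"] in counted_unique(df.values)
    labels_2.append("Ok" if found else "No")
calls_option_2 = counter["n"]

print(calls_option_1, calls_option_2)
print(labels_1 == labels_2)
```

Both loops produce the same labels, but option 2 runs np.unique once per row of df2, so its cost grows with both the size of df and the number of rows in df2.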

Use vectorized code instead:

df2['col2'] = df2['col1'].isin(np.unique(df.values)).map({True: 'Ok', False: 'No'})
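For anyone who wants to run this line end to end, here is a small self-contained sketch. The two DataFrames are made-up sample data, and the names unique_values, is_known, and col2_alt are assumptions added for readability; np.where is shown only as one possible alternative to map.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df2 = pd.DataFrame({"col1": [1, 5, 3, 9]})

# Compute the unique values once, then test the whole column in one call.
unique_values = np.unique(df.values)
is_known = df2["col1"].isin(unique_values)   # boolean Series, one entry per row

# Map the booleans to labels, as in the answer above.
df2["col2"] = is_known.map({True: "Ok", False: "No"})

# One possible alternative: choose the label with np.where.
df2["col2_alt"] = np.where(is_known, "Ok", "No")

print(df2)
```

Neither version needs an explicit Python loop, and np.unique runs exactly once.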
