Assign dataframe to variable outside for loop or use it directly inside for loop in Python
Option 1:
a = np.unique(df.values)
for i in range(len(df2)):  # assumes df2 has a default 0..n-1 index
    if df2.loc[i, 'col1'] in a:
        df2.loc[i, 'col2'] = 'Ok'
    else:
        df2.loc[i, 'col2'] = 'No'
Option 2:
for i in range(len(df2)):  # assumes df2 has a default 0..n-1 index
    if df2.loc[i, 'col1'] in np.unique(df.values):
        df2.loc[i, 'col2'] = 'Ok'
    else:
        df2.loc[i, 'col2'] = 'No'
Which is better in terms of memory and speed in Python?
Edited for clarity on the operation inside the for loop.
In terms of memory, option 2 would probably be better because you aren't making a new variable. In terms of speed, there wouldn't be a difference because df.values and a refer to the same piece of data. You can see if two variables refer to the same piece of data by using the is keyword: var1 is var2. However, we don't know what you are doing with the data.
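The is check mentioned above can be tried on a small example. This is only a sketch: the DataFrame contents and the names values, alias, same_object, unique_is_input and shares_memory are made up for illustration, since the post does not show the real data. In this sketch, a plain assignment gives a second name for the same array, while np.unique returns a newly built array rather than a reference to its input.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original post does not show df.
df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5]})

values = df.to_numpy()   # grab the underlying array once
a = np.unique(values)    # sorted unique values
alias = a                # a second name bound to the same object

same_object = alias is a                      # both names point at one array
unique_is_input = a is values                 # is the result the input object?
shares_memory = np.shares_memory(a, values)   # do the two arrays share a buffer?

print(same_object, unique_is_input, shares_memory)  # True False False
```

So in option 2, each loop iteration would call np.unique again and build a fresh array, whereas option 1 builds it once and reuses it.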
Both are inefficient; the second is worse because it recalculates the unique values on every iteration.
Use vectorized code instead:
df2['col2'] = df2['col1'].isin(np.unique(df.values)).map({True: 'Ok', False: 'No'})
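For anyone who wants to check that the one-liner gives the same labels as the loop, here is a small self-contained sketch. The sample data and the names looped and vectorized are assumptions for illustration only, and df2 is assumed to have a default integer index so that .loc[i, ...] works with range.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original post does not show df or df2.
df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5]})
df2 = pd.DataFrame({"col1": [2, 7, 5, 9]})

# Compute the unique values once and reuse them in both versions.
a = np.unique(df.values)

# Loop version (option 1): test membership row by row.
looped = df2.copy()
for i in range(len(looped)):
    looped.loc[i, "col2"] = "Ok" if looped.loc[i, "col1"] in a else "No"

# Vectorized version: one membership test for the whole column,
# then map the boolean result to the two labels.
vectorized = df2.copy()
vectorized["col2"] = vectorized["col1"].isin(a).map({True: "Ok", False: "No"})

print(vectorized["col2"].tolist())  # ['Ok', 'No', 'Ok', 'No']
```

Both versions produce the same column here; the vectorized one avoids the Python-level loop and the per-row .loc assignments.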