Assign dataframe to variable outside for loop or use it directly inside for loop in Python

Question

option 1:

a = np.unique(df.values)
for i in range(len(df2)):  # assumes df2 has the default 0..n-1 integer index
  if df2.loc[i,'col1'] in a:
    df2.loc[i,'col2'] = 'Ok'
  else:
    df2.loc[i,'col2'] = 'No'

option 2:

for i in range(len(df2)):  # assumes df2 has the default 0..n-1 integer index
  if df2.loc[i,'col1'] in np.unique(df.values):
    df2.loc[i,'col2'] = 'Ok'
  else:
    df2.loc[i,'col2'] = 'No'

Which is better in terms of memory and speed in Python?

Edited for clarity on the operation inside the for loop.

Answer 1

In terms of memory, option 2 would probably be better because you aren't creating a new variable. In terms of speed, there wouldn't be a difference, because df.values and a refer to the same piece of data. You can check whether two variables refer to the same object with the is keyword: var1 is var2. However, we don't know what you are doing with the data.
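The identity check mentioned above can be tried directly. This is a minimal sketch on a made-up frame; the sample data and the names values, alias, same_object and new_array are assumptions for illustration only. Note that np.unique builds a new sorted array, so in this sketch a is values evaluates to False, while a second name bound to a gives True.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame, used only for illustration.
df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5]})

values = df.values
a = np.unique(values)  # np.unique returns a new, sorted array of unique values
alias = a              # a second name bound to the same array object

same_object = alias is a   # True: both names refer to one object
new_array = a is values    # False: a is a different object from values
print(same_object, new_array)
```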

Answer 2

Both are inefficient; the second is worse because it recalculates the unique values on every iteration.

Use vectorized code instead:

df2['col2'] = df2['col1'].isin(np.unique(df.values)).map({True: 'Ok', False: 'No'})
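To make the one-liner concrete, here is a small self-contained sketch. The sample frames and the names allowed and col2_where are made up for illustration, since the question does not show the real data. It computes the unique values once and labels every row without an explicit loop; the np.where line is an equivalent alternative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frames are not shown in the question.
df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5]})
df2 = pd.DataFrame({"col1": [1, 6, 4, 9]})

# Compute the lookup values a single time.
allowed = np.unique(df.values)

# Vectorized membership test, then map the booleans to labels.
df2["col2"] = df2["col1"].isin(allowed).map({True: "Ok", False: "No"})

# Equivalent formulation using np.where on the same boolean mask.
df2["col2_where"] = np.where(df2["col1"].isin(allowed), "Ok", "No")

print(df2)
```

Both columns hold the same labels; which form to use is mostly a matter of taste.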

2 answers

solution1
0 2022-01-18 20:42:22

solution2
0 ACCEPTED 2022-01-18 20:59:54
