How to parallelize a simple Python for loop using pandas?

Question

This is probably a trivial question, but how do I parallelize the following code in Python?

df["a"] = np.where(pd.notnull(df["a"]) == True, 6, 0)
df["b"] = np.where(pd.notnull(df["b"]) == True, 2, 0)
df["b"] = np.where(pd.notnull(df["b"]) == True, 1, 0)
df["c"] = np.where(pd.notnull(df["c"]) == True, 1, 0)
df["d"] = np.where(pd.notnull(df["d"]) == True, 1, 0)
df["e"] = np.where(pd.notnull(df["e"]) == True, 2, 0)
df["f"] = np.where(pd.notnull(df["f"]) == True, 1, 0)
df["g"] = np.where(pd.notnull(df["g"]) == True, 2, 0)
df["h"] = np.where(pd.notnull(df["h"]) == True, 2, 0)
df["i"] = np.where(pd.notnull(df["i"]) == True, 2, 0)

What's the easiest way to parallelize this code?

I tried:

cols = ["a", "b", "c", "d", ...]
df_score = [6, 2, 1, 1, ...]
for i in range(len(cols)):
    df[cols[i]] = np.where(pd.notnull(df[cols[i]]), df_score[i], 0)

Answer 1

To achieve parallel execution, you can use threads:

# initialize df as before

import concurrent.futures

import numpy as np
import pandas as pd

def update(key, score):
    # Replace non-null values in column `key` with `score` and nulls with 0.
    df[key] = np.where(pd.notnull(df[key]), score, 0)

with concurrent.futures.ThreadPoolExecutor() as pool:
    for key, score in ("a", 6), ("b", 2), ...:
        pool.submit(update, key, score)

Whether this actually produces a speedup depends on whether pandas releases the GIL during its calculations. The only way to know is to measure.
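One way to take that measurement, as a rough sketch rather than a definitive benchmark: time a sequential run and a threaded run on the same data and compare. Everything named here (SCORES, make_frame, score_column, run_sequential, run_threaded, timed) is hypothetical, and the random test frame only stands in for the real df. Unlike the snippet above, the workers return new arrays instead of assigning into df, so all DataFrame writes can stay in the main thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

# Hypothetical column -> score mapping; extend it with the real columns.
SCORES = {"a": 6, "b": 2, "c": 1, "d": 1}


def make_frame(n_rows=200_000, seed=0):
    """Build stand-in data in which roughly half of the values are missing."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_rows, len(SCORES)))
    data[data < 0.5] = np.nan
    return pd.DataFrame(data, columns=list(SCORES))


def score_column(column, score):
    """Return `score` where the column is non-null and 0 elsewhere."""
    return np.where(column.notna(), score, 0)


def run_sequential(df):
    return {key: score_column(df[key], score) for key, score in SCORES.items()}


def run_threaded(df):
    with ThreadPoolExecutor() as pool:
        # Columns are read in the main thread; only the computation is submitted.
        futures = {
            key: pool.submit(score_column, df[key], score)
            for key, score in SCORES.items()
        }
        # result() re-raises any exception that occurred in a worker.
        return {key: future.result() for key, future in futures.items()}


def timed(func, df):
    """Run func(df) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(df)
    return result, time.perf_counter() - start


df = make_frame()
seq, t_seq = timed(run_sequential, df)
thr, t_thr = timed(run_threaded, df)
print(f"sequential: {t_seq:.4f}s  threaded: {t_thr:.4f}s")
```

A single run like this is noisy, so repeating it several times (or using the timeit module) gives a more trustworthy comparison. The scored columns can be written back afterwards in the main thread, for example with df[key] = thr[key] for each key.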

Answer 2

If you're trying to simplify your code with Python data structures, I'd suggest using a dictionary (combining the df and df_score values in a key-value structure):

dct = {"a": 6, "b": 2, "c": 1}  # fill in the rest of the values
for key, score in dct.items():
    df[key] = np.where(pd.notnull(df[key]), score, 0)
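As an alternative to the explicit loop (a different technique from the one this answer proposes), the same dictionary can drive one vectorized operation over all the columns. This is a minimal sketch that assumes every key in dct is a column of df; the small example frame and the name cols are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; NaN/None marks a missing value.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, "x", "y"],
    "c": [None, None, 5],
})
dct = {"a": 6, "b": 2, "c": 1}

cols = list(dct)
# notna() yields a boolean frame; casting to int gives 1/0, which is then
# multiplied column by column with the matching score from dct.
df[cols] = df[cols].notna().astype(int).mul(pd.Series(dct), axis="columns")
print(df)
```

Here the per-column work runs inside pandas and NumPy rather than in a Python-level loop. Whether that is faster than the loop on real data is something to measure rather than assume.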

If you mean parallel programming, that is a different matter entirely; see threading in Python: https://docs.python.org/3/library/threading.html

2 answers

solution1
1 2020-03-28 10:01:26

solution2
0 ACCEPTED 2020-03-28 09:56:44
