
How to parallelize a simple Python for loop using pandas?

This is probably a trivial question, but how do I parallelize the following code in Python?

df["a"] = np.where(pd.notnull(df["a"]) == True, 6, 0)
df["b"] = np.where(pd.notnull(df["b"]) == True, 2, 0)
df["b"] = np.where(pd.notnull(df["b"]) == True, 1, 0)
df["c"] = np.where(pd.notnull(df["c"]) == True, 1, 0)
df["d"] = np.where(pd.notnull(df["d"]) == True, 1, 0)
df["e"] = np.where(pd.notnull(df["e"]) == True, 2, 0)
df["f"] = np.where(pd.notnull(df["f"]) == True, 1, 0)
df["g"] = np.where(pd.notnull(df["g"]) == True, 2, 0)
df["h"] = np.where(pd.notnull(df["h"]) == True, 2, 0)
df["i"] = np.where(pd.notnull(df["i"]) == True, 2, 0)

What's the easiest way to parallelize this code?

I tried:

cols = ["a", "b", "c", "d", .....]
df_score = [6, 2, 1, 1, .....]
for i in range(len(cols)):
    df[cols[i]] = np.where(pd.notnull(df[cols[i]]) == True, df_score[i], 0)

To achieve parallel execution, you can use threads:

# initialize df as before

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as pool:
    def update(key, score):
        df[key] = np.where(pd.notnull(df[key]) == True, score, 0)

    for key, score in ("a", 6), ("b", 2), ...:
        pool.submit(update, key, score)

Whether this will actually result in a speedup depends on whether Pandas releases the GIL during its calculations. The only way to know is to measure.
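One way to take that measurement is sketched below. It is only an illustration under stated assumptions: the test frame `data`, its size, the `scores` mapping and the helper `score_column` are all made up for the example and do not come from the question. The workers only compute the new arrays, and the results are collected in the main thread, since pandas objects are not guaranteed to be safe to modify from several threads at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

# Hypothetical test data: 200,000 rows, roughly half of the values missing.
rng = np.random.default_rng(0)
scores = {"a": 6, "b": 2, "c": 1, "d": 1}
data = pd.DataFrame(
    {key: np.where(rng.random(200_000) < 0.5, np.nan, 1.0) for key in scores}
)


def score_column(item):
    """Return the scored values for one (column, score) pair."""
    key, score = item
    return np.where(data[key].notnull(), score, 0)


# Sequential baseline.
start = time.perf_counter()
sequential = {key: score_column((key, score)) for key, score in scores.items()}
sequential_time = time.perf_counter() - start

# Threaded version: compute in worker threads, collect in the main thread.
start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    threaded = dict(zip(scores, pool.map(score_column, scores.items())))
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.4f} s, threaded: {threaded_time:.4f} s")
```

The two timings will vary with the data size, the number of columns and the machine, so it is worth repeating the run a few times on data shaped like your own before drawing a conclusion.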

If you're trying to simplify your code with Python data structures, I'd suggest using a dictionary that combines the df and df_score values in a key-value structure:

dct = {"a": 6, "b": 2, "c": 1}  # fill in the rest of the values
for key, score in dct.items():
    df[key] = np.where(pd.notnull(df[key]) == True, score, 0)
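To see the dictionary approach end to end, here is a small self-contained sketch. The three-row frames and the two-entry `dct` are hypothetical sample data. The second half is an alternative that is not part of the answer above: it appears to give the same result in a single expression by multiplying the boolean `notna()` frame by the per-column scores.

```python
import numpy as np
import pandas as pd

dct = {"a": 6, "b": 2}  # hypothetical scores
cols = list(dct)


def make_frame():
    # Hypothetical sample data; NaN marks a missing value.
    return pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 5.0]})


# Loop version, as in the answer.
df_loop = make_frame()
for key, score in dct.items():
    df_loop[key] = np.where(pd.notnull(df_loop[key]), score, 0)

# Alternative sketch: one expression over all columns at once.
# notna() gives True/False; as integers these are 1/0, and mul() scales
# each column by the score whose key matches the column name.
df_vec = make_frame()
df_vec[cols] = df_vec[cols].notna().astype(int).mul(pd.Series(dct), axis="columns")

print(df_loop)
print(df_vec)
```

In both frames column a ends up holding 6, 0, 6 and column b holds 0, 0, 2.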

If you mean parallel programming, that is a different issue entirely; see the threading module in Python: https://docs.python.org/3/library/threading.html
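For reference, a minimal sketch of what that could look like with the threading module directly, with one thread per column. The sample frame, the `scores` mapping, the `results` dictionary and the helper `score_column` are assumptions made for illustration. As with any thread-based approach in Python, whether this is faster than the plain loop depends on how much of the work runs without holding the GIL.

```python
import threading

import numpy as np
import pandas as pd

# Hypothetical sample data and scores.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0], "c": [3.0, 4.0]})
scores = {"a": 6, "b": 2, "c": 1}
results = {}


def score_column(key, score):
    # Each thread stores its result under its own key.
    results[key] = np.where(df[key].notnull(), score, 0)


threads = [
    threading.Thread(target=score_column, args=(key, score))
    for key, score in scores.items()
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Assign from the main thread once every worker has finished.
for key, values in results.items():
    df[key] = values

print(df)
```

The threads only read from the frame and write into a plain dictionary. The frame itself is modified afterwards from the main thread, which avoids concurrent writes to a pandas object.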
