How do I parallelize a simple Python loop? This is probably a trivial question, but how do I parallelize the following statements in Python?
df["a"] = np.where(pd.notnull(df["a"]) == True, 6, 0)
df["b"] = np.where(pd.notnull(df["b"]) == True, 2, 0)
df["b"] = np.where(pd.notnull(df["b"]) == True, 1, 0)
df["c"] = np.where(pd.notnull(df["c"]) == True, 1, 0)
df["d"] = np.where(pd.notnull(df["d"]) == True, 1, 0)
df["e"] = np.where(pd.notnull(df["e"]) == True, 2, 0)
df["f"] = np.where(pd.notnull(df["f"]) == True, 1, 0)
df["g"] = np.where(pd.notnull(df["g"]) == True, 2, 0)
df["h"] = np.where(pd.notnull(df["h"]) == True, 2, 0)
df["i"] = np.where(pd.notnull(df["i"]) == True, 2, 0)
What's the easiest way to parallelize this code?
I tried:
cols = ["a", "b", "c", "d", .....]
df_score = [6, 2, 1, 1, .....]
for i in range(len(cols)):
    df[cols[i]] = np.where(pd.notnull(df[cols[i]]) == True, df_score[i], 0)
To achieve parallel execution, you can use threads:
# initialize df as before
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor() as pool:
    def update(key, score):
        df[key] = np.where(pd.notnull(df[key]) == True, score, 0)
    for key, score in ("a", 6), ("b", 2), ...:
        pool.submit(update, key, score)
Whether this will actually result in a speedup depends on whether Pandas releases the GIL during its calculations. The only way to know is to measure.
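To make the measurement concrete, here is a minimal sketch that times the sequential version against the threaded one. The sample DataFrame, the `scores` mapping, and the helper names `score_column`, `run_sequential` and `run_threaded` are hypothetical, made up for illustration. Each worker only computes the new column and the assignments into `df` happen in the main thread; that is a cautious choice on my part, not something pandas documents as required.

```python
import concurrent.futures
import time

import numpy as np
import pandas as pd

# Hypothetical sample data: three columns with some missing values.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200_000, 3)), columns=["a", "b", "c"])
df = df.mask(df < 0.3)  # replace values below 0.3 with NaN

# Hypothetical column -> score mapping, following the question's pattern.
scores = {"a": 6, "b": 2, "c": 1}


def score_column(column, score):
    """Return an array holding `score` where the column is non-null, else 0."""
    return np.where(column.notnull(), score, 0)


def run_sequential(frame):
    return {key: score_column(frame[key], score) for key, score in scores.items()}


def run_threaded(frame):
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {
            key: pool.submit(score_column, frame[key], score)
            for key, score in scores.items()
        }
        # result() blocks until the worker is done and re-raises its exceptions.
        return {key: future.result() for key, future in futures.items()}


start = time.perf_counter()
sequential = run_sequential(df)
sequential_time = time.perf_counter() - start

start = time.perf_counter()
threaded = run_threaded(df)
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.4f}s, threaded: {threaded_time:.4f}s")

# Assign the results back in the main thread.
for key, values in threaded.items():
    df[key] = values
```

The timings will vary with the machine, the data size, and the library versions, so treat a single run as a rough indication only. With just a handful of cheap column operations, the thread-pool overhead may well outweigh any gain.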
If you're trying to simplify your code with Python data structures, I'd suggest using a dictionary that combines the df and df_score values into key-value pairs:
dct = {"a": 6, "b": 2, "c": 1}  # fill in the rest of the values
for key in dct:
    df[key] = np.where(pd.notnull(df[key]) == True, dct[key], 0)
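For reference, here is a self-contained version of the dictionary loop, run on a small made-up DataFrame. The sample values are hypothetical; the scores for "a", "b" and "c" are the ones from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a few missing values.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [np.nan, np.nan, "x"],
        "c": ["u", "v", None],
    }
)

# Column -> score mapping (only the first three columns are shown here).
dct = {"a": 6, "b": 2, "c": 1}

for key, score in dct.items():
    # Non-null entries become the column's score, null entries become 0.
    df[key] = np.where(df[key].notnull(), score, 0)

print(df)
```

Iterating over `dct.items()` yields the key and the score together, so the second lookup `dct[key]` is not needed, and the `== True` comparison can be dropped because `notnull()` already returns booleans.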
If you mean parallel programming, that is a different matter entirely; see the Python threading documentation: https://docs.python.org/3/library/threading.html