
How to properly update a global variable in Python using lambda

I have a dataframe in which each row represents one transaction and lists the items within that transaction. Here is what my dataframe looks like:

itemList
A,B,C
B,F
G,A
...

I want to find the frequency of each item (how many times it appears across the transactions). I have defined a dictionary and try to update its values as shown below:

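item_counts = {}  # named item_counts rather than dict, which would shadow the built-in
def update(itemList):
   # Update the value of each item in the dict
   for item in itemList.split(','):
       item_counts[item] = item_counts.get(item, 0) + 1

df.itemList.apply(lambda x: update(x))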

As the apply function gets executed for multiple rows at the same time, multiple rows try to update the values in the dictionary at the same time, and this is causing an issue. How can I make sure that multiple updates to the dictionary do not cause any issue?

I think you only need Series.str.get_dummies:

df['itemList'].str.get_dummies(',').sum().to_dict()
#{'A': 2, 'B': 2, 'C': 1, 'F': 1, 'G': 1}

If there are more columns, use:

df.stack().str.get_dummies(',').sum().to_dict()
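A minimal sketch of the multi-column case, assuming a second column named otherList (a made-up name and made-up values, not from the question) that also holds comma-separated strings:

```python
import pandas as pd

# 'otherList' is a hypothetical second column added only for illustration
df = pd.DataFrame({
    'itemList': ['A,B,C', 'B,F', 'G,A'],
    'otherList': ['A', 'C,F', 'B'],
})

# stack() turns all columns into one long Series of strings,
# so get_dummies can be applied to every cell at once
multi_counts = df.stack().str.get_dummies(',').sum().to_dict()
print(multi_counts)
```

This only works as written when every column contains strings; numeric columns would need to be dropped or converted first.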

If you want the counts for each row:

df['itemList'].str.get_dummies(',').to_dict('index')
#{0: {'A': 1, 'B': 1, 'C': 1, 'F': 0, 'G': 0},
# 1: {'A': 0, 'B': 1, 'C': 0, 'F': 1, 'G': 0},
# 2: {'A': 1, 'B': 0, 'C': 0, 'F': 0, 'G': 1}}
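One caveat worth keeping in mind: get_dummies produces 0/1 indicators, so an item repeated inside a single transaction is counted once for that row. The sketch below uses made-up data with a repeated item to contrast this with counting every occurrence via split and explode (available in pandas 0.25 and later):

```python
import pandas as pd

# Made-up data: 'A' appears twice in the first transaction
s = pd.Series(['A,A,B', 'B,C'])

# Number of transactions that contain each item (duplicates in a row count once)
presence = s.str.get_dummies(',').sum().to_dict()

# Number of times each item occurs overall (duplicates in a row count each time)
occurrences = s.str.split(',').explode().value_counts().to_dict()

print(presence)
print(occurrences)
```

Which of the two is wanted depends on whether "frequency" means transactions containing the item or total occurrences.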

As @Quang Hoang said in the comments, apply simply applies the function to each row or column using a loop, so the rows are processed one after another and never at the same time.
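To illustrate the point about apply being a loop, here is a small sketch that records the order in which rows are visited and compares the result with an explicit for loop. The names call_order, item_counts and loop_counts are hypothetical helpers introduced for this example:

```python
import pandas as pd

df = pd.DataFrame({'itemList': ['A,B,C', 'B,F', 'G,A']})

call_order = []    # records each row as the function is called on it
item_counts = {}   # shared dictionary updated from inside apply

def update(item_list):
    call_order.append(item_list)
    for item in item_list.split(','):
        item_counts[item] = item_counts.get(item, 0) + 1

df['itemList'].apply(update)

# The same counting written as a plain Python loop
loop_counts = {}
for item_list in df['itemList']:
    for item in item_list.split(','):
        loop_counts[item] = loop_counts.get(item, 0) + 1

print(call_order)
print(item_counts == loop_counts)
```

Because the calls happen sequentially in row order, updating a module-level dictionary from the applied function does not race with itself.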

You might be better off relying on native Python here. Take this example:

import pandas as pd
df = pd.DataFrame({'itemlist':['a,b,c', 'b,f', 'g,a', 'd,g,f,d,s,a,v', 'e,w,d,f,g,h', 's,d,f,e,r,t', 'e,d,f,g,r,r','s,d,f']})

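Here is a solution using Counter. Note that it removes the commas and then counts characters, so it only works when every item is a single character:

from collections import Counter
df['itemlist'].str.replace(',','').apply(lambda x: Counter(x)).sum()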
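The one-liner above counts individual characters, which matches the single-letter items in this example. For items with longer names, a possible variant (a sketch with made-up item names, not part of the original answer) splits each string on commas and feeds all items into a single Counter:

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Made-up data with multi-character item names
df = pd.DataFrame({'itemlist': ['apple,bread', 'bread,milk', 'milk,apple,apple']})

# Split every row into its items and count them all in one pass
counts = Counter(chain.from_iterable(s.split(',') for s in df['itemlist']))
print(dict(counts))
```

This version was not part of the timings reported in this answer, so no claim is made here about its relative speed.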

Some timing comparisons:

%timeit df['itemlist'].str.split(',', expand = True).stack().value_counts().to_dict()
2.64 ms ± 99.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['itemlist'].str.get_dummies(',').sum().to_dict()
3.22 ms ± 68.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

from collections import Counter
%timeit df['itemlist'].str.replace(',','').apply(lambda x: Counter(x)).sum()
778 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
