
How to properly update a global variable in python using lambda

I have a dataframe in which each row shows one transaction and the items within that transaction. Here is what my dataframe looks like:

itemList
A,B,C
B,F
G,A
...

I want to find the frequency of each item (how many times it appears in the transactions). I have defined a dictionary and try to update its values as shown below:

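counts = {}  # renamed from `dict` to avoid shadowing the built-in

def update(itemList):
    # Update the value of each item in the dictionary
    for item in itemList.split(','):
        counts[item] = counts.get(item, 0) + 1

df.itemList.apply(update)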

As the apply function gets executed for multiple rows at the same time, multiple rows try to update the values in the dictionary at the same time, and this is causing an issue. How can I make sure that multiple updates to the dictionary do not cause any issue?

I think you only need Series.str.get_dummies:

df['itemList'].str.get_dummies(',').sum().to_dict()
#{'A': 2, 'B': 2, 'C': 1, 'F': 1, 'G': 1}
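To see why this one-liner works, it can help to look at the intermediate result. The following is only a minimal sketch that rebuilds the three sample rows from the question (the DataFrame construction is an assumption, since the question shows only the column contents):

```python
import pandas as pd

# Sample data mirroring the question's itemList column (assumed construction).
df = pd.DataFrame({'itemList': ['A,B,C', 'B,F', 'G,A']})

# One 0/1 indicator column per distinct item, one row per transaction.
indicators = df['itemList'].str.get_dummies(',')
print(indicators)

# Summing each column gives the number of transactions containing that item.
counts = indicators.sum().to_dict()
print(counts)
```

Because the columns are 0/1 indicators, an item that is repeated inside a single row is counted only once for that row.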

If there are more columns, use:

df.stack().str.get_dummies(',').sum().to_dict()

If you want to count for each row:

df['itemList'].str.get_dummies(',').to_dict('index')
#{0: {'A': 1, 'B': 1, 'C': 1, 'F': 0, 'G': 0},
# 1: {'A': 0, 'B': 1, 'C': 0, 'F': 1, 'G': 0},
# 2: {'A': 1, 'B': 0, 'C': 0, 'F': 0, 'G': 1}}

As @Quang Hoang said in the comments, apply simply applies the function to each row / column using a loop, so the rows are processed one after another rather than at the same time.
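A small sketch can illustrate this sequential behaviour. The names item_counts, call_order and the body of update are hypothetical, chosen only for this illustration; the sample data is assumed from the question:

```python
import pandas as pd

# Sample data mirroring the question (assumed construction).
df = pd.DataFrame({'itemList': ['A,B,C', 'B,F', 'G,A']})

item_counts = {}   # module-level dictionary updated by the function below
call_order = []    # records the order in which rows are processed

def update(item_list):
    """Increment the count of every item in one comma-separated row."""
    call_order.append(item_list)
    for item in item_list.split(','):
        item_counts[item] = item_counts.get(item, 0) + 1

# Series.apply calls update once per row, in order, in a single thread.
df['itemList'].apply(update)

print(call_order)
print(item_counts)
```

Since the calls happen one at a time, updating a shared dictionary this way should not run into concurrent-update problems, although the vectorized approaches are usually more idiomatic.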

You might be better off relying on native Python here.

import pandas as pd
df = pd.DataFrame({'itemlist':['a,b,c', 'b,f', 'g,a', 'd,g,f,d,s,a,v', 'e,w,d,f,g,h', 's,d,f,e,r,t', 'e,d,f,g,r,r','s,d,f']})

Here is a solution using Counter. Note that it counts individual characters after removing the commas, so it assumes single-character item names:

from collections import Counter
df['itemlist'].str.replace(',','').apply(Counter).sum()
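If the item names can be longer than one character, a variant that splits each row on the comma may be safer. This is only a sketch; the item names below are hypothetical and not taken from the question:

```python
from collections import Counter

import pandas as pd

# Hypothetical multi-character item names, for illustration only.
df = pd.DataFrame({'itemlist': ['apple,bread', 'bread,milk', 'milk,apple,apple']})

counts = Counter()
for row in df['itemlist']:
    # Split on the comma so each whole item name is counted, not each character.
    counts.update(row.split(','))

print(dict(counts))
```

Unlike the get_dummies approach, this counts every occurrence, so an item repeated within one row contributes more than once.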

Some comparisons:

%timeit df['itemlist'].str.split(',', expand = True).stack().value_counts().to_dict()
2.64 ms ± 99.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['itemlist'].str.get_dummies(',').sum().to_dict()
3.22 ms ± 68.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

from collections import Counter
%timeit df['itemlist'].str.replace(',','').apply(lambda x: Counter(x)).sum()
778 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
