How do I use a dictionary as a rubric to assign values to a DataFrame column using Series.apply()?

def create_rubric(number, df, col):
"""
First finds all the unique fields then segments them in quintiles.
Uses the quintiles to give ratings to the original data
"""

    sorted_col = df[col].sort_values()
    unique_val = sorted_col.unique()
    unique_cut = pd.qcut(unique_val,number,labels=False)
    unique_dict = {"Items" : unique_val, "Labels" : unique_cut}
    df = pd.DataFrame(unique_dict)
    rubric = {}
    rubric[1] = df[df.Labels == 0]
    rubric[2] = df[df.Labels == 1]
    rubric[3] = df[df.Labels == 2]
    rubric[4] = df[df.Labels == 3]
    rubric[5] = df[df.Labels == 4]
    return rubric

def frequency_star_rating(x, rubric):
"""
Uses rubric to score the rows in the dataframe
"""
    for rate, key in rubric.items():
        if x in key:
            return rate

rubric = create_rubric(5,rfm_report,"ordersCount")
rfm_report["Frequency Rating"] = rfm_report["ordersCount"].apply(frequency_star_rating, rubric)

I've written two functions that should interact with each other. One creates a scoring rubric that ends up in a dictionary, and the other should use that dictionary to score the rows of a DataFrame with about 700,000 rows. For some reason I keep getting the "Series objects are mutable and cannot be hashed" error, and I can't figure out the best way to do this. Did I write the functions wrong?
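Separately from the hashing error, note that each value in the rubric dict returned by create_rubric is a DataFrame, so the membership test "if x in key" in frequency_star_rating does not look at the data. The sketch below uses a small hypothetical DataFrame (the name key and its contents are illustrative assumptions) to show what "in" actually checks for a DataFrame and for a Series:

```python
import pandas as pd

# Hypothetical stand-in for one entry of the rubric dict built by create_rubric:
# a small DataFrame with "Items" and "Labels" columns
key = pd.DataFrame({"Items": [1, 2], "Labels": [0, 0]})

print(1 in key)                  # False: `in` on a DataFrame checks column labels
print("Items" in key)            # True
print(2 in key["Items"])         # False: `in` on a Series checks the index (0, 1)
print(2 in key["Items"].values)  # True: .values checks the actual cell values
```

Because a value such as 5 is never a column label, the loop in frequency_star_rating finds no match and the function falls through to return None.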

It would help if you could provide a toy dataset so we could run your code quickly and see where the error is raised.

The error you are getting is telling you that a pd.Series object was used where Python needs a hashable value, typically as the key of a dictionary. Python dictionaries are hash tables, so they only accept hashable data types as keys. For example, strings and integers are hashable, but lists are not. So the following works fine:

fine_dict = {'John': 1, 'Lilly': 2}

whereas this one raises a TypeError:

wrong_dict = {['John']: 1, ['Lilly']: 2}

The error looks like this: TypeError: unhashable type: 'list'
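The same rule applies to pandas objects. A quick way to check whether something can be used as a dict key is simply to try hashing it. The helper is_hashable below is a hypothetical name for illustration (pandas also ships pandas.api.types.is_hashable for the same purpose):

```python
import pandas as pd


def is_hashable(obj):
    """Return True if obj can be used as a dict key, i.e. hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


s = pd.Series([1, 2])

print(is_hashable("John"))     # True
print(is_hashable(("John",)))  # True: a tuple of hashable items is hashable
print(is_hashable(["John"]))   # False: lists are mutable
print(is_hashable(s))          # False: a Series is mutable too

try:
    {s: 1}  # using a Series as a dict key
except TypeError as err:
    print(err)  # prints the TypeError message raised by pandas
```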

So my hunch is that somewhere in your code you are using a Series object as a dictionary key (or anywhere else a hashable value is required, such as a set element), which fails because a Series is not hashable.
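Two other problems in the posted code are worth fixing as well. First, rfm_report["ordersCount"].apply(frequency_star_rating, rubric) passes rubric positionally to a parameter of Series.apply that is not forwarded to your function; extra arguments for the function go through args=(rubric,). Second, "if x in key" on a DataFrame tests its column labels, not its values, so the lookup never matches. The following is a minimal sketch, not the original code: it swaps the dict-of-DataFrames rubric for a flat {value: rating} dict whose keys are hashable scalars, then scores the column with apply and, alternatively, with Series.map. It assumes ordersCount holds numeric scalars, and the toy data is hypothetical.

```python
import pandas as pd


def create_rubric(number, df, col):
    """
    Split the unique values of df[col] into `number` quantile bins and
    return a flat dict {value: rating}, with ratings running from 1 to `number`.
    """
    # Sorted unique values as a Series, so qcut returns an aligned Series
    unique_val = pd.Series(df[col].unique()).sort_values()
    # labels=False yields integer bin codes 0 .. number-1
    bins = pd.qcut(unique_val, number, labels=False)
    # Keys are scalars, so they are hashable; +1 makes the ratings 1-based
    return dict(zip(unique_val, bins + 1))


def frequency_star_rating(x, rubric):
    """Return the rating for a single value, or None if it is not in the rubric."""
    return rubric.get(x)


# Hypothetical toy data standing in for rfm_report
rfm_report = pd.DataFrame({"ordersCount": [1, 2, 2, 3, 5, 8, 13, 21, 34, 55]})
rubric = create_rubric(5, rfm_report, "ordersCount")

# Option 1: keep apply(), but pass the extra argument through args=
rfm_report["Frequency Rating"] = rfm_report["ordersCount"].apply(
    frequency_star_rating, args=(rubric,)
)

# Option 2: let pandas do the lookup; Series.map accepts a dict directly
rfm_report["Frequency Rating (map)"] = rfm_report["ordersCount"].map(rubric)
```

Both columns should come out identical. For a frame of around 700,000 rows, Series.map with a dict is generally expected to be much faster than apply, since it avoids a Python-level function call per row; that is a general expectation, not a measured result on your data.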

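Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.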

 