How to count instances of unique values in a Python DataFrame

Question

I have a dataframe like the one below, with 2 million rows. The sample data can be found here.

The list of matches in every row can contain any numbers between 1 and 761. I want to count the occurrences of every number between 1 and 761 across the whole matches column. For example, the result for the above data would be:

If a particular id is not found, its count should be 0 in the output. I tried a for-loop approach, but it is quite slow.

import pandas as pd

def readData():
    df = pd.read_excel(file_path)

    # One counter slot per pattern id (1..761), stored at index id - 1.
    pattern_match_count = [0] * 761
    for index, row in df.iterrows():
        matches = row["matches"]

        # Check every possible id against this row's matches.
        for pattern_id in range(1, 762):
            if pattern_id in matches:
                pattern_match_count[pattern_id - 1] += 1

    return pattern_match_count

Is there a better approach with pandas that would make this faster?

Answer 1

You can use the .explode() method to "explode" the lists into new rows, one row per list element, and then count the values.

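import pandas as pd

def readData():
    df = pd.read_excel(file_path)
    # Explode the "matches" lists into one row per element, then count each value.
    return df.loc[:, "matches"].explode().value_counts()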
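The question also asks for a count of 0 for ids that never appear, which value_counts() alone does not produce. A minimal sketch of one way to add them, assuming the matches column already holds Python lists of ints (the small dataframe here is made-up stand-in data, not the asker's file):

```python
import pandas as pd

# Hypothetical stand-in for the real 2-million-row dataframe.
df = pd.DataFrame({"matches": [[1, 2, 3], [1, 3, 3, 4]]})

all_ids = range(1, 762)  # ids 1..761, as described in the question

counts = (
    df["matches"]
    .explode()                       # one row per list element
    .value_counts()                  # occurrences of each id that appears
    .reindex(all_ids, fill_value=0)  # add ids that never appear, with count 0
)
```

Here counts is a Series indexed by id in ascending order. If the column comes back from read_excel as strings rather than lists, it would need to be parsed first; that step depends on how the file stores the data.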

Answer 2

You can use collections.Counter:

import pandas as pd

df = pd.DataFrame({"matches": [[1,2,3],[1,3,3,4]]})

#df:
#        matches
#0     [1, 2, 3]
#1  [1, 3, 3, 4]

from collections import Counter

C = Counter([i for sl in df.matches for i in sl])
#C:  
#Counter({1: 2, 2: 1, 3: 3, 4: 1})

pd.DataFrame(C.items(), columns=["match_id", "counts"]) 
#   match_id  counts
#0         1       2
#1         2       1
#2         3       3
#3         4       1

If you want zeros for match_id values that aren't in any of the matches, then you can update the Counter object C:

for i in range(1,762):
    if i not in C:
        C[i] = 0
pd.DataFrame(C.items(), columns=["match_id", "counts"])
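A variant sketch of the same idea, under the same assumption that matches holds lists of ints. It relies on the fact that a Counter returns 0 for a missing key, so the zero-filling loop can be replaced by a lookup over the full id range, which also leaves the output sorted by match_id. It additionally shows a per-row count, because the loop in the question tests `pattern_id in matches` and so counts each row at most once per id, whereas flattening counts every occurrence. The names totals, rows_with_id and result are illustrative only:

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({"matches": [[1, 2, 3], [1, 3, 3, 4]]})

# Total occurrences: flatten all lists lazily and count every element.
totals = Counter(chain.from_iterable(df["matches"]))

# Rows containing each id: deduplicate within each row before counting,
# which mirrors the `pattern_id in matches` test in the question's loop.
rows_with_id = Counter(chain.from_iterable(set(m) for m in df["matches"]))

ids = range(1, 762)
result = pd.DataFrame({
    "match_id": list(ids),
    "counts": [totals[i] for i in ids],      # missing keys yield 0
    "rows": [rows_with_id[i] for i in ids],  # missing keys yield 0
})
```

For the sample data, id 3 has a total count of 3 but appears in only 2 rows, so the two columns differ whenever a row repeats an id.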

Answer 1: score 2, accepted, 2022-10-04 18:02:32

Answer 2: score 0, 2022-10-04 19:04:03
