How to count the number of occurrences in a DataFrame?

Question

I need help. I have a Pandas DataFrame like this:

Shown ID                                       Bought ID
59,60,61,62,60,63,64,65,66,61,67,68,67         67,60,63
63,64,63,64,63,65,66                           0
87,63,84,63,86                                 86

For each row of "Shown ID", I need to find how many times each of its numbers occurs in the whole "Shown ID" column.

So the expected result for the "Shown ID" column is:

    [[('59', 1), ('60', 2), ('61', 2), ('62', 1), ('63', 6),
      ('64', 3), ('65', 2), ('66', 2), ('67', 2), ('68', 1)],
     [('63', 6), ('64', 3), ('65', 2), ('66', 2)],
     [('87', 1), ('63', 6), ('84', 1), ('86', 1)]]

How can I do that?

Then I need to create a list of lists in which the values of each "Shown ID" row are sorted by their number of occurrences, most frequent first (one list for each list in the result above).

So the final result must be:

[['63', '64', '60', '61', '65', '66', '67', '68', '59', '62'],
 ['63', '64', '65', '66'],
 ['63', '87', '84', '86']]

How can I do that? If numbers have the same frequency, they should be ordered by where they first appear in the row (the earlier a number appears, the higher its priority).

Answer 1

This is how you can get what you are looking for:

import pandas as pd
from collections import Counter


data = [{'c_id': [59,60,61,62,60,63,64,65,66,61,67,68,67]},
        {'c_id': [63,64,63,64,63,65,66]},
        {'c_id': [87,63,84,63,86]}]

df = pd.DataFrame(data)

# For each row, list its unique IDs ordered by how often they occur in that row
df['c_id'].apply(lambda ids: [key for key, count in Counter(ids).most_common()])

output:

0    [67, 60, 61, 64, 65, 66, 68, 59, 62, 63]
1                            [63, 64, 65, 66]
2                            [63, 84, 86, 87]

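Values that have the same count may come in any order on older Python versions, as in the output above. Since Python 3.7, Counter.most_common() returns elements with equal counts in the order they were first encountered.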
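
To make the order among equal counts explicit instead of relying on the Python version, one option is to sort on a compound key. This is only a minimal sketch for a single row; the helper name order_by_row_frequency is made up for illustration and is not part of the original answer:

```python
from collections import Counter

def order_by_row_frequency(values):
    """Unique values of one row, most frequent first; ties keep first-appearance order."""
    counts = Counter(values)
    first_seen = {}
    for position, value in enumerate(values):
        # Remember only the first position at which each value appears
        first_seen.setdefault(value, position)
    # Sort by descending count, then by position of first appearance
    return sorted(counts, key=lambda v: (-counts[v], first_seen[v]))

row = [59, 60, 61, 62, 60, 63, 64, 65, 66, 61, 67, 68, 67]
print(order_by_row_frequency(row))
# [60, 61, 67, 59, 62, 63, 64, 65, 66, 68]
```

It could be applied to the column with df['c_id'].apply(order_by_row_frequency).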

If you want the counts to be taken over the whole column instead, you can do it like this:

import operator

# Collect every ID from every row so the counts cover the whole column
all_cids = []
for index, row in df.iterrows():
    all_cids.extend(row['c_id'])

counter_obj = Counter(all_cids)

def get_ordered_values(values):
    new_values = []
    covered_values = set()
    for val in values:
        # Keep only the first occurrence of each ID in the row
        if val in covered_values:
            continue
        covered_values.add(val)
        new_values.append((val, counter_obj[val]))
    # list.sort is stable, so IDs with equal counts keep their order of first appearance
    new_values.sort(key=operator.itemgetter(1), reverse=True)
    return [key for key, val in new_values]

df['c_id'].apply(get_ordered_values)

output:

0    [63, 64, 60, 61, 65, 66, 67, 59, 62, 68]
1                            [63, 64, 65, 66]
2                            [63, 87, 84, 86]
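
The question's column holds comma-separated strings and also asks for (ID, count) pairs. A rough sketch of how both results might be produced from that layout follows. It assumes the column is named "Shown ID" and contains strings exactly as shown in the question; the names df_q, unique_in_order, pairs and ranked are illustrative only:

```python
from collections import Counter
import pandas as pd

# Assumed layout: one comma-separated string per row, as in the question
df_q = pd.DataFrame({
    "Shown ID": [
        "59,60,61,62,60,63,64,65,66,61,67,68,67",
        "63,64,63,64,63,65,66",
        "87,63,84,63,86",
    ]
})

# Split each string into a list of ID strings
rows = df_q["Shown ID"].str.split(",")

# Count every ID across the whole column
column_counts = Counter(id_ for ids in rows for id_ in ids)

def unique_in_order(ids):
    # dict.fromkeys drops duplicates while keeping first-appearance order
    return list(dict.fromkeys(ids))

# First result: (ID, column-wide count) pairs for each row
pairs = [[(id_, column_counts[id_]) for id_ in unique_in_order(ids)] for ids in rows]

# Second result: IDs ranked by count; sorted() is stable, so ties keep row order
ranked = [[id_ for id_, _ in sorted(row, key=lambda p: -p[1])] for row in pairs]

print(pairs[2])   # [('87', 1), ('63', 6), ('84', 1), ('86', 1)]
print(ranked[2])  # ['63', '87', '84', '86']
```

Applying the stated tie rule strictly, the first row of ranked ends with '59', '62', '68'.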

Answer 2

If I understand correctly, you want the number of occurrences, not the list of indexes where a specific value is found. I can imagine several ways of doing this:

Way 1: count the data.

If your data is not a multidimensional list, you can simply use the list object's count method.

# range objects cannot be concatenated in Python 3, so convert them to lists first
someList = list(range(3)) + list(range(2)) + list(range(1))

sortedElements = sorted(set(someList))  # set() drops duplicates; elements must be hashable

for x in sortedElements:
    # list.count is usable in Python 2.7 and Python 3.5
    # (tested in the standard interpreter; I cannot say for IronPython and/or Rhino environments)
    print(x, someList.count(x))  # prints each element and its number of occurrences
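
Since list.count rescans the whole list for every distinct element, longer lists may be better served by collections.Counter from the standard library, which tallies everything in one pass. A small sketch using the same sample data (the snake_case names are just for this example):

```python
from collections import Counter

some_list = list(range(3)) + list(range(2)) + list(range(1))  # [0, 1, 2, 0, 1, 0]

# One pass over the list builds a mapping of element -> number of occurrences
counts = Counter(some_list)

for element in sorted(counts):
    print(element, counts[element])
# 0 3
# 1 2
# 2 1
```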

Way 2: return the indexes of duplicates.

# someList and sortedElements are the same as before
for x in sortedElements:
    # collect every position at which x occurs in someList
    lIndexes = [i for i, elem in enumerate(someList) if elem == x]
    print(lIndexes)

Multidimensional list:

As I see it, you must either first dump the whole list into one flat list, or apply way 1 or way 2 to each child list of the multidimensional list, depending on your need.
Of course, there are several ways to traverse a multidimensional list: you can map, filter or reduce it, pass the child lists as *arguments, and so on (there are too many ways for me to count, and you can find most of them on this website). Which method you choose depends very much on your exact data. Without further explanation I would not dare to advise you, since it could do more harm than good.
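
As one illustration of the two options just mentioned, here is a sketch that either flattens a nested list before counting or counts inside each child list. The sample data and variable names are invented for this example, and itertools.chain is only one of the many possible ways to flatten:

```python
from collections import Counter
from itertools import chain

nested = [[59, 60, 61], [63, 64, 63], [87, 63]]

# Option A: flatten into one stream of values, then count across all child lists
flat_counts = Counter(chain.from_iterable(nested))

# Option B: count within each child list separately
per_list_counts = [Counter(child) for child in nested]

print(flat_counts[63])         # 3
print(per_list_counts[1][63])  # 2
```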

Hope this helps.

Answer 1: 2 votes, accepted, 2017-01-04 13:11:41

Answer 2: 1 vote, 2017-01-04 13:25:32
