How to count the number of occurrences in a DataFrame?

I need help. I have a pandas DataFrame like this:

Shown ID                                       Bought ID
59,60,61,62,60,63,64,65,66,61,67,68,67         67,60,63
63,64,63,64,63,65,66                           0
87,63,84,63,86                                 86

For each number in each "Shown ID" row, I need to find how many times it occurs in the whole "Shown ID" column.

So the expected result for the "Shown ID" column is:

    [[('59', 1), ('60', 2), ('61', 2), ('62', 1), ('63', 6),
      ('64', 3), ('65', 2), ('66', 2), ('67', 2), ('68', 1)],
     [('63', 6), ('64', 3), ('65', 2), ('66', 2)],
     [('87', 1), ('63', 6), ('84', 1), ('86', 1)]]

How can I do that?

Then I need to create a list of lists in which the values of each "Shown ID" row (each inner list of the result above) are sorted by that count, most frequent first.

So the final result must be:

[['63', '64', '60', '61', '65', '66', '67', '68', '59', '62'],
 ['63', '64', '65', '66'],
 ['63', '87', '84', '86']]

How can I do that? If numbers have the same frequency, they should be ordered by first appearance in the row (the earlier a number appears in the row, the higher its priority).

This is how you can get what you are looking for:

import pandas as pd
from collections import Counter

# One row per "Shown ID" list
data = [{'c_id': [59, 60, 61, 62, 60, 63, 64, 65, 66, 61, 67, 68, 67]},
        {'c_id': [63, 64, 63, 64, 63, 65, 66]},
        {'c_id': [87, 63, 84, 63, 86]}]

df = pd.DataFrame(data)

# Order each row's unique values by their frequency within that row
df['c_id'].apply(lambda values: [key for key, _ in Counter(values).most_common()])

Output:

0    [67, 60, 61, 64, 65, 66, 68, 59, 62, 63]
1                            [63, 64, 65, 66]
2                            [63, 84, 86, 87]

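Values that have the same count might come in any order on older Python versions. Since Python 3.7, Counter.most_common() keeps equal counts in the order first encountered, so the tie order you see may differ from the output shown above.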
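As a small check of the tie ordering, here is a minimal sketch using the third row of the example data. It assumes Python 3.7 or later, where `Counter` remembers insertion order; the variable names are only illustrative.

```python
from collections import Counter

# Third row of the example data
row = [87, 63, 84, 63, 86]

# most_common() sorts by count, descending; on Python 3.7+ equal counts
# stay in the order in which the values were first encountered
ordered = [value for value, _ in Counter(row).most_common()]
print(ordered)  # on Python 3.7+: [63, 87, 84, 86]
```

Note that this still counts within a single row only, not across the whole column.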

If you want to count across the whole column instead, you can do it like this:

import operator

# Collect every value in the column into one flat list
all_cids = []
for index, row in df.iterrows():
    all_cids.extend(row['c_id'])

# Column-level counts
counter_obj = Counter(all_cids)

def get_ordered_values(values):
    new_values = []
    covered_values = set()
    for val in values:
        # Keep only the first occurrence of each value in the row
        if val in covered_values:
            continue
        covered_values.add(val)
        new_values.append((val, counter_obj[val]))
    # Stable sort: values with equal counts keep their order of first appearance
    new_values.sort(key=operator.itemgetter(1), reverse=True)
    return [key for key, val in new_values]

df['c_id'].apply(get_ordered_values)

Output:

0    [63, 64, 60, 61, 65, 66, 67, 59, 62, 68]
1                            [63, 64, 65, 66]
2                            [63, 87, 84, 86]
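The question also asks for the (value, count) pairs themselves. The following is a minimal sketch of both results in plain Python, assuming the "Shown ID" cells are comma-separated strings as displayed in the question; the names `shown_ids`, `pairs_for_row` and `ranked_ids` are hypothetical.

```python
from collections import Counter

# Assumption: each "Shown ID" cell is a comma-separated string, as in the question
shown_ids = [
    "59,60,61,62,60,63,64,65,66,61,67,68,67",
    "63,64,63,64,63,65,66",
    "87,63,84,63,86",
]

# Split each cell into a list of ID strings
rows = [cell.split(",") for cell in shown_ids]

# Count every ID across the whole column
column_counts = Counter(id_ for row in rows for id_ in row)


def pairs_for_row(row):
    """Return (id, column-wide count) pairs in order of first appearance."""
    unique_ids = dict.fromkeys(row)  # drops repeats, keeps first-appearance order
    return [(id_, column_counts[id_]) for id_ in unique_ids]


def ranked_ids(row):
    """Return the row's unique IDs, most frequent first; ties keep row order."""
    # sorted() is stable, so equal counts stay in first-appearance order
    ranked = sorted(pairs_for_row(row), key=lambda pair: pair[1], reverse=True)
    return [id_ for id_, _ in ranked]


pairs = [pairs_for_row(row) for row in rows]
ranked = [ranked_ids(row) for row in rows]
print(pairs)
print(ranked)
```

With a DataFrame, the same two functions could be passed to `apply` on the split column, in the same way as `get_ordered_values` above.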

If I understand correctly, you want the number of occurrences, not the list of indexes where a given value is found. I can think of several ways of doing this:

  1. Counting the data:

If your data is not a multidimensional list, you can simply use the list object's count method.

# on Python 3, range() must be wrapped in list() before concatenating
someList = list(range(3)) + list(range(2)) + list(range(1))

sortedElements = sorted(set(someList))  # set() drops duplicates; the elements must be hashable

for x in sortedElements:
    # list.count(element) is available in Python 2.7 and Python 3.5
    # (tested in the standard interpreter; I cannot speak for IronPython or Rhino environments)
    print(x, someList.count(x))  # prints each element and its number of occurrences
  2. Returning the indexes of duplicates:

     # someList and sortedElements are the same as before
     for x in sortedElements:
         # collect every position at which x occurs in someList
         lIndexes = [i for i, elem in enumerate(someList) if elem == x]
         print(lIndexes)

  3. Multidimensional list:

As I see it, you must either first flatten the whole structure into a single list, or apply step 1 or 2 to each child list of the multidimensional list, depending on your need.
There are of course several ways to traverse a multidimensional list: you can map, filter, or reduce it, pass the child lists as *arguments, and so on (there are too many to count, and you can find most of them on this website), but the method you choose depends closely on your actual data. Without further details I would not dare to advise you, since that could do more harm than good.
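To make the two options concrete, here is a minimal sketch using `collections.Counter` and `itertools.chain` instead of repeated `list.count` calls. This is a swapped-in technique rather than the method described above, and the nested list is made-up example data.

```python
from collections import Counter
from itertools import chain

# Made-up example data: a list of child lists
nested = [[0, 1, 2], [0, 1], [0]]

# Option A: flatten into one list, then count once over everything
flat = list(chain.from_iterable(nested))
total_counts = Counter(flat)

# Option B: count each child list separately
per_list_counts = [Counter(child) for child in nested]

# Positions of every occurrence of each value in the flattened list
positions = {}
for index, value in enumerate(flat):
    positions.setdefault(value, []).append(index)

print(total_counts)
print(per_list_counts)
print(positions)
```

`Counter` makes a single pass over the data, whereas calling `list.count` once per distinct element rescans the list each time.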

Hope this helps.
