List elements' counter

New to Python here.

I am looking for a simple way to create a list (Output) that holds, at each position, the number of times the corresponding element occurs in another list (MyList), so that the indexing is preserved.

This is what I would like to get:

MyList = ["a", "b", "c", "c", "a", "c"]
Output = [ 2 ,  1 ,  3 ,  3 ,  2 ,  3 ]

I found solutions to a similar problem: counting the number of occurrences of each element in a list.

In  : Counter(MyList)
Out : Counter({'a': 2, 'b': 1, 'c': 3})

This, however, returns a Counter object, which doesn't preserve the indexing.

I assume that, given the keys in the Counter, I could construct my desired output; however, I am not sure how to proceed.

Extra info: I have pandas imported in my script, and MyList is actually a column in a pandas DataFrame.

Instead of a list comprehension, as in another solution, you can use the function itemgetter:

from collections import Counter
from operator import itemgetter

lst = ["a", "b", "c", "c", "a", "c"]

c = Counter(lst)
itemgetter(*lst)(c)
# (2, 1, 3, 3, 2, 3)

UPDATE: As @ALollz mentioned in the comments, this solution seems to be the fastest one. If OP needs a list instead of a tuple, the result must be converted with list.
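One thing to watch with the itemgetter approach: with a single key, itemgetter returns a bare value rather than a tuple, and with no keys at all it raises TypeError. The sketch below wraps the idea in a helper that handles both cases; the function name counts_per_position is made up for this illustration and is not part of the original answer.

```python
from collections import Counter
from operator import itemgetter


def counts_per_position(items):
    """Return, for each position, how often that element occurs in items."""
    if not items:
        # itemgetter() called without arguments raises TypeError
        return []
    counts = Counter(items)
    if len(items) == 1:
        # itemgetter with one key returns a bare value, not a 1-tuple
        return [counts[items[0]]]
    return list(itemgetter(*items)(counts))


print(counts_per_position(["a", "b", "c", "c", "a", "c"]))  # [2, 1, 3, 3, 2, 3]
print(counts_per_position(["a"]))                           # [1]
print(counts_per_position([]))                              # []
```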

You can use the list.count method, which counts the number of times each string occurs in MyList. You can generate a new list with the counts by using a list comprehension:

MyList = ["a", "b", "c", "c", "a", "c"]

[MyList.count(i) for i in MyList]
# [2, 1, 3, 3, 2, 3]
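Note that list.count rescans the whole list for every element, so the comprehension above does quadratic work. As a rough sketch of the alternative, the counts can be tallied once in a plain dict and then looked up per position; the helper name tally_per_position is only an illustration.

```python
def tally_per_position(items):
    """Count each element in one pass, then look the counts up per position."""
    tally = {}
    for item in items:
        tally[item] = tally.get(item, 0) + 1
    return [tally[item] for item in items]


MyList = ["a", "b", "c", "c", "a", "c"]
print(tally_per_position(MyList))  # [2, 1, 3, 3, 2, 3]
```

For short lists the difference is negligible, and the one-line list.count version is easier to read.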

Use np.unique to create a dictionary of value counts and map the values. This will be fast, though not as fast as the Counter methods:

import numpy as np

list(map(dict(zip(*np.unique(MyList, return_counts=True))).get, MyList))
#[2, 1, 3, 3, 2, 3]
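A possible variant that skips the intermediate dictionary: np.unique can also return, for each input element, the index of its unique value (return_inverse=True), and indexing the counts with that array gives the per-position counts directly. This is a sketch for a one-dimensional input, not something taken from the answer above, and it has not been timed against the other approaches.

```python
import numpy as np

MyList = ["a", "b", "c", "c", "a", "c"]

# values: sorted unique elements
# inverse: for each input element, its index into values
# counts: how often each unique value occurs
values, inverse, counts = np.unique(MyList, return_inverse=True, return_counts=True)

result = counts[inverse].tolist()  # tolist() converts to plain Python ints
print(result)  # [2, 1, 3, 3, 2, 3]
```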

Some timings for a moderately sized list:

MyList = np.random.randint(1, 2000, 5000).tolist()

%timeit [MyList.count(i) for i in MyList]
#413 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit list(map(dict(zip(*np.unique(MyList, return_counts=True))).get, MyList))
#1.89 ms ± 1.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit pd.DataFrame(MyList).groupby(MyList).transform(len)[0].tolist()
#2.18 s ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

c=Counter(MyList)
%timeit lout=[c[i] for i in MyList]
#679 µs ± 2.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

c = Counter(MyList)
%timeit list(itemgetter(*MyList)(c))
#503 µs ± 162 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Larger list:

MyList = np.random.randint(1, 2000, 50000).tolist()

%timeit [MyList.count(i) for i in MyList]
#41.2 s ± 5.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit list(map(dict(zip(*np.unique(MyList, return_counts=True))).get, MyList))
#18 ms ± 56.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit pd.DataFrame(MyList).groupby(MyList).transform(len)[0].tolist()
#2.44 s ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

c=Counter(MyList)
%timeit lout=[c[i] for i in MyList]
#6.89 ms ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

c = Counter(MyList)
%timeit list(itemgetter(*MyList)(c))
#5.27 ms ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
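The %timeit lines above only run in IPython or Jupyter. For a plain script, a comparison along the same lines could be sketched with the standard timeit module. The function names, the list size of 2000 and the repeat count are arbitrary choices for this illustration, and unlike the timings above, the Counter is built inside the timed functions, so the numbers are not directly comparable.

```python
import random
import timeit
from collections import Counter
from operator import itemgetter

random.seed(0)  # fixed seed so repeated runs use the same data
data = [random.randint(1, 1999) for _ in range(2000)]


def with_list_count(items):
    return [items.count(x) for x in items]


def with_counter(items):
    counts = Counter(items)
    return [counts[x] for x in items]


def with_itemgetter(items):
    counts = Counter(items)
    return list(itemgetter(*items)(counts))


# Check that all approaches agree before timing them.
expected = with_counter(data)
assert with_list_count(data) == expected
assert with_itemgetter(data) == expected

runs = 3
for func in (with_list_count, with_counter, with_itemgetter):
    seconds = timeit.timeit(lambda: func(data), number=runs)
    print(f"{func.__name__}: {seconds / runs:.6f} s per call")
```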

You just need the piece of code below:

    from collections import Counter

    c = Counter(MyList)
    lout = [c[i] for i in MyList]

Now the list lout is your desired output.

A pandas solution looks like this:

df = pd.DataFrame(data=["a", "b", "c", "c", "a", "c"], columns=['MyList'])
df['Count'] = df.groupby('MyList')['MyList'].transform(len)

Edit: You shouldn't use pandas if this is the only thing you want to do. I only answered this question because of the pandas tag.

The performance depends on the number of groups:

MyList = np.random.randint(1, 10, 10000).tolist()
df = pd.DataFrame(MyList)

%timeit [MyList.count(i) for i in MyList]
# 1.32 s ± 15.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df.groupby(0)[0].transform(len)
# 3.89 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

MyList = np.random.randint(1, 9000, 10000).tolist()
df = pd.DataFrame(MyList)

%timeit [MyList.count(i) for i in MyList]
# 1.36 s ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df.groupby(0)[0].transform(len)
# 1.33 s ± 19.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Note that @Gio indicated the list is actually a pandas Series object. In that case, you can convert the Series object to a list:

import pandas as pd

l = ["a", "b", "c", "c", "a", "c"]
ds = pd.Series(l)
l = ds.tolist()
[l.count(i) for i in ds]
# [2, 1, 3, 3, 2, 3]

But once you have the Series, you can count the elements via value_counts.

l = ["a", "b", "c", "c", "a", "c"]
s = pd.Series(l)  # Series object
c = s.value_counts()  # c is a Series again
[c[i] for i in s]
# [2, 1, 3, 3, 2, 3]
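Since the question mentions that MyList is a DataFrame column, the lookup can probably stay inside pandas: Series.map accepts another Series and looks each value up by its index, so mapping the column through its own value_counts gives the per-row count. A minimal sketch, where the column name Count is just an illustration:

```python
import pandas as pd

df = pd.DataFrame({"MyList": ["a", "b", "c", "c", "a", "c"]})

# value_counts() returns a Series indexed by the distinct values
counts = df["MyList"].value_counts()

# map() looks each row's value up in that index
df["Count"] = df["MyList"].map(counts)

print(df["Count"].tolist())  # [2, 1, 3, 3, 2, 3]
```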

This is one of Hettinger's classic snippets :)

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
     'Counter that remembers the order elements are first seen'
     def __repr__(self):
         return '%s(%r)' % (self.__class__.__name__,
                            OrderedDict(self))
     def __reduce__(self):
         return self.__class__, (OrderedDict(self),)

x = ["a", "b", "c", "c", "a", "c"]
oc = OrderedCounter(x)
>>> oc
OrderedCounter(OrderedDict([('a', 2), ('b', 1), ('c', 3)]))
>>> oc['a']
2
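On Python 3.7 and later, where regular dicts keep insertion order, a plain Counter already remembers the order in which elements are first seen, so the subclass may not be needed there. Either way, the counter holds one entry per distinct element, and the per-position output the question asks for still needs a lookup for each element. A short sketch:

```python
from collections import Counter

x = ["a", "b", "c", "c", "a", "c"]
counts = Counter(x)

# Keys appear in first-seen order on Python 3.7+
print(list(counts.items()))          # [('a', 2), ('b', 1), ('c', 3)]

# One lookup per position gives the list asked for in the question
print([counts[item] for item in x])  # [2, 1, 3, 3, 2, 3]
```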
