[英]Create dict from a string or list
我想为给定的字符串或给定的列表生成一个哈希表。 哈希表将元素视为key
,将出现时间视为value
。 例如:
s = 'ababcd'
s = ['a', 'b', 'a', 'b', 'c', 'd']
dict_I_want = {'a':2,'b':2, 'c':1, 'd':1}
# method 1
from collections import Counter
s = 'ababcd'
hash_table1 = Counter(s)
# method 2
s = 'ababdc'
hash_table2 = dict()
for i in s:
if hash_table2.get(i) == None:
hash_table2[i] = 1
else:
hash_table2[i] += 1
hash_table1 == hash_table2
真的
通常,我使用上述两种方法。 一种来自标准库,但在某些代码实践站点中是不允许的。 另一个是从头开始写的,但我认为它太长了。 如果我使用 dict comprehension,我想出了 2 个额外的方法:
{i:s.count(i) for i in set(s)}
{i:s.count(i) for i in s}
我想知道是否还有其他方法可以清晰或有效地从列表字符串初始化哈希表?
from collections import Counter
import random,string,numpy,perfplot
def from_set(s):
return {i:s.count(i) for i in set(s)}
def from_string(s):
return {i:s.count(i) for i in s}
def handy(s):
hash_table2 = dict()
for i in s:
if hash_table2.get(i) == None:
hash_table2[i] = 1
else:
hash_table2[i] += 1
return hash_table2
def counter(s):
return Counter(s)
perfplot.show(
setup=lambda n: ''.join(random.choices(string.ascii_uppercase + string.digits, k=n)), # or simply setup=numpy.random.rand
kernels=[from_set,from_string,handy,counter],
labels=['set','string','handy','counter'],
n_range=[2 ** k for k in range(17)],
xlabel="len(string)",
equality_check= None
# More optional arguments with their default values:
# title=None,
# logx="auto", # set to True or False to force scaling
# logy="auto",
# equality_check=numpy.allclose, # set to None to disable "correctness" assertion
# automatic_order=True,
# colors=None,
# target_time_per_measurement=1.0,
# time_unit="s", # set to one of ("auto", "s", "ms", "us", or "ns") to force plot units
# relative_to=1, # plot the timings relative to one of the measurements
# flops=lambda n: 3*n, # FLOPS plots
)
最好的方法是使用内置计数器,否则,您可以使用 defualtdict 这与您的第二次尝试非常相似
from collections import defualtdict
d = defualtdict(int) # this makes every value 0 by defualt
for letter in string:
d[letter] +=1
我通常使用Counter或defaultdict来创建出现频率。
令人惊讶的是,发帖者的 from_set 方法在大多数情况下都优于两者。
观察
算法
from collections import Counter
from collections import defaultdict
import random,string,numpy,perfplot
def from_set(s):
" Use builtin count function for each item in set "
return {i:s.count(i) for i in set(s)}
def counter(s):
" Uses counter module "
return Counter(s)
def normal_dic(s):
" Update dictionary by checking if item in it or not "
d = {}
for i in s:
if i in d:
d[i] += 1
else:
d[i] = 1
return d
def setdefault_dic(s):
" Use setdefault to preset unknown keys "
d = {}
for i in s:
d.setdefault(i, 0)
d[i] += 1
return d
def default_dic(s):
" Used defaultdict from collections module "
d = defaultdict(int)
for i in s:
d[i] += 1
return d
def try_dic(s):
" Use try/except to check if item in dictionary "
d = {}
for i in s:
try:
d[i] += 1
except:
d[i] = 1
return d
测试代码
out = perfplot.bench(
setup=lambda n: ''.join(random.choices(string.ascii_uppercase + string.digits, k=n)), # or simply setup=numpy.random.rand
kernels=[from_set, counter, setdefault_dic, default_dic, try_dic],
labels=['set', 'counter', 'setdefault', 'defaultdict', 'try_dic'],
n_range=[2 ** k for k in range(17)],
xlabel="len(string)",
equality_check= None
# More optional arguments with their default values:
# title=None,
# logx="auto", # set to True or False to force scaling
# logy="auto",
# equality_check=numpy.allclose, # set to None to disable "correctness" assertion
# automatic_order=True,
# colors=None,
# target_time_per_measurement=1.0,
# time_unit="s", # set to one of ("auto", "s", "ms", "us", or "ns") to force plot units
# relative_to=1, # plot the timings relative to one of the measurements
# flops=lambda n: 3*n, # FLOPS plots
)
out.show()
#out.save("perf.png")
out
图表
绝对值
from_set 图中的标签“set”。 在下面的相对图上比较性能比这个绝对图更容易。
相对值
from_set 图中的标签“set”。
from_set 方法是水平线。 对于较大的值,包括 Counter 和 defaultdict 在内的所有其他方法都高于它(更耗时)。
桌子
实际时间
n setdefault try_dic defaultdict counter from_set
1.0 799.0 899.0 1299.0 6099.0 1399.0
2.0 1099.0 1199.0 1599.0 6299.0 1699.0
4.0 1699.0 1699.0 2199.0 6299.0 2399.0
8.0 3199.0 3099.0 3499.0 6899.0 3699.0
16.0 6099.0 5499.0 5899.0 7899.0 5900.0
32.0 10899.0 9299.0 9899.0 8999.0 10299.0
64.0 20799.0 15599.0 15999.0 11899.0 15099.0
128.0 38499.0 25499.0 25899.0 16599.0 21899.0
256.0 73100.0 44099.0 42700.0 26299.0 30299.0
512.0 137999.0 77099.0 72699.0 43199.0 46699.0
1024.0 286599.0 154500.0 144099.0 85700.0 79699.0
2048.0 549700.0 289999.0 266799.0 157499.0 145699.0
4096.0 1103899.0 577399.0 535499.0 309399.0 278999.0
8192.0 2200099.0 1151500.0 1051799.0 606999.0 542499.0
16384.0 4658199.0 2534399.0 2295300.0 1414199.0 1087799.0
32768.0 9509200.0 5270200.0 4838000.0 3066600.0 2177200.0
65536.0 19539500.0 10806300.0 9942100.0 6503299.0 4337599.0
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.