Most pythonic and fastest way to create a list of key value pairs from a set of nested dictionaries?
I came up with the following solution, but it was quite ugly (see original solution). I'm fairly happy with the revised solution. Does anybody have a cleaner / faster way to produce the same output?
Other requirements:
.
.
when no base_key is supplied.
My revised solution:
def create_nested_kvl(v, base_key=None):
    kvl = []
    if not isinstance(v, dict):
        kvl.append((base_key,v))
    else:
        def iterate(v, k):
            for ki, vi in v.items():
                ki = '%s.%s' % (k, ki) if k else ki
                iterate(vi, ki) if isinstance(vi, dict) else kvl.append((ki, vi))
        iterate(v, base_key)
    return kvl
My original solution:
def create_nested_kvl(v, base_key=''):
    """ Creates a list of dot syntax key value pairs from a nested dictionary.
    :param v: The value suspected to be a nested dictionary.
    :param base_key: Base key
    :return: [(k,v)]
    :rtype: list
    """
    if not isinstance(v, dict):
        return [(base_key,v)]
    kvl = []
    def iterate(v, k):
        for kd, vd in v.items():
            v = vd
            kd = '%s.%s' % (k, kd) if k else kd
            kvl.append((kd, v))
    iterate(v, base_key)
    # Caution: removing from kvl while iterating over it can skip the
    # entry that follows a removed one.
    for k, v in kvl:
        if isinstance(v, dict):
            iterate(v, k)
            kvl.remove((k,v))
    return kvl
Input:
v = {'type1':'type1_val',
     'type2':'type2_val',
     'object': {
         'k1': 'val1',
         'k2': 'val2',
         'k3': {'k31': {
                    'k311': 'val311',
                    'k322': 'val322',
                    'k333': 'val333'
                    },
                'k32': 'val32',
                'k33': 'val33'}}}
create_nested_kvl(v, 'base')
Output:
[('base.type1', 'type1_val'),
('base.type2', 'type2_val'),
('base.object.k2', 'val2'),
('base.object.k1', 'val1'),
('base.object.k3.k33', 'val33'),
('base.object.k3.k32', 'val32'),
('base.object.k3.k31.k311', 'val311'),
('base.object.k3.k31.k333', 'val333'),
('base.object.k3.k31.k322', 'val322')]
Notes:
timeit results @ number=1000000:
generator : 0.911420848311 (see alex's answer)
original : 0.720069713321
revised : 0.660259814902
best : 0.660259814902
* As Alex pointed out, my late-night rounding skills are horrific.
It's 27% faster, not twice as fast (my bad).
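The harness behind the numbers above is not shown, so the following is only a rough sketch of how such a measurement might be taken with the standard timeit module. The helper name time_call, the shortened sample dict, and the reduced call count are assumptions for illustration, and absolute timings will differ by machine and Python version.

```python
import timeit

def create_nested_kvl(v, base_key=None):
    """Revised solution, repeated here so the sketch is self-contained."""
    kvl = []
    if not isinstance(v, dict):
        kvl.append((base_key, v))
    else:
        def iterate(v, k):
            for ki, vi in v.items():
                ki = '%s.%s' % (k, ki) if k else ki
                if isinstance(vi, dict):
                    iterate(vi, ki)
                else:
                    kvl.append((ki, vi))
        iterate(v, base_key)
    return kvl

# Shortened sample (assumption): smaller than the input shown above.
sample = {'type1': 'type1_val',
          'object': {'k1': 'val1', 'k3': {'k31': {'k311': 'val311'}}}}

def time_call(func, number=10000):
    """Hypothetical helper: total seconds for `number` calls of func(sample, 'base')."""
    return timeit.timeit(lambda: func(sample, 'base'), number=number)

elapsed = time_call(create_nested_kvl)
print('revised: %.6f s for 10000 calls' % elapsed)
```

Passing a callable to timeit.timeit avoids building a setup string, at the cost of one extra function call per iteration, which is the same overhead for every candidate being compared.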
Apart from ordering of keys in dicts being arbitrary, and the possible need to trim leading '.'s if that's needed for empty keys (spec unclear):
def create_nested_kvl(v, k=''):
    if isinstance(v, dict):
        for tk in v:
            for sk, sv in create_nested_kvl(v[tk], tk):
                yield '{}.{}'.format(k, sk), sv
    else:
        yield k, v
seems nice and compact. E.g.:
v = {'type1':'type1_val',
     'type2':'type2_val',
     'object': {
         'k1': 'val1',
         'k2': 'val2',
         'k3': {'k31': {
                    'k311': 'val311',
                    'k322': 'val322',
                    'k333': 'val333'
                    },
                'k32': 'val32',
                'k33': 'val33'}}}
import pprint
pprint.pprint(list(create_nested_kvl(v, 'base')))
emits
[('base.object.k3.k31.k311', 'val311'),
('base.object.k3.k31.k333', 'val333'),
('base.object.k3.k31.k322', 'val322'),
('base.object.k3.k33', 'val33'),
('base.object.k3.k32', 'val32'),
('base.object.k2', 'val2'),
('base.object.k1', 'val1'),
('base.type1', 'type1_val'),
('base.type2', 'type2_val')]
as required.
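If the leading '.' for an empty key does need to go, one possible variation is to join with '.' only when the prefix is non-empty. This is a sketch under that assumption, not the version timed in this thread: the name flatten is hypothetical, and unlike the generator above it passes the full prefix down into the recursion instead of formatting on the way back up.

```python
def flatten(v, k=''):
    """Yield (dotted_key, value) pairs; `flatten` is a hypothetical name."""
    if isinstance(v, dict):
        for tk, tv in v.items():
            # Join with '.' only when there is a non-empty prefix,
            # so an empty base key never produces a leading '.'.
            child = '{}.{}'.format(k, tk) if k else tk
            for pair in flatten(tv, child):
                yield pair
    else:
        yield k, v

nested = {'a': {'b': 1, 'c': {'d': 2}}, 'e': 3}
print(sorted(flatten(nested)))           # no base key
print(sorted(flatten(nested, 'base')))   # with a base key
```

Sorting the pairs before printing sidesteps the arbitrary dict ordering mentioned above.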
Added: in Python, "fast" and "elegant" often coincide, but not always. In particular, recursion is slightly slower, and so are lookups of globals in a loop. So here, pulling all the usual tricks (recursion elimination with an explicit stack, and lookup hoisting), one can get:
def faster(v, k='', isinstance=isinstance):
    stack = [(k, v)]
    result = []
    push, pop = stack.append, stack.pop
    resadd = result.append
    fmt = '{}.{}'.format
    while stack:
        k, v = pop()
        if isinstance(v, dict):
            for tk, vtk in v.iteritems():
                push((fmt(k, tk), vtk))
        else:
            resadd((k, v))
    return result
...definitely not as elegant, but on my laptop, my original version, plus a list() at the end, takes 21.5 microseconds on the given sample v; this faster version takes 16.8 microseconds. If saving those 4.7 microseconds (or, expressed more meaningfully, 22% of the original runtime) is more important than clarity and maintainability, then one can pick the second version and get the same results (net, as usual, of ordering) that much faster.
The OP's "revised version" is still faster on the sample v, partly because formatting with % is slightly faster in Python 2 than the more elegant format, and partly because items is slightly faster (again, Python 2 only) than iteritems; some hoisting might shave a few more nanoseconds off that one, too.
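The explicit-stack version above relies on dict.iteritems(), which does not exist in Python 3. A minimal sketch of a Python 3 port follows, assuming items() as the replacement. The name faster_py3 and the small sample are hypothetical, and the timings quoted in this answer were measured on Python 2, so they should not be assumed to carry over.

```python
def faster_py3(v, k=''):
    """Explicit-stack flattening; `faster_py3` is a hypothetical name."""
    stack = [(k, v)]
    result = []
    # Hoist bound methods into locals to avoid repeated attribute lookups.
    push, pop = stack.append, stack.pop
    resadd = result.append
    fmt = '{}.{}'.format
    while stack:
        k, v = pop()
        if isinstance(v, dict):
            # Python 3: items() returns a view, replacing iteritems().
            for tk, vtk in v.items():
                push((fmt(k, tk), vtk))
        else:
            resadd((k, v))
    return result

sample = {'type1': 'type1_val',
          'object': {'k1': 'val1', 'k3': {'k31': 'val31'}}}
pairs = faster_py3(sample, 'base')
print(sorted(pairs))
```

As with the original, calling it without a base key produces keys with a leading '.', and the order of the result depends on the stack traversal, so results are compared after sorting.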