How do I merge a list of dictionaries by an identical field and sum another field in the process?
I'm trying to merge a list of dictionaries by a url field: wherever several items in the list share the same url, they should be merged into a single item, with another field summed across them.
I've tried using 'setdefault', but it doesn't always work as expected. I'm still getting duplicate results after running the loop.
Here is the list I'm trying to condense, summing the second field wherever identical urls exist:
[
['https://www.website.com/directory/link-1',
21,
'Long Text Field 1',
'String 1',
{'url': 'https://www.website.com/images/image-1.jpg'},
255],
['https://www.website.com/directory/link-1',
185,
'Long Text Field 1',
'String 1',
{'url': 'https://www.website.com/images/image-1.jpg'},
255],
['https://www.website.com/directory/link-2',
296,
'Long Text Field 2',
'String 2',
{'url': 'https://www.website.com/images/image-2.jpg'},
303],
['https://www.website.com/directory/link-3',
354,
'Long Text Field 3',
'String 3',
{'url': 'https://www.website.com/images/image-3.jpg'},
388],
['https://www.website.com/directory/link-4',
606,
'Long Text Field 4',
'String 4',
{'url': 'https://www.website.com/images/image-4.jpg'},
624]
]
This is the result I'm trying to get:
[
['https://www.website.com/directory/link-1',
206,
'Long Text Field 1',
'String 1',
{'url': 'https://www.website.com/images/image-1.jpg'},
255],
['https://www.website.com/directory/link-2',
296,
'Long Text Field 2',
'String 2',
{'url': 'https://www.website.com/images/image-2.jpg'},
303],
['https://www.website.com/directory/link-3',
354,
'Long Text Field 3',
'String 3',
{'url': 'https://www.website.com/images/image-3.jpg'},
388],
['https://www.website.com/directory/link-4',
606,
'Long Text Field 4',
'String 4',
{'url': 'https://www.website.com/images/image-4.jpg'},
624]
]
Here is what I'm trying:
for url, long_text, number_to_count, another_field, ..., ... in list:
    d = {}
    d.setdefault(url, {}).setdefault("long text", []).append(long_text)
    d[url].setdefault("number_to_count",[]).append(number_to_count)
    d[url].setdefault("another_field",[]).append(another_field)
Here is something you can try. It groups the sublists from lst by their first item (the URL) into a defaultdict of lists, then builds a new result in which only the second item, the number, is summed across each group.
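from collections import defaultdict
from pprint import pprint
lst = ...
# Group the sublists by their first item (the URL).
d = defaultdict(list)
for item in lst:
    d[item[0]].append(item)
# Keep the first row of each group, replacing its second item with the group's sum.
result = [[v[0][0]] + [sum(x[1] for x in v)] + v[0][2:] for v in d.values()]
pprint(result)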
Output:
[['https://www.website.com/directory/link-1',
206,
'Long Text Field 1',
'String 1',
{'url': 'https://www.website.com/images/image-1.jpg'},
255],
['https://www.website.com/directory/link-2',
296,
'Long Text Field 2',
{'url': 'https://www.website.com/images/image-2.jpg'},
303],
['https://www.website.com/directory/link-3',
354,
'Long Text Field 3',
{'url': 'https://www.website.com/images/image-3.jpg'},
388],
['https://www.website.com/directory/link-4',
606,
'Long Text Field 4',
{'url': 'https://www.website.com/images/image-4.jpg'},
624]]
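The same grouping idea can also be sketched as a single pass over the rows with a plain dict that is created once, before the loop. This is only an illustrative variant, not the approach from the answer above: the function name merge_rows, its parameters, and the shortened sample rows are assumptions made for the example, and it assumes that rows sharing a URL agree on every field other than the summed one. It relies on dicts preserving insertion order (Python 3.7+).

```python
def merge_rows(rows, key_index=0, sum_index=1):
    """Merge rows that share rows[key_index], summing the value at sum_index.

    Hypothetical helper: the first row seen for each key supplies all
    other fields.
    """
    merged = {}  # created once, outside the loop, so earlier rows are kept
    for row in rows:
        key = row[key_index]
        if key in merged:
            # Key already seen: only accumulate the numeric field.
            merged[key][sum_index] += row[sum_index]
        else:
            # First occurrence: store a copy so the input is not mutated.
            merged[key] = list(row)
    return list(merged.values())


# Shortened sample rows (assumed for illustration), same shape as the question's data.
rows = [
    ['https://www.website.com/directory/link-1', 21, 'Long Text Field 1'],
    ['https://www.website.com/directory/link-1', 185, 'Long Text Field 1'],
    ['https://www.website.com/directory/link-2', 296, 'Long Text Field 2'],
]
result = merge_rows(rows)
print(result)
```

Creating the dict inside the loop would discard everything collected on earlier iterations, so it has to live outside the loop for the sums to accumulate.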
If you want to use pandas, you can get something like the following:
Page Count Text String Url Magic
0 https://www.website.com/directory/link-1 21 Long Text Field 1 String 1 https://www.website.com/images/image-1.jpg 255
1 https://www.website.com/directory/link-1 185 Long Text Field 1 String 1 https://www.website.com/images/image-1.jpg 255
2 https://www.website.com/directory/link-2 296 Long Text Field 2 None https://www.website.com/images/image-2.jpg 303
3 https://www.website.com/directory/link-3 354 Long Text Field 3 None https://www.website.com/images/image-3.jpg 388
4 https://www.website.com/directory/link-4 606 Long Text Field 4 None https://www.website.com/images/image-4.jpg 624
----
Page Count Magic String Url Text
0 https://www.website.com/directory/link-1 206 255 String 1 https://www.website.com/images/image-1.jpg Long Text Field 1
1 https://www.website.com/directory/link-2 296 303 None https://www.website.com/images/image-2.jpg Long Text Field 2
2 https://www.website.com/directory/link-3 354 388 None https://www.website.com/images/image-3.jpg Long Text Field 3
3 https://www.website.com/directory/link-4 606 624 None https://www.website.com/images/image-4.jpg Long Text Field 4
by running the code below. Note that I had to add dummy values for the missing strings, since your data format is somewhat inconsistent.
import pandas as pd
data = [
['https://www.website.com/directory/link-1',
21,
'Long Text Field 1',
'String 1',
{'url': 'https://www.website.com/images/image-1.jpg'},
255],
['https://www.website.com/directory/link-1',
185,
'Long Text Field 1',
'String 1',
{'url': 'https://www.website.com/images/image-1.jpg'},
255],
['https://www.website.com/directory/link-2',
296,
'Long Text Field 2',
{'url': 'https://www.website.com/images/image-2.jpg'},
303],
['https://www.website.com/directory/link-3',
354,
'Long Text Field 3',
{'url': 'https://www.website.com/images/image-3.jpg'},
388],
['https://www.website.com/directory/link-4',
606,
'Long Text Field 4',
{'url': 'https://www.website.com/images/image-4.jpg'},
624]
]
columns = ['Page', 'Count', 'Text', 'String', 'Url', 'Magic']
for d in data:
    # Pad rows that lack the 'String' field, then flatten the {'url': ...} dict.
    if len(d) != 6:
        d.insert(3, None)
    d[4] = d[4]['url']
df = pd.DataFrame(data, columns=columns)
agg = dict.fromkeys(columns, 'first')
agg.update({'Count': 'sum'})
del agg['Page']
df2 = df.groupby(['Page'], as_index=False).agg(agg)
pd.options.display.width = 0
print(df)
print('\n----\n')
print(df2)
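The row clean-up loop in the code above changes the data list in place. As a small sketch of the same two steps (pad a missing string with None, replace the {'url': ...} dict with its value) that leaves the input untouched, one could use a helper like the one below. The name normalize_row and its parameters are assumptions for illustration; only the standard library is needed.

```python
def normalize_row(row, width=6, string_index=3, url_index=4):
    """Return a cleaned copy of one row (hypothetical helper).

    Rows shorter than `width` get None inserted at `string_index`, and the
    {'url': ...} dict at `url_index` is replaced by the URL string itself.
    """
    fixed = list(row)  # copy so the caller's row is not modified
    if len(fixed) != width:
        fixed.insert(string_index, None)
    fixed[url_index] = fixed[url_index]['url']
    return fixed


# One row with the string field missing, taken from the data above.
short_row = ['https://www.website.com/directory/link-2',
             296,
             'Long Text Field 2',
             {'url': 'https://www.website.com/images/image-2.jpg'},
             303]
clean_row = normalize_row(short_row)
print(clean_row)
```

Applying it as [normalize_row(d) for d in data] would yield rows that match the six column names without altering data.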