
How do I merge a list of dictionaries by an identical field and sum another field in the process?

I'm trying to merge a list of records by their URL field: whenever two records share the same URL, they should be merged into one, and another field should be summed in the process.

I've tried using 'setdefault', but it doesn't always work as expected: I'm still getting duplicate results after running the loop.

Here is the list I'm trying to condense. Wherever the URLs are identical, the second field should be summed:

[
  ['https://www.website.com/directory/link-1',
  21,
  'Long Text Field 1',
  'String 1',
  {'url': 'https://www.website.com/images/image-1.jpg'},
  255],

  ['https://www.website.com/directory/link-1',
  185,
  'Long Text Field 1',
  'String 1',
  {'url': 'https://www.website.com/images/image-1.jpg'},
  255],

  ['https://www.website.com/directory/link-2',
  296,
  'Long Text Field 2',
  'String 2',
  {'url': 'https://www.website.com/images/image-2.jpg'},
  303],

  ['https://www.website.com/directory/link-3',
  354,
  'Long Text Field 3',
  'String 3',
  {'url': 'https://www.website.com/images/image-3.jpg'},
  388],

  ['https://www.website.com/directory/link-4',
  606,
  'Long Text Field 4',
  'String 4',
  {'url': 'https://www.website.com/images/image-4.jpg'},
  624]
]

This is the result I'm trying to get:

[
  ['https://www.website.com/directory/link-1',
  206,
  'Long Text Field 1',
  'String 1',
  {'url': 'https://www.website.com/images/image-1.jpg'},
  255],

  ['https://www.website.com/directory/link-2',
  296,
  'Long Text Field 2',
  'String 2',
  {'url': 'https://www.website.com/images/image-2.jpg'},
  303],

  ['https://www.website.com/directory/link-3',
  354,
  'Long Text Field 3',
  'String 3',
  {'url': 'https://www.website.com/images/image-3.jpg'},
  388],

  ['https://www.website.com/directory/link-4',
  606,
  'Long Text Field 4',
  'String 4',
  {'url': 'https://www.website.com/images/image-4.jpg'},
  624]
]

This is what I'm trying:

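# lst is the list of lists shown above
d = {}  # create the dict once; re-creating it inside the loop discards earlier entries
# Unpack in the same order as the data: URL, count, long text, string, then the rest
for url, number_to_count, long_text, another_field, *rest in lst:
    d.setdefault(url, {}).setdefault("long text", []).append(long_text)
    d[url].setdefault("number_to_count", []).append(number_to_count)
    d[url].setdefault("another_field", []).append(another_field)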

Here is something you can try. It groups the sublists of lst by their first element (the URL) into a defaultdict of lists, then builds one result row per URL: the URL, the sum of the second elements in that group, and the remaining fields copied from the group's first sublist.

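from collections import defaultdict
from pprint import pprint

lst = ...  # the list of lists from the question

# Group the sublists by their URL (the first element)
d = defaultdict(list)
for item in lst:
    d[item[0]].append(item)

# For each group: keep the URL, sum the second elements,
# and copy the remaining fields from the group's first sublist
result = [[v[0][0], sum(x[1] for x in v)] + v[0][2:] for v in d.values()]

pprint(result)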

Output:

[['https://www.website.com/directory/link-1',
  206,
  'Long Text Field 1',
  'String 1',
  {'url': 'https://www.website.com/images/image-1.jpg'},
  255],
 ['https://www.website.com/directory/link-2',
  296,
  'Long Text Field 2',
  'String 2',
  {'url': 'https://www.website.com/images/image-2.jpg'},
  303],
 ['https://www.website.com/directory/link-3',
  354,
  'Long Text Field 3',
  'String 3',
  {'url': 'https://www.website.com/images/image-3.jpg'},
  388],
 ['https://www.website.com/directory/link-4',
  606,
  'Long Text Field 4',
  'String 4',
  {'url': 'https://www.website.com/images/image-4.jpg'},
  624]]
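A lighter variant of the same idea (a sketch, not part of the original answer) uses setdefault, which the question mentions, to sum in a single pass without storing every sublist. It assumes, as the answer above does, that the URL is at index 0, the count to sum is at index 1, and the remaining fields can be taken from the first row seen for each URL. The function name merge_by_url is hypothetical.

```python
# Sample data from the question
lst = [
    ['https://www.website.com/directory/link-1', 21, 'Long Text Field 1', 'String 1',
     {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
    ['https://www.website.com/directory/link-1', 185, 'Long Text Field 1', 'String 1',
     {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
    ['https://www.website.com/directory/link-2', 296, 'Long Text Field 2', 'String 2',
     {'url': 'https://www.website.com/images/image-2.jpg'}, 303],
    ['https://www.website.com/directory/link-3', 354, 'Long Text Field 3', 'String 3',
     {'url': 'https://www.website.com/images/image-3.jpg'}, 388],
    ['https://www.website.com/directory/link-4', 606, 'Long Text Field 4', 'String 4',
     {'url': 'https://www.website.com/images/image-4.jpg'}, 624],
]


def merge_by_url(rows):
    """Merge rows sharing a URL (index 0) and sum index 1 (hypothetical helper)."""
    merged = {}  # URL -> merged row; plain dicts keep insertion order in Python 3.7+
    for row in rows:
        # First occurrence: store a new list [url, 0, <remaining fields>].
        # Later occurrences: setdefault returns the list already stored.
        target = merged.setdefault(row[0], [row[0], 0] + row[2:])
        target[1] += row[1]  # only the new list is modified, not the input row
    return list(merged.values())


result = merge_by_url(lst)
```

One trade-off: the default argument of setdefault is built on every iteration, even when the URL is already present. That is harmless here, but for very large rows an explicit "if row[0] not in merged" check avoids the throwaway lists.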

If you want to use pandas, you can get something like the following:

                                       Page  Count               Text    String                                         Url  Magic
0  https://www.website.com/directory/link-1     21  Long Text Field 1  String 1  https://www.website.com/images/image-1.jpg    255
1  https://www.website.com/directory/link-1    185  Long Text Field 1  String 1  https://www.website.com/images/image-1.jpg    255
2  https://www.website.com/directory/link-2    296  Long Text Field 2      None  https://www.website.com/images/image-2.jpg    303
3  https://www.website.com/directory/link-3    354  Long Text Field 3      None  https://www.website.com/images/image-3.jpg    388
4  https://www.website.com/directory/link-4    606  Long Text Field 4      None  https://www.website.com/images/image-4.jpg    624

----

                                       Page  Count  Magic    String                                         Url               Text
0  https://www.website.com/directory/link-1    206    255  String 1  https://www.website.com/images/image-1.jpg  Long Text Field 1
1  https://www.website.com/directory/link-2    296    303      None  https://www.website.com/images/image-2.jpg  Long Text Field 2
2  https://www.website.com/directory/link-3    354    388      None  https://www.website.com/images/image-3.jpg  Long Text Field 3
3  https://www.website.com/directory/link-4    606    624      None  https://www.website.com/images/image-4.jpg  Long Text Field 4

These tables were produced by running the code below. Note that I had to insert dummy values (None) for the missing strings, since your data format is somewhat inconsistent.

import pandas as pd

data = [
  ['https://www.website.com/directory/link-1',
  21,
  'Long Text Field 1',
  'String 1',
  {'url': 'https://www.website.com/images/image-1.jpg'},
  255],

  ['https://www.website.com/directory/link-1',
  185,
  'Long Text Field 1',
  'String 1',
  {'url': 'https://www.website.com/images/image-1.jpg'},
  255],

  ['https://www.website.com/directory/link-2',
  296,
  'Long Text Field 2',
  {'url': 'https://www.website.com/images/image-2.jpg'},
  303],

  ['https://www.website.com/directory/link-3',
  354,
  'Long Text Field 3',
  {'url': 'https://www.website.com/images/image-3.jpg'},
  388],

  ['https://www.website.com/directory/link-4',
  606,
  'Long Text Field 4',
  {'url': 'https://www.website.com/images/image-4.jpg'},
  624]
]
columns = ['Page', 'Count', 'Text', 'String', 'Url', 'Magic']

for d in data:
    if len(d) != 6:
        d.insert(3, None)
    d[4] = d[4]['url']
df = pd.DataFrame(data, columns=columns)


agg = dict.fromkeys(columns, 'first')
agg.update({'Count': 'sum'})
del agg['Page']
df2 = df.groupby(['Page'], as_index=False).agg(agg)

pd.options.display.width = 0
print(df)
print('\n----\n')
print(df2)
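If pandas is not available, a standard-library sketch using itertools.groupby gives the same merge. This is a swapped-in technique, not the pandas answer's method: groupby only merges consecutive rows with equal keys, so the rows are sorted by URL first, which means the output comes back in URL order rather than first-seen order. Because it slices the first row of each group instead of mapping fields to named columns, it also handles the ragged rows without padding. The name merge_sorted is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Ragged rows as in the pandas answer: rows 2-4 have no 'String N' field
ragged = [
    ['https://www.website.com/directory/link-1', 21, 'Long Text Field 1', 'String 1',
     {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
    ['https://www.website.com/directory/link-1', 185, 'Long Text Field 1', 'String 1',
     {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
    ['https://www.website.com/directory/link-2', 296, 'Long Text Field 2',
     {'url': 'https://www.website.com/images/image-2.jpg'}, 303],
    ['https://www.website.com/directory/link-3', 354, 'Long Text Field 3',
     {'url': 'https://www.website.com/images/image-3.jpg'}, 388],
    ['https://www.website.com/directory/link-4', 606, 'Long Text Field 4',
     {'url': 'https://www.website.com/images/image-4.jpg'}, 624],
]


def merge_sorted(rows):
    """Merge rows with the same URL (index 0), summing index 1 (hypothetical helper)."""
    # groupby only merges *consecutive* equal keys, so sort by URL first.
    # sorted() is stable, so each group's first row is its first occurrence.
    ordered = sorted(rows, key=itemgetter(0))
    merged = []
    for url, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        # group[0][2:] keeps whatever fields that row has, so no padding is needed
        merged.append([url, sum(row[1] for row in group)] + group[0][2:])
    return merged


result = merge_sorted(ragged)
```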
