Grouping Python lists together by a common element

Question

I'm querying Google Analytics for sessions and users per country. I want to save this data in my db for each day so I can access it later on.

My query returns a really big JSON and I'm trying to find the optimal solution to maximise speed.

First of all, I managed to get the data back ordered by sessions, which means I can now save only the top 10 countries in my db, instead of saving a new row for every country each day.

I think this is the minimum amount of data I need in order to have valuable info. So I structured my db to accept data like this:

20170101 | US | 112 (sessions) | 111 (users)
20170101 | CA | 111 (sessions) | 221 (users)
... (for 8 more rows)
20170102 | US | 11 (sessions) | 22 (users)
... (and so on, so 10 rows per day)

The big JSON I get back looks something like this (I've removed a lot of metrics in the middle):

m = {
'reports': [{
    'data': {
        'rowCount': 2003,
        'maximums': [{
            'values': ['1219', '1109']
        }],
        'minimums': [{
            'values': ['1', '1']
        }],
        'totals': [{
            'values': ['33505', '30382']
        }],
        'rows': [{
            'dimensions': ['20170404', 'US'],
            'metrics': [{
                'values': ['1219', '1091']
            }]
        }, {
            'dimensions': ['20170406', 'US'],
            'metrics': [{
                'values': ['1203', '1109']
            }]
        }, {
            'dimensions': ['20170405', 'US'],
            'metrics': [{
                'values': ['1185', '1073']
            }]
        }, {
            'dimensions': ['20170408', 'PL'],
            'metrics': [{
                'values': ['2', '1']
            }]
        }, {
            'dimensions': ['20170408', 'SG'],
            'metrics': [{
                'values': ['2', '2']
            }]
        }, {
            'dimensions': ['20170408', 'TT'],
            'metrics': [{
                'values': ['2', '2']
            }]
        }]
    },
    'nextPageToken': '1000',
    'columnHeader': {
        'dimensions': ['ga:date', 'ga:countryIsoCode'],
        'metricHeader': {
            'metricHeaderEntries': [{
                'name': 'ga:sessions',
                'type': 'INTEGER'
            }, {
                'name': 'ga:users',
                'type': 'INTEGER'
            }]
        }
    }
}]
}

I'm trying to figure out how to extract the top 10 countries with the most sessions for each day and save this info in my db. So far I've come up with:

x = m['reports'][0]['data']['rows']

l = []
for data in x:
    # Each row carries [date, country] dimensions and [sessions, users] metric values.
    date = data['dimensions'][0]
    country = data['dimensions'][1]
    sessions = data['metrics'][0]['values'][0]
    users = data['metrics'][0]['values'][1]
    n = [date, [country, sessions, users]]
    l.append(n)

This generates a list whose items have the format [date, [country, sessions, users]], so something like this:

[['20170404', ['US', '1219', '1091']],
 ['20170406', ['US', '1203', '1109']],
 ['20170405', ['US', '1185', '1073']],
 ['20170408', ['PL', '2', '1']],
 ['20170408', ['SG', '2', '2']],
 ['20170408', ['TT', '2', '2']]]

Now I was thinking of nesting another for loop that checks the date and, if it's the same, adds the values z[1] to the same list, so that for every date I would have a list with the values for each country. However, I'm not sure how to group these lists together according to the first value z[0]. Besides, this would include all the countries and not only the top 10.

Is there an easier way to accomplish this given the big JSON above? If not, how do I group lists together according to the first value, and how do I then sort by sessions?

Thanks!

Answer 1

If there are no duplicate countries per day, you could use nested defaultdicts to manage the different levels of grouping (magically):

import pprint
from collections import defaultdict

def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)

l = recursive_defaultdict()

x = m['reports'][0]['data']['rows']

for data in x:
    date = data['dimensions'][0]
    country = data['dimensions'][1]
    sessions = data['metrics'][0]['values'][0]
    users = data['metrics'][0]['values'][1]

    l[date][country] = {'sessions': sessions, 'users': users}

pprint.pprint(l)

This builds a nested dict that is easy to iterate over:

defaultdict(<function recursive_defaultdict at 0x7f3ecfb45e18>,
            {'20170404': defaultdict(<function recursive_defaultdict at 0x7f3ecfb45e18>,
                                     {'US': {'sessions': '1219',
                                             'users': '1091'}}),
             '20170405': defaultdict(<function recursive_defaultdict at 0x7f3ecfb45e18>,
                                     {'US': {'sessions': '1185',
                                             'users': '1073'}}),
             '20170406': defaultdict(<function recursive_defaultdict at 0x7f3ecfb45e18>,
                                     {'US': {'sessions': '1203',
                                             'users': '1109'}}),
             '20170408': defaultdict(<function recursive_defaultdict at 0x7f3ecfb45e18>,
                                     {'PL': {'sessions': '2', 'users': '1'},
                                      'SG': {'sessions': '2', 'users': '2'},
                                      'TT': {'sessions': '2', 'users': '2'}})})

To retrieve a specific combination of date/country:

print(l['20170404']['US'])
# {'sessions': '1219', 'users': '1091'}
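
One thing to keep in mind with this kind of lookup: indexing a defaultdict with a key that does not exist creates that key. The following is a minimal sketch of that behaviour, reusing the recursive_defaultdict helper from above; the 'FR' and 'DE' country codes are only illustrative and are not in the original data.

```python
from collections import defaultdict

def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)

l = recursive_defaultdict()
l['20170404']['US'] = {'sessions': '1219', 'users': '1091'}

# A membership test and .get() leave the structure untouched.
has_fr = 'FR' in l['20170404']
missing = l['20170404'].get('FR')

# Indexing a missing key silently inserts an empty nested defaultdict.
_ = l['20170404']['DE']
keys_after = sorted(l['20170404'])

print(has_fr, missing, keys_after)
```

So if you only want to check whether a date/country pair is present, `in` or `.get()` is probably the safer choice.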

Iterate through the result:

for date, values in l.items():
    for country, value in values.items():
        print(date, country, value)
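
The grouping above does not yet limit each day to the top 10 countries or sort by sessions. A possible way to do that is sketched below, assuming every row has exactly two dimensions (date, country) and two metric values (sessions, users), as in the JSON from the question. The function name top_countries_per_day and the sample rows are made up for illustration. The metric values arrive as strings, so they are converted to int before comparing (as strings, '12' would sort below '2').

```python
from collections import defaultdict
from heapq import nlargest

def top_countries_per_day(rows, n=10):
    """Group report rows by date and keep the n countries with the most sessions."""
    by_date = defaultdict(list)
    for row in rows:
        date, country = row['dimensions']
        sessions, users = row['metrics'][0]['values']
        # Metric values are strings in the response; convert for numeric sorting.
        by_date[date].append((country, int(sessions), int(users)))

    result = []
    for date in sorted(by_date):
        # nlargest returns the n best entries, ordered by sessions descending.
        best = nlargest(n, by_date[date], key=lambda item: item[1])
        result.extend((date, country, sessions, users)
                      for country, sessions, users in best)
    return result

# Illustrative sample rows in the same shape as m['reports'][0]['data']['rows'].
sample_rows = [
    {'dimensions': ['20170404', 'US'], 'metrics': [{'values': ['1219', '1091']}]},
    {'dimensions': ['20170408', 'PL'], 'metrics': [{'values': ['2', '1']}]},
    {'dimensions': ['20170408', 'SG'], 'metrics': [{'values': ['12', '2']}]},
    {'dimensions': ['20170408', 'TT'], 'metrics': [{'values': ['3', '2']}]},
]

top = top_countries_per_day(sample_rows, n=2)
for record in top:
    print(record)
```

Each resulting tuple has the shape (date, country, sessions, users), which matches the row layout described in the question, so it could be passed to whatever insert routine the db uses. Since the response contains a nextPageToken, the rows from all pages would need to be collected before grouping, otherwise a day's top 10 may be incomplete.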

Answer 1: 1 accepted, 2017-04-25 20:14:45
