
Python 3 Using Counter to return a count of values with their associated User IDs

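I'm wondering if someone could help me (this is a small part of a coursework assignment).

I'm trying to return each user ID (visitor_uuid) along with its corresponding total reading time (event_readtime).

The data comes from a JSON file that has already been parsed. My code so far is below: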

def time_reading(data):
    time = Counter()
    users = Counter()

    for d in data:
        #users[d["visitor_uuid"]] += time[d["event_readtime"]]
        if "event_readtime" in d:

            #Increment the visitor_country in the counter
            final = ([d["visitor_uuid"]],[sum(d["event_readtime"]]))
            #time[d["event_readtime"]] += 1





    return (final)

An example of my JSON is:

{"ts": 1393631989, "visitor_uuid": "64bf70296wa2f9fd",
"visitor_username": null, "visitor_source": "internal",
"visitor_device": "browser", "visitor_useragent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0",
"visitor_ip": "06f49269e749a837", "visitor_country": "VE",
"visitor_referrer": "64f729926497515c", "env_type": "reader",
"env_doc_id": "130705172251-3a2a725b2bbd5aa3f2af810acf0aeabb",
"env_adid": null, "event_type": "pagereadtime",
"event_readtime": 5, "subject_type": "doc", "subject_doc_id": "130705172251-3a2a725b2bbd5aa3f2af810acf0aeabb", "subject_page": 10, "cause": null}

{"ts": 1393631989, "visitor_uuid": "64bfd70296",
"visitor_username": null, "visitor_source": "internal",
"visitor_device": "browser", "visitor_useragent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0",
"visitor_ip": "06f49269e749a837", "visitor_country": "VE",
"visitor_referrer": "64f729926497515c", "env_type": "reader",
"env_doc_id": "130705172251-3a2a725b2bbd5aa3f2af810acf0aeabb",
"env_adid": null, "event_type": "pagereadtime",
"event_readtime": 2, "subject_type": "doc", "subject_doc_id": "130705172251-3a2a725b2bbd5aa3f2af810acf0aeabb", "subject_page": 10, "cause": null}

The output would be:

(64bf70296wa2f9fd, 7)

Any help would be greatly appreciated!

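You are almost there. This is a classic groupby problem: each user has multiple events, and you want one total per user. Rather than Counter, use itertools.groupby here.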

Ingredients

>>> import random
>>> import operator as op
>>> from itertools import groupby
>>> import uuid

This snippet gives me data similar to yours:

>>> data = [{'event_readtime': random.randint(0, 10), 'visitor_uuid': str(uuid.uuid4())} for _ in range(5)]
>>> data
[{'visitor_uuid': 'f6b55181-d3c9-4699-9fac-506b8cc6871c', 'event_readtime': 0}, {'visitor_uuid': '54a24574-1e83-45ab-b4bb-41a0ead3d244', 'event_readtime': 10}, {'visitor_uuid': '3a2fffe9-c173-4afa-b93e-a60b04f043cb', 'event_readtime': 10}, {'visitor_uuid': 'b008557d-c5a1-404a-ba5b-2ec57340dee7', 'event_readtime': 4}, {'visitor_uuid': '6bb94313-4d5e-4339-b3b0-93e9733b4a88', 'event_readtime': 4}]

groupby only groups adjacent items, so you need to sort the data by the grouping key first.
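To make the adjacency point concrete, here is a minimal sketch. The three records and the name `get_user` are made up for illustration and are not taken from the question's data.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: user "a" appears twice, but not on adjacent rows.
records = [
    {"visitor_uuid": "a", "event_readtime": 5},
    {"visitor_uuid": "b", "event_readtime": 1},
    {"visitor_uuid": "a", "event_readtime": 2},
]

get_user = itemgetter("visitor_uuid")

# Without sorting, groupby starts a new group every time the key changes,
# so "a" shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(records, get_user)]

# After sorting by the same key, each user forms exactly one group.
sorted_keys = [key for key, _ in groupby(sorted(records, key=get_user), get_user)]

print(unsorted_keys)  # ['a', 'b', 'a']
print(sorted_keys)    # ['a', 'b']
```

Summing over the unsorted groups would therefore report user "a" twice with partial totals instead of once with the full total.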

>>> data.sort(key=op.itemgetter('visitor_uuid'))

Then apply groupby to the sorted data and sum the read times in each group:

>>> [(k, sum(map(op.itemgetter('event_readtime'), v))) for k, v in groupby(data, op.itemgetter('visitor_uuid'))]
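Putting the pieces together, the following is a sketch of how this could be wrapped in a function shaped like the one in the question. The function names, the trimmed `sample` records and the "impression" event are assumptions made for illustration. For comparison, it also includes a variant that accumulates the totals in a `collections.Counter`, which avoids the sort; that variant is an addition here, not part of the approach above.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def time_reading_groupby(data):
    """Return (visitor_uuid, total event_readtime) pairs using sort + groupby."""
    get_user = itemgetter("visitor_uuid")
    # Keep only records that carry a read time; other events may lack the key.
    rows = sorted((d for d in data if "event_readtime" in d), key=get_user)
    return [
        (user, sum(d["event_readtime"] for d in group))
        for user, group in groupby(rows, get_user)
    ]


def time_reading_counter(data):
    """Same totals, accumulated in a Counter without sorting the input."""
    totals = Counter()
    for d in data:
        if "event_readtime" in d:
            # Add the read time itself, not 1, so the Counter holds a sum.
            totals[d["visitor_uuid"]] += d["event_readtime"]
    return sorted(totals.items())


# Trimmed, hypothetical records: two read events for one user,
# plus one event without a read time that should be ignored.
sample = [
    {"visitor_uuid": "64bf70296wa2f9fd", "event_readtime": 5},
    {"visitor_uuid": "64bf70296wa2f9fd", "event_readtime": 2},
    {"visitor_uuid": "0000000000000000", "event_type": "impression"},
]

print(time_reading_groupby(sample))  # [('64bf70296wa2f9fd', 7)]
print(time_reading_counter(sample))  # [('64bf70296wa2f9fd', 7)]
```

Both versions return a list of (user, total) tuples. The groupby version costs a sort, while the Counter version makes a single pass over the data and needs no particular ordering.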
