Summing the values in one column by the corresponding values in another column, across two files

Question

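There is an existing question about adding up the multiple values of a repeated key so that each key holds the total. For example, the input 1:5 2:4 3:2 1:4 should produce 1:9 2:4 3:2. That case is very basic, but it is the kind of output I'm looking for.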
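The basic case can be sketched with `collections.defaultdict(int)`: each repeated key's value is added to a running total that starts at 0. The `key:value` string parsed here is just the example's notation, assumed for illustration.

```python
from collections import defaultdict


def sum_by_key(pairs):
    """Add up the values of repeated keys; pairs is an iterable of (key, value)."""
    totals = defaultdict(int)  # a missing key starts at 0
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Parse the example "1:5 2:4 3:2 1:4" into (key, value) integer pairs.
example = "1:5 2:4 3:2 1:4"
pairs = [tuple(int(n) for n in item.split(":")) for item in example.split()]
totals = sum_by_key(pairs)
print(" ".join(f"{k}:{v}" for k, v in totals.items()))  # 1:9 2:4 3:2
```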

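The two files I'm working with describe 51 users: in user_artists.dat, column 1 is the userID, column 2 is the artistID, and column 3 is the weight, i.e. the number of times that user listened to that artist.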

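I'm trying to sum the play counts (weights) for each artist across all users and display the result in this format: Britney Spears (289) 2393140, that is, the artist name from artists.dat, the artistID in parentheses, then the total. Any help or input would be greatly appreciated.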
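Once the weights are summed per artistID, producing the requested line is a lookup of each ID in the name mapping built from artists.dat. A minimal sketch of just that join-and-format step, assuming the totals and names are already in two dicts keyed by artistID (the function name `format_totals` is hypothetical):

```python
def format_totals(totals, artist_names):
    """Join per-artist totals with artist names.

    totals: dict mapping artistID -> summed weight.
    artist_names: dict mapping artistID -> name (as read from artists.dat).
    IDs missing from artist_names are labelled 'Unknown artist'.
    """
    return [
        f"{artist_names.get(artist_id, 'Unknown artist')} ({artist_id}) {total}"
        for artist_id, total in totals.items()
    ]


# Reproduces the target line using the values stated in the question.
print(format_totals({"289": 2393140}, {"289": "Britney Spears"})[0])
# Britney Spears (289) 2393140
```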

import codecs
#from collections import defaultdict

with codecs.open("artists.dat", encoding = "utf-8") as f:
    artists = f.readlines()


with codecs.open("user_artists.dat", encoding = "utf-8") as f:
    users = f.readlines()


artist_list = [x.strip().split('\t') for x in artists][1:]
user_stats_list = [x.strip().split('\t') for x in users][1:]

artists = {}
for a in artist_list:
    artistID, name = a[0], a[1]
    artists[artistID] = name

grouped_user_stats = {}
for u in user_stats_list:
    userID, artistID, weight = u
    grouped_user_stats[artistID] = grouped_user_stats[artistID].astype(int)
    grouped_user_stats[weight] = grouped_user_stats[weight].astype(int)
    for artistID, weight in u:
        grouped_user_stats.groupby('artistID')['weight'].sum()
        print(grouped_user_stats.groupby('artistID')['weight'].sum())



    #if userID not in grouped_user_stats:
        #grouped_user_stats[userID] = { artistID: {'name': artists[artistID], 'plays': 1} }
    #else:
        #if artistID not in grouped_user_stats[userID]:
            #grouped_user_stats[userID][artistID] = {'name': artists[artistID], 'plays': 1}
        #else:
            #grouped_user_stats[userID][artistID]['plays'] += 1
            #print('this never happens') 




#print(grouped_user_stats)
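The summing loop in this attempt fails on the first row with a `KeyError`, because `grouped_user_stats` is still an empty plain `dict`; and a `dict` has no `astype` or `groupby` methods in any case (those belong to pandas objects). A minimal sketch of the same loop using only dict operations, assuming rows shaped like `[userID, artistID, weight]` as `user_stats_list` builds them; the sample rows are made up:

```python
def sum_weights(user_stats_list):
    """Total the weight column per artistID using only dict operations.

    user_stats_list: rows shaped like [userID, artistID, weight] (strings),
    i.e. what the attempt builds from user_artists.dat after dropping the header.
    """
    grouped_user_stats = {}
    for _userID, artistID, weight in user_stats_list:
        # dict.get supplies 0 the first time an artistID is seen
        grouped_user_stats[artistID] = grouped_user_stats.get(artistID, 0) + int(weight)
    return grouped_user_stats


# Hypothetical rows, only to show the shape of the result.
print(sum_weights([["2", "51", "13"], ["3", "51", "2"], ["2", "52", "5"]]))
# {'51': 15, '52': 5}
```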

Answer 1

How about this:

from collections import defaultdict

# read both files
with open("artists.dat", encoding="utf-8") as f:
    artists = f.readlines()
with open("user_artists.dat", encoding="utf-8") as f:
    users = f.readlines()

# build a dict with the artist ID as key and the artist name as value (header row skipped)
artist_repo = dict(x.strip().split('\t')[:2] for x in artists[1:])

# split each row into [userID, artistID, weight] (header row skipped)
user_stats_list = [x.strip().split('\t') for x in users][1:]

# missing artist IDs start at 0
grouped_user_stats = defaultdict(int)

for u in user_stats_list:
    userID, artistID, weight = u
    # accumulate weights in a dict with the artist ID as key and the sum of weights as value
    grouped_user_stats[artistID] += int(weight)

# extra: "prettify" the keys into the "<artist name> (<artist id>)" format
grouped_user_stats = {f"{artist_repo.get(k, 'Unknown artist')} ({k})": v
                      for k, v in grouped_user_stats.items()}

# lastly, print it
for k, v in grouped_user_stats.items():
    print(k, v)
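A variant of the same idea that lets the standard `csv` module do the tab-separated parsing and uses `collections.Counter` for the sums; `Counter.most_common()` then lists artists from most to least played. It assumes, as the answer does, that each file starts with one header row and that the first two columns of artists.dat are the ID and the name. The function name `artist_totals` is hypothetical, and it takes open file objects so it can be tried on in-memory text; with the real files, pass `open(path, encoding="utf-8", newline="")`.

```python
import csv
import io
from collections import Counter


def artist_totals(artists_file, user_artists_file):
    """Return 'Name (id) total' lines, most-played artist first."""
    # QUOTE_NONE: treat quote characters in names as ordinary text
    artist_rows = csv.reader(artists_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(artist_rows)  # skip the header row
    names = {row[0]: row[1] for row in artist_rows}

    user_rows = csv.reader(user_artists_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(user_rows)  # skip the header row
    plays = Counter()
    for _user_id, artist_id, weight in user_rows:
        plays[artist_id] += int(weight)

    return [f"{names.get(a, 'Unknown artist')} ({a}) {n}" for a, n in plays.most_common()]


# Tiny made-up inputs standing in for artists.dat and user_artists.dat.
artists = io.StringIO("id\tname\n1\tA\n2\tB\n")
users = io.StringIO("userID\tartistID\tweight\n10\t1\t5\n11\t2\t8\n12\t1\t4\n")
print("\n".join(artist_totals(artists, users)))
# A (1) 9
# B (2) 8
```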

Answer 1 · score 0 · accepted · 2017-04-12 22:26:43

通过两个文件中的一列中的值与另一列中的相应值进行汇总

问题描述

1 个解决方案

解决方案1 0 已采纳 2017-04-12 22:26:43

解决方案1
0 已采纳 2017-04-12 22:26:43