Dict of lists to DataFrame

Question

I have a dict of lists like so:

    {291840: ['http://www.rollanet.org', 'http://www.rollanet.org'], 
     291841: ['http://www.superpages.com', 'http://www.superpages.com'],
     291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com'],...etc }

I want to convert it to a two-column DataFrame, one column for the subj_id and the other for the corresponding list. Each row would be a key of the dict and the column would hold the value (list), using from_dict with orient set to index. According to the documentation: "orient: if the keys should be rows, pass 'index'."

names = ['subj_id', 'URLs']

dfDict = pd.DataFrame(columns = names)
dfDict.from_dict(listDict, orient = 'index')

Instead I get a DataFrame that has each element of the lists as a column. I only want two columns: one for the subj_id and the other for the list of URLs associated with that subj_id.

Answer 1

I think you need:

import pandas as pd

listDict = {291840: ['http://www.rollanet.org', 'http://www.rollanet.org'],
            291841: ['http://www.superpages.com', 'http://www.superpages.com'],
            291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com']}

names = ['subj_id', 'URLs']

# Each key becomes a column; stack() then yields one row per (position, key) pair.
df = pd.DataFrame(listDict).stack().reset_index(drop=True, level=0).reset_index()
df.columns = names
print(df)
   subj_id                                URLs
0   291840             http://www.rollanet.org
1   291841           http://www.superpages.com
2   291848  http://www.drscore.com/App/ScoreDr
3   291840             http://www.rollanet.org
4   291841           http://www.superpages.com
5   291848              http://www.drscore.com

Old answer:

df = pd.DataFrame.from_dict(listDict, orient='index').stack().reset_index(drop=True, level=1)
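The old answer returns a Series indexed by the dict keys, not a two-column frame. A minimal sketch of one way to finish it, assuming the sample listDict from above; the `.dropna()` call and the `rename_axis`/`reset_index` step are additions for illustration and are not part of the original answer:

```python
import pandas as pd

listDict = {291840: ['http://www.rollanet.org', 'http://www.rollanet.org'],
            291841: ['http://www.superpages.com', 'http://www.superpages.com'],
            291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com']}

# orient='index' puts one key per row and one list position per column.
wide = pd.DataFrame.from_dict(listDict, orient='index')

# stack() turns the position columns into rows; dropna() discards padding
# that would appear if the lists had different lengths (added as a precaution).
s = wide.stack().dropna().reset_index(level=1, drop=True)

# Name the index and move it into a regular column.
df = s.rename_axis('subj_id').reset_index(name='URLs')
print(df)
```

Unlike the first snippet, this ordering keeps the rows for each subj_id together.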

If you need a list in the URLs column, use list comprehensions:

df = pd.DataFrame({'subj_id': pd.Series([k for k, v in listDict.items()]),
                   'URLs': pd.Series([v for k, v in listDict.items()])}, columns=names)
print(df)
   subj_id                                               URLs
0   291840  [http://www.rollanet.org, http://www.rollanet....
1   291841  [http://www.superpages.com, http://www.superpa...
2   291848  [http://www.drscore.com/App/ScoreDr, http://ww...
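A possibly shorter route to the same list-per-row frame is to pass the dict's items directly. This is a sketch rather than the answerer's method; `df_lists` and `df_long` are names chosen here for illustration, and `DataFrame.explode` requires pandas 0.25 or later:

```python
import pandas as pd

listDict = {291840: ['http://www.rollanet.org', 'http://www.rollanet.org'],
            291841: ['http://www.superpages.com', 'http://www.superpages.com'],
            291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com']}

# Each (key, list) pair becomes one row; the list is stored whole in 'URLs'.
df_lists = pd.DataFrame(list(listDict.items()), columns=['subj_id', 'URLs'])
print(df_lists)

# If one row per URL is wanted later, explode() unpacks the lists.
df_long = df_lists.explode('URLs').reset_index(drop=True)
print(df_long)
```

Keeping the lists intact matches what the question asked for; the exploded form is the long layout shown in the first snippet, grouped by subj_id.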

Answer 2

Since I'm too late to give jezrael's answer, here's an interesting way to do it:

pd.concat([pd.Series(v, [k] * len(v)) for k, v in listDict.items()]) \
    .rename_axis('subj_id').reset_index(name='urls')
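To see how that one-liner works, it can be split into steps. A sketch on the sample data from Answer 1; the intermediate name `pieces` is introduced here only for illustration:

```python
import pandas as pd

listDict = {291840: ['http://www.rollanet.org', 'http://www.rollanet.org'],
            291841: ['http://www.superpages.com', 'http://www.superpages.com'],
            291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com']}

# One Series per key: the values are the URLs, the index repeats the key.
pieces = [pd.Series(v, index=[k] * len(v)) for k, v in listDict.items()]

# Concatenate the pieces, name the index, and move it into a regular column.
df = pd.concat(pieces).rename_axis('subj_id').reset_index(name='urls')
print(df)
```

Because each Series is built separately, this approach should also cope with lists of different lengths.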

2 answers

Answer 1: 5, accepted, 2016-11-16 07:21:37

Answer 2: 4, 2016-11-16 07:46:01
