
Dict of lists to df

I have a dict of lists like so:

    {291840: ['http://www.rollanet.org', 'http://www.rollanet.org'], 
     291841: ['http://www.superpages.com', 'http://www.superpages.com'],
     291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com'],...etc }

I want to convert it to a two-column dataframe: one column for the subj_id and the other for the corresponding list. Each row should be a key of the dict, and the second column should hold that key's value (the list). I tried from_dict with orient set to 'index' because, according to the documentation: "orient: if the keys should be rows, pass 'index'."

names = ['subj_id', 'URLs']

dfDict = pd.DataFrame(columns = names)
dfDict.from_dict(listDict, orient = 'index')

Instead I get a dataframe that has each element of the lists as a separate column. I only want two columns: one for the subj_id and the other for the list of URLs associated with that subj_id.
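For anyone who wants to reproduce the behaviour described above, here is a minimal sketch. It assumes the dict is named `listDict` as in the question and uses only the three keys shown there; the variable name `wide` is made up for illustration. Note that `from_dict` is a classmethod that returns a new DataFrame, so calling it on an existing empty frame does not fill that frame or reuse its column names.

```python
import pandas as pd

# Sample data copied from the question (only the three keys shown there).
listDict = {
    291840: ['http://www.rollanet.org', 'http://www.rollanet.org'],
    291841: ['http://www.superpages.com', 'http://www.superpages.com'],
    291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com'],
}

# from_dict builds and returns a NEW DataFrame; the keys become the index
# and each list position becomes its own column (0, 1, ...).
wide = pd.DataFrame.from_dict(listDict, orient='index')

print(wide.shape)          # rows = number of keys, columns = list length
print(list(wide.columns))  # positional column labels, not 'subj_id'/'URLs'
```

This is why the result has one column per list element rather than a single column of lists.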

I think you need:

import pandas as pd

listDict = {291840: ['http://www.rollanet.org', 'http://www.rollanet.org'],
     291841: ['http://www.superpages.com', 'http://www.superpages.com'], 
     291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com']}

names = ['subj_id', 'URLs']

df = pd.DataFrame(listDict).stack().reset_index(drop=True, level=0).reset_index()
df.columns = names
print(df)
   subj_id                                URLs
0   291840             http://www.rollanet.org
1   291841           http://www.superpages.com
2   291848  http://www.drscore.com/App/ScoreDr
3   291840             http://www.rollanet.org
4   291841           http://www.superpages.com
5   291848              http://www.drscore.com

Old answer:

df = pd.DataFrame.from_dict(listDict, orient='index').stack().reset_index(drop=True, level=1)

If you need a list in the URLs column, use list comprehensions:

df = pd.DataFrame({'subj_id': pd.Series([k for k,v in listDict.items()]),
                   'URLs': pd.Series([v for k,v in listDict.items()])}, columns = names)
print(df)
   subj_id                                               URLs
0   291840  [http://www.rollanet.org, http://www.rollanet....
1   291841  [http://www.superpages.com, http://www.superpa...
2   291848  [http://www.drscore.com/App/ScoreDr, http://ww...
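As a possible alternative (not part of the answer above), the same two-column frame can probably be built more directly from `listDict.items()`, and `DataFrame.explode` (available in pandas 0.25 and later) can then turn the list column into one row per URL. This is only a sketch under those assumptions; `df_lists` and `df_long` are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
listDict = {
    291840: ['http://www.rollanet.org', 'http://www.rollanet.org'],
    291841: ['http://www.superpages.com', 'http://www.superpages.com'],
    291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com'],
}
names = ['subj_id', 'URLs']

# Each (key, list) pair becomes one row: the key goes to 'subj_id',
# the whole list is stored as a single object in 'URLs'.
df_lists = pd.DataFrame(list(listDict.items()), columns=names)

# Optional: one row per URL instead of one list per row.
# Requires pandas >= 0.25 for DataFrame.explode.
df_long = df_lists.explode('URLs').reset_index(drop=True)

print(df_lists)
print(df_long)
```

Unlike the stack-based version, the exploded frame keeps the URLs of each subj_id on adjacent rows.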

Since I'm too late to give jezrael's answer, here's an interesting way to do it:

pd.concat([pd.Series(v, [k] * len(v)) for k, v in listDict.items()]) \
    .rename_axis('subj_id').reset_index(name='urls')
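To show what the snippet above produces, here is a self-contained version of it, assuming the `listDict` from the question. The name `long_df` is made up for illustration. Each list becomes a small Series whose index repeats its key, the pieces are concatenated, and the index is then turned into the `subj_id` column.

```python
import pandas as pd

# Sample data copied from the question.
listDict = {
    291840: ['http://www.rollanet.org', 'http://www.rollanet.org'],
    291841: ['http://www.superpages.com', 'http://www.superpages.com'],
    291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com'],
}

long_df = (
    # One Series per key; the key is repeated once per URL as the index.
    pd.concat([pd.Series(v, index=[k] * len(v)) for k, v in listDict.items()])
    .rename_axis('subj_id')      # name the index so it becomes a named column
    .reset_index(name='urls')    # move the index into a column, name the values
)

print(long_df)
```

The result should have six rows and two columns, `subj_id` and `urls`, with the rows grouped by key in the dict's order.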

