
How to convert a dictionary whose values are lists of lists into a dataframe in Python?

I have a dictionary like this: the keys are start positions, and each value is a list of entries, where each entry is itself a list of several values.

dict1 = {28878779: 
[[0.63078648931418,'BRCA','Primary Blood Derived Cancer','chr16'],
  [0.913319324289701, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.4291909025802871, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.7571498628201009, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.20053355013001398, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.47222708511173905, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.5421979810611359, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.517080694962231, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.354578922865826, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.47933127476003706, 'BRCA', 'Primary Blood Derived Cancer', 'chr16']],
116276795: 
[[0.0295335249313507,'BRCA','Primary Blood Derived Cancer','chr12'],
  [0.0225709542480921, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
  [0.0230930552162406, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
  [0.0226794373583645, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
  [0.0465238706721383, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
  [0.0308525159082739, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
  [0.0280263565564701, 'BRCA', 'Primary Blood Derived Cancer', 'chr12']]
...}

I want to convert the dictionary into a dataframe like the one below, where each entry in a key's list of values becomes one row, with the key in the first column.

Start       Beta_value       Cancer            Stage             Chromosome
28878779  0.63078648931418   BRCA  Primary Blood Derived Cancer    chr16
28878779  0.913319324289701  BRCA  Primary Blood Derived Cancer    chr16
.
.
116276795 0.029533524931350  BRCA  Primary Blood Derived Cancer    chr12
116276795 0.0225709542480921 BRCA  Primary Blood Derived Cancer    chr12
.
.

I tried this:

dlist = [[key,value[i][0],value[i][1],value[i][2],value[i][3]]
for key,value in dict1.items()
for i in value]


beta = pd.DataFrame(dlist, columns = 
['Start','Beta_value','Cancer','Stage','Chromosome'])

It raises a TypeError:

   TypeError: list indices must be integers or slices, not list

What am I supposed to do?

In "for i in value", the variable i is already each inner list, not an integer index, so value[i] raises the TypeError. Index i directly:

dlist = [[key,i[0],i[1],i[2],i[3]] for key,value in dict1.items() for i in value]
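To see the difference in isolation, here is a minimal sketch that reproduces the error and then applies the fix. The dictionary named sample is a hypothetical, shortened stand-in for dict1 with rounded values, and the names bad, error_message and rows are only for illustration.

```python
# Hypothetical, shortened sample in the same shape as dict1 (values rounded).
sample = {
    28878779: [[0.63, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
               [0.91, 'BRCA', 'Primary Blood Derived Cancer', 'chr16']],
    116276795: [[0.03, 'BRCA', 'Primary Blood Derived Cancer', 'chr12']],
}

# "for i in value" yields each inner list, so value[i] tries to index
# a list with another list and raises TypeError.
try:
    bad = [[key, value[i][0]] for key, value in sample.items() for i in value]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Indexing the inner list i directly gives one flat row per entry.
rows = [[key, i[0], i[1], i[2], i[3]]
        for key, value in sample.items() for i in value]
print(len(rows))
print(rows[0])
```

If you prefer integer indices, iterating over range(len(value)) would also make value[i] valid, but looping over the inner lists is shorter.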

Or prepend the key to each inner list:

dlist = [[key] + i for key,value in dict1.items() for i in value] 
#alternative 
#dlist = [(key, *i) for key,value in dict1.items() for i in value]    

beta = pd.DataFrame(dlist, columns=['Start','Beta_value','Cancer','Stage','Chromosome'])
print (beta)
        Start  Beta_value Cancer                         Stage Chromosome
0    28878779    0.630786   BRCA  Primary Blood Derived Cancer      chr16
1    28878779    0.913319   BRCA  Primary Blood Derived Cancer      chr16
2    28878779    0.429191   BRCA  Primary Blood Derived Cancer      chr16
3    28878779    0.757150   BRCA  Primary Blood Derived Cancer      chr16
4    28878779    0.200534   BRCA  Primary Blood Derived Cancer      chr16
5    28878779    0.472227   BRCA  Primary Blood Derived Cancer      chr16
6    28878779    0.542198   BRCA  Primary Blood Derived Cancer      chr16
7    28878779    0.517081   BRCA  Primary Blood Derived Cancer      chr16
8    28878779    0.354579   BRCA  Primary Blood Derived Cancer      chr16
9    28878779    0.479331   BRCA  Primary Blood Derived Cancer      chr16
10  116276795    0.029534   BRCA  Primary Blood Derived Cancer      chr12
11  116276795    0.022571   BRCA  Primary Blood Derived Cancer      chr12
12  116276795    0.023093   BRCA  Primary Blood Derived Cancer      chr12
13  116276795    0.022679   BRCA  Primary Blood Derived Cancer      chr12
14  116276795    0.046524   BRCA  Primary Blood Derived Cancer      chr12
15  116276795    0.030853   BRCA  Primary Blood Derived Cancer      chr12
16  116276795    0.028026   BRCA  Primary Blood Derived Cancer      chr12
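For reference, the whole approach can be put together as one self-contained script. This is only a sketch: sample is a hypothetical, shortened version of dict1 with rounded values, and it assumes every inner list has exactly four items so that each row matches the five column names.

```python
import pandas as pd

# Hypothetical, shortened sample in the same shape as dict1 (values rounded).
sample = {
    28878779: [[0.63, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
               [0.91, 'BRCA', 'Primary Blood Derived Cancer', 'chr16']],
    116276795: [[0.03, 'BRCA', 'Primary Blood Derived Cancer', 'chr12']],
}

columns = ['Start', 'Beta_value', 'Cancer', 'Stage', 'Chromosome']

# Prepend each key to every inner list stored under it: one row per entry.
rows = [(key, *entry) for key, entries in sample.items() for entry in entries]

beta = pd.DataFrame(rows, columns=columns)
print(beta)
```

Rows come out grouped by key in the dictionary's insertion order, as in the output above.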


 