I have a Python dictionary with nested lists that I would like to turn into a pandas DataFrame:
a = {'A': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C': [[1, 2], [3, 4], [5, 6]]}
I would like the final DataFrame to look like this:
> A B C
> 1 a 1
> 1 a 2
> 2 b 3
> 2 b 4
> 3 c 5
> 3 c 6
When I pass it straight to the DataFrame constructor, each list in C stays in a single cell:
pd.DataFrame(a)
>    A  B       C
> 0  1  a  [1, 2]
> 1  2  b  [3, 4]
> 2  3  c  [5, 6]
Is there any way to make the data long, with one row per element of C?
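In pandas 0.25 and later, DataFrame.explode produces this shape directly. A minimal sketch, assuming a pandas version that has explode (it is not used in the answers below):

```python
import pandas as pd

a = {'A': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C': [[1, 2], [3, 4], [5, 6]]}

# explode() emits one row per list element in column C,
# repeating the A and B values (and the original index label) for each.
df = pd.DataFrame(a).explode('C')

# Renumber the rows 0..n-1 and convert C back to integers,
# since explode leaves the exploded column with object dtype.
df = df.reset_index(drop=True).astype({'C': int})
print(df)
```

Drop the reset_index call if you want to keep the repeated original index (0, 0, 1, 1, 2, 2) instead.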
This is what I came up with:
In [53]: df
Out[53]:
A B C
0 1 a [1, 2]
1 2 b [3, 4]
2 3 c [5, 6]
In [58]: s = df.C.apply(pd.Series).unstack().reset_index(level=0, drop=True)
In [59]: s.name = 'C2'
In [61]: df.drop('C', axis=1).join(s)
Out[61]:
A B C2
0 1 a 1
0 1 a 2
1 2 b 3
1 2 b 4
2 3 c 5
2 3 c 6
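apply(Series) gives me a DataFrame with two columns, one per list position. To combine them into a single column while keeping the original row index, I use unstack, which returns a Series indexed by (list position, original row). reset_index then drops the first level of that index, which holds the position of each value within its original list in C. Finally, I join the result back onto df, which repeats each A/B row once per matching value.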
Yes, one way is to process your dictionary first (I assume each value is either a flat list of values or a list of lists, but never a mix of values and lists). Step by step:
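from functools import reduce  # a builtin on Python 2, but must be imported on Python 3
def f(x, y):
    return x + y  # concatenates two lists
res = {k: reduce(f, v) if any(isinstance(i, list) for i in v) else v for k, v in a.items()}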
This gives you: {'A': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C': [1, 2, 3, 4, 5, 6]}
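reduce(f, v) builds the flat list by repeated pairwise concatenation, copying the growing list at every step. A minimal sketch of the same flattening step using itertools.chain.from_iterable instead (a swapped-in standard-library alternative, not the answer's own code), under the same assumption that each value is either a flat list or a list of lists:

```python
from itertools import chain

a = {'A': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C': [[1, 2], [3, 4], [5, 6]]}

# Flatten a value only if it contains nested lists; flat lists pass through unchanged.
res = {
    k: list(chain.from_iterable(v)) if any(isinstance(i, list) for i in v) else v
    for k, v in a.items()
}
print(res)
# {'A': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C': [1, 2, 3, 4, 5, 6]}
```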
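Next, stretch the shorter lists so that every list is as long as the longest one:
m = max(len(v) for v in res.values())  # length of the longest list
# repeat each element m // len(v) times; use // so the repeat count stays an int on Python 3
res1 = {k: reduce(f, [(m // len(v)) * [i] for i in v]) for k, v in res.items()}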
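The stretching step silently assumes that m is a multiple of every list length; otherwise the lists end up with different lengths and the DataFrame constructor fails. A sketch that makes the assumption explicit, using a hypothetical helper named stretch (not part of the original answer):

```python
def stretch(values, m):
    """Repeat each element of `values` so the result has length m.

    Hypothetical helper: assumes m is a multiple of len(values),
    as the original answer does implicitly, and raises ValueError otherwise.
    """
    times, rem = divmod(m, len(values))
    if rem:
        raise ValueError(f"cannot stretch {len(values)} values to length {m}")
    # e.g. [1, 2, 3] with times=2 -> [1, 1, 2, 2, 3, 3]
    return [x for x in values for _ in range(times)]


res = {'A': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C': [1, 2, 3, 4, 5, 6]}
m = max(len(v) for v in res.values())
res1 = {k: stretch(v, m) for k, v in res.items()}
print(res1)
# {'A': [1, 1, 2, 2, 3, 3], 'B': ['a', 'a', 'b', 'b', 'c', 'c'], 'C': [1, 2, 3, 4, 5, 6]}
```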
And finally:
pd.DataFrame(res1)
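If the nested lists in C can have different lengths per row, the stretching approach no longer lines up. A sketch of a different technique (building the rows directly with zip, not the answer's own method) that emits one (A, B, c) row per element c of each nested list:

```python
import pandas as pd

a = {'A': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C': [[1, 2], [3, 4], [5, 6]]}

# Walk the three columns in parallel and emit one row
# for every element of the nested list in C.
rows = [
    (x, y, c)
    for x, y, cs in zip(a['A'], a['B'], a['C'])
    for c in cs
]
df = pd.DataFrame(rows, columns=['A', 'B', 'C'])
print(df)
```

This makes no assumption about how the list lengths relate to one another; a row whose C list is empty simply produces no output rows.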