I'd like to efficiently create a pandas DataFrame from a Python collections.Counter dictionary, but there's an additional requirement.
The Counter dictionary looks like this:
(a, b) : 5
(c, d) : 7
(a, d) : 2
The dictionary keys are tuples: the first element is to become the row label and the second the column label of the DataFrame.
The resulting DataFrame should look like this:
   b  d
a  5  2
c  0  7
For larger data I don't want to build the DataFrame incrementally with assignments such as df[a][b] = 5, because (I'm led to believe) every such extension creates a copy of the DataFrame, which is very inefficient.
Perhaps the right answer is to go via a NumPy array?
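For reference, here is a rough sketch of what the NumPy route could look like. It is only one possible approach, not something pandas prescribes: the names dense, row_codes and col_codes are made up for illustration, and the sample Counter is assumed to match the data shown above. It maps each label to an integer position with np.unique and fills a zero-initialised array in one vectorised assignment.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Sample data assumed to match the question.
counter = Counter({("a", "b"): 5, ("c", "d"): 7, ("a", "d"): 2})

# Split the tuple keys into row labels and column labels.
rows, cols = zip(*counter.keys())

# Sorted unique labels plus the integer position of each key's label.
row_labels, row_codes = np.unique(rows, return_inverse=True)
col_labels, col_codes = np.unique(cols, return_inverse=True)

# Start from zeros so missing (row, column) pairs stay 0.
dense = np.zeros((len(row_labels), len(col_labels)), dtype=int)
dense[row_codes, col_codes] = list(counter.values())

df = pd.DataFrame(dense, index=row_labels, columns=col_labels)
print(df)
```

This allocates the full rows x columns array up front, so it only makes sense when the dense result fits in memory.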
I would create a Series using MultiIndex.from_tuples and then unstack it.
keys, values = zip(*counter.items())
idx = pd.MultiIndex.from_tuples(keys)
pd.Series(values, index=idx).unstack(-1, fill_value=0)
   b  d
a  5  2
c  0  7
Using the DataFrame constructor with stack:
pd.DataFrame(counter, index=[0]).stack().loc[0].T
     b    d
a  5.0  2.0
c  NaN  7.0
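Note that this variant leaves NaN where a pair is missing and turns the counts into floats. If the zero-filled integer table from the question is wanted, one option (a sketch, with wide and filled as illustrative names and the sample Counter assumed from the question) is to fill and cast afterwards:

```python
from collections import Counter

import pandas as pd

# Sample data assumed to match the question.
counter = Counter({("a", "b"): 5, ("c", "d"): 7, ("a", "d"): 2})

# Same chain as above: tuple keys become two-level columns,
# stack() moves the second level into the rows, .T flips the result.
wide = pd.DataFrame(counter, index=[0]).stack().loc[0].T

# Replace the missing pairs with 0 and restore integer counts.
filled = wide.fillna(0).astype(int)
print(filled)
```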
Using Series with unstack:
pd.Series(d).unstack(fill_value=0)
Out[708]:
   b  d
a  5  2
c  0  7
Input data
d={('a', 'b') : 5,
('c', 'd') : 7,
('a', 'd') : 2}
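For completeness, a self-contained sketch that starts from an actual collections.Counter rather than a hand-written dict. The pairs list is invented here purely to reproduce the counts above, and table is an illustrative name:

```python
from collections import Counter

import pandas as pd

# Hypothetical raw observations that produce the counts 5, 7 and 2.
pairs = [("a", "b")] * 5 + [("c", "d")] * 7 + [("a", "d")] * 2
counter = Counter(pairs)

# Tuple keys become a two-level MultiIndex; unstack() pivots the
# second level into columns and fills missing pairs with 0.
table = pd.Series(dict(counter)).unstack(fill_value=0)
print(table)
```

Because fill_value is passed to unstack, the counts stay integers instead of being upcast to float by NaN.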