
Convert a dictionary whose values are lists of different lengths into a DataFrame

I have a dictionary in which the keys are years and the values are lists of the corresponding models. Below is a sample of the data, printed from the dictionary.

1975: ['MODEL9808533471'], 
1985: ['MODEL0912768548'], 
1980: ['MODEL1006230072', 'MODEL7898438988'], 
1987: ['MODEL0848444339'], 
1977: ['MODEL7889395724'], 
1962: ['MODEL8686121468'], 
1965: ['MODEL0911532520'],  
2018: ['MODEL1712050002', 'MODEL1712050003', 'MODEL1712050004']

What I want is something like the following:

                 1962    1965    1975   1977   1980   1985  1987  2018
MODEL9808533471                    1
MODEL0912768548                                         1
MODEL1006230072                                  1
MODEL7898438988                                  1
MODEL0848444339                                               1
MODEL7889395724                           1
MODEL8686121468   1
MODEL0911532520            1
MODEL1712050002                                                     1
MODEL1712050003                                                     1
MODEL1712050004                                                     1

My first thought was to loop over each value of the dictionary and build the matrix by hand, then let pandas write it to a CSV file.
I could not find anything similar in the numpy package, even though it is good at manipulating matrices. I found a related post on this forum, but in that case all the lists have the same length.

Are there any tools or facilities (for example, functionality in pandas, numpy or something along those lines) that would help me do this?

Thanks!

This is a perfect fit for MultiLabelBinarizer from sklearn:

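import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

s = pd.Series(d)  # d is the dictionary from the question
mlb = MultiLabelBinarizer()
# fit_transform gives one row per year and one column per model; .T puts models on the rows
yourdf = pd.DataFrame(mlb.fit_transform(s), columns=mlb.classes_, index=s.index).T
yourdf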
Out[121]: 
                 1975  1985  1980  1987  1977  1962  1965  2018
MODEL0848444339     0     0     0     1     0     0     0     0
MODEL0911532520     0     0     0     0     0     0     1     0
MODEL0912768548     0     1     0     0     0     0     0     0
MODEL1006230072     0     0     1     0     0     0     0     0
MODEL1712050002     0     0     0     0     0     0     0     1
MODEL1712050003     0     0     0     0     0     0     0     1
MODEL1712050004     0     0     0     0     0     0     0     1
MODEL7889395724     0     0     0     0     1     0     0     0
MODEL7898438988     0     0     1     0     0     0     0     0
MODEL8686121468     0     0     0     0     0     1     0     0
MODEL9808533471     1     0     0     0     0     0     0     0
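The columns above come out in the dictionary's insertion order, while the question asks for them sorted by year. A minimal, self-contained sketch of one way to get that, assuming the sample dictionary from the question; the names `indicator` and `by_year` are made up for this illustration:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data copied from the question (year -> list of model IDs).
d = {
    1975: ['MODEL9808533471'],
    1985: ['MODEL0912768548'],
    1980: ['MODEL1006230072', 'MODEL7898438988'],
    1987: ['MODEL0848444339'],
    1977: ['MODEL7889395724'],
    1962: ['MODEL8686121468'],
    1965: ['MODEL0911532520'],
    2018: ['MODEL1712050002', 'MODEL1712050003', 'MODEL1712050004'],
}

s = pd.Series(d)
mlb = MultiLabelBinarizer()

# Same construction as in the answer: models on the rows, years on the columns.
indicator = pd.DataFrame(mlb.fit_transform(s), columns=mlb.classes_, index=s.index).T

# Sort the columns (the years) in ascending order; the rows are left as they are.
by_year = indicator.sort_index(axis=1)
print(by_year)
```

sort_index(axis=1) only reorders the columns, so the 0/1 values stay attached to the same model and year.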

Or use get_dummies:

s.apply(','.join).str.get_dummies(',').T
Out[127]: 
                 1975  1985  1980  1987  1977  1962  1965  2018
MODEL0848444339     0     0     0     1     0     0     0     0
MODEL0911532520     0     0     0     0     0     0     1     0
MODEL0912768548     0     1     0     0     0     0     0     0
MODEL1006230072     0     0     1     0     0     0     0     0
MODEL1712050002     0     0     0     0     0     0     0     1
MODEL1712050003     0     0     0     0     0     0     0     1
MODEL1712050004     0     0     0     0     0     0     0     1
MODEL7889395724     0     0     0     0     1     0     0     0
MODEL7898438988     0     0     1     0     0     0     0     0
MODEL8686121468     0     0     0     0     0     1     0     0
MODEL9808533471     1     0     0     0     0     0     0     0
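One caveat with the join-then-split approach: it only works if the separator never appears inside a model name. A hedged, self-contained variant, assuming the sample dictionary from the question, that uses '|' (the default separator of Series.str.get_dummies) instead of a comma; the name `table` is made up for this sketch:

```python
import pandas as pd

# Sample data copied from the question (year -> list of model IDs).
d = {
    1975: ['MODEL9808533471'],
    1985: ['MODEL0912768548'],
    1980: ['MODEL1006230072', 'MODEL7898438988'],
    1987: ['MODEL0848444339'],
    1977: ['MODEL7889395724'],
    1962: ['MODEL8686121468'],
    1965: ['MODEL0911532520'],
    2018: ['MODEL1712050002', 'MODEL1712050003', 'MODEL1712050004'],
}

s = pd.Series(d)

# Join each list into one string such as 'MODEL1006230072|MODEL7898438988',
# then let get_dummies split on '|' (its default) and build the 0/1 columns.
table = s.apply('|'.join).str.get_dummies().T
print(table)
```

If the labels could contain the separator themselves, MultiLabelBinarizer or crosstab is probably the safer choice, since neither goes through a joined string.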

You can stack and crosstab

Assuming d is your dictionary,

# Wrap the dict views in list() so the constructor receives plain lists
df = pd.DataFrame(list(d.values()), index=list(d.keys())).stack().reset_index(level=0)

df.columns = ['year', 'col']

pd.crosstab(df['col'], df['year'])


year            1962    1965    1975    1977    1980    1985    1987    2018
col                             
MODEL0848444339 0       0       0       0       0       0       1       0
MODEL0911532520 0       1       0       0       0       0       0       0
MODEL0912768548 0       0       0       0       0       1       0       0
MODEL1006230072 0       0       0       0       1       0       0       0
MODEL1712050002 0       0       0       0       0       0       0       1
MODEL1712050003 0       0       0       0       0       0       0       1
MODEL1712050004 0       0       0       0       0       0       0       1
MODEL7889395724 0       0       0       1       0       0       0       0
MODEL7898438988 0       0       0       0       1       0       0       0
MODEL8686121468 1       0       0       0       0       0       0       0
MODEL9808533471 0       0       1       0       0       0       0       0
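The question also asks for blank cells rather than zeros and for CSV output. A rough, self-contained sketch of one way to get there, assuming the sample dictionary from the question and a pandas version that provides Series.explode (0.25 or later); the names `long`, `table` and `csv_text` are made up for this illustration:

```python
import pandas as pd

# Sample data copied from the question (year -> list of model IDs).
d = {
    1975: ['MODEL9808533471'],
    1985: ['MODEL0912768548'],
    1980: ['MODEL1006230072', 'MODEL7898438988'],
    1987: ['MODEL0848444339'],
    1977: ['MODEL7889395724'],
    1962: ['MODEL8686121468'],
    1965: ['MODEL0911532520'],
    2018: ['MODEL1712050002', 'MODEL1712050003', 'MODEL1712050004'],
}

# explode() turns each list element into its own row, repeating the year index.
long = pd.Series(d).explode().rename_axis('year').reset_index(name='model')

# Count (model, year) pairs; crosstab sorts both the rows and the columns.
table = pd.crosstab(long['model'], long['year'])

# Convert to strings and blank out the zeros, then render as CSV text.
# Passing a file path to to_csv() would write to disk instead.
csv_text = table.astype(str).replace('0', '').to_csv()
print(csv_text)
```

Blanking the zeros is only for display; keep the integer `table` around if the matrix is needed for further computation.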

