I have a dictionary whose keys are years and whose values are lists of the corresponding models. Below is a sample of the data, printed from the dictionary:
1975: ['MODEL9808533471'],
1985: ['MODEL0912768548'],
1980: ['MODEL1006230072', 'MODEL7898438988'],
1987: ['MODEL0848444339'],
1977: ['MODEL7889395724'],
1962: ['MODEL8686121468'],
1965: ['MODEL0911532520'],
2018: ['MODEL1712050002', 'MODEL1712050003', 'MODEL1712050004']
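What I want is a matrix like the following, with the years sorted as columns, one row per model, and a 1 under the year each model belongs to: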
                1962 1965 1975 1977 1980 1985 1987 2018
MODEL9808533471              1
MODEL0912768548                             1
MODEL1006230072                        1
MODEL7898438988                        1
MODEL0848444339                                  1
MODEL7889395724                   1
MODEL8686121468    1
MODEL0911532520         1
MODEL1712050002                                       1
MODEL1712050003                                       1
MODEL1712050004                                       1
My first thought is to loop over each value of the dictionary, build the matrix, and then use pandas to write it to a CSV file.
I cannot find anything similar in the numpy package, even though it is strong at manipulating matrices. I found this link on the forum, but there the lists all have the same length.
Do you know of any tools or facilities (for example functionality in pandas, numpy or something along those lines) that would help me do this?
Thanks!
This is a perfect fit for MultiLabelBinarizer from sklearn:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
s = pd.Series(d)  # d is the dictionary from the question
mlb = MultiLabelBinarizer()
yourdf = pd.DataFrame(mlb.fit_transform(s), columns=mlb.classes_, index=s.index).T
yourdf
Out[121]:
1975 1985 1980 1987 1977 1962 1965 2018
MODEL0848444339 0 0 0 1 0 0 0 0
MODEL0911532520 0 0 0 0 0 0 1 0
MODEL0912768548 0 1 0 0 0 0 0 0
MODEL1006230072 0 0 1 0 0 0 0 0
MODEL1712050002 0 0 0 0 0 0 0 1
MODEL1712050003 0 0 0 0 0 0 0 1
MODEL1712050004 0 0 0 0 0 0 0 1
MODEL7889395724 0 0 0 0 1 0 0 0
MODEL7898438988 0 0 1 0 0 0 0 0
MODEL8686121468 0 0 0 0 0 1 0 0
MODEL9808533471 1 0 0 0 0 0 0 0
Or use str.get_dummies:
s.apply(','.join).str.get_dummies(',').T
Out[127]:
1975 1985 1980 1987 1977 1962 1965 2018
MODEL0848444339 0 0 0 1 0 0 0 0
MODEL0911532520 0 0 0 0 0 0 1 0
MODEL0912768548 0 1 0 0 0 0 0 0
MODEL1006230072 0 0 1 0 0 0 0 0
MODEL1712050002 0 0 0 0 0 0 0 1
MODEL1712050003 0 0 0 0 0 0 0 1
MODEL1712050004 0 0 0 0 0 0 0 1
MODEL7889395724 0 0 0 0 1 0 0 0
MODEL7898438988 0 0 1 0 0 0 0 0
MODEL8686121468 0 0 0 0 0 1 0 0
MODEL9808533471 1 0 0 0 0 0 0 0
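Both results above keep the years in dictionary order, while the question asks for sorted year columns and CSV output. Here is a minimal sketch of those two extra steps, assuming the dictionary is named d as in the answer. The sample data is copied from the question, and the names matrix and csv_text are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
d = {
    1975: ['MODEL9808533471'],
    1985: ['MODEL0912768548'],
    1980: ['MODEL1006230072', 'MODEL7898438988'],
    1987: ['MODEL0848444339'],
    1977: ['MODEL7889395724'],
    1962: ['MODEL8686121468'],
    1965: ['MODEL0911532520'],
    2018: ['MODEL1712050002', 'MODEL1712050003', 'MODEL1712050004'],
}

s = pd.Series(d)

# One row per model, one column per year, 1 where the model belongs to the year.
matrix = s.apply(','.join).str.get_dummies(',').T

# Sort the year columns in ascending order, as in the desired output.
matrix = matrix.sort_index(axis=1)

# With no path given, to_csv returns the CSV text;
# pass a file name instead to write the matrix to disk.
csv_text = matrix.to_csv()
print(csv_text)
```

Joining on ',' only works as long as no model name contains a comma, which holds for the sample data.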
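Assuming d is your dictionary:
import pandas as pd
# list(...) keeps the constructor happy on pandas versions that reject dict views
df = pd.DataFrame(list(d.values()), index=list(d.keys())).stack().reset_index(level=0)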
df.columns = ['year', 'col']
pd.crosstab(df['col'], df['year'])
year 1962 1965 1975 1977 1980 1985 1987 2018
col
MODEL0848444339 0 0 0 0 0 0 1 0
MODEL0911532520 0 1 0 0 0 0 0 0
MODEL0912768548 0 0 0 0 0 1 0 0
MODEL1006230072 0 0 0 0 1 0 0 0
MODEL1712050002 0 0 0 0 0 0 0 1
MODEL1712050003 0 0 0 0 0 0 0 1
MODEL1712050004 0 0 0 0 0 0 0 1
MODEL7889395724 0 0 0 1 0 0 0 0
MODEL7898438988 0 0 0 0 1 0 0 0
MODEL8686121468 1 0 0 0 0 0 0 0
MODEL9808533471 0 0 1 0 0 0 0 0
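The loop-based idea from the question also works without pandas. Here is a minimal sketch using only the standard library, assuming the same dictionary d. The helper name build_matrix is made up for illustration, and absent entries are filled with 0 rather than left blank.

```python
import csv
import io

# Sample data copied from the question.
d = {
    1975: ['MODEL9808533471'],
    1985: ['MODEL0912768548'],
    1980: ['MODEL1006230072', 'MODEL7898438988'],
    1987: ['MODEL0848444339'],
    1977: ['MODEL7889395724'],
    1962: ['MODEL8686121468'],
    1965: ['MODEL0911532520'],
    2018: ['MODEL1712050002', 'MODEL1712050003', 'MODEL1712050004'],
}


def build_matrix(year_to_models):
    """Return (sorted years, {model: list of 0/1 flags, one per year})."""
    years = sorted(year_to_models)
    column = {year: i for i, year in enumerate(years)}
    rows = {}
    for year, models in year_to_models.items():
        for model in models:
            # Create an all-zero row the first time a model is seen.
            row = rows.setdefault(model, [0] * len(years))
            row[column[year]] = 1
    return years, rows


years, rows = build_matrix(d)

# Write to an in-memory buffer here; for a real file use
# open(path, 'w', newline='') in place of io.StringIO().
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['model'] + years)
for model in sorted(rows):
    writer.writerow([model] + rows[model])
csv_text = buf.getvalue()
print(csv_text)
```

Because each row is created on demand, lists of different lengths per year need no special handling.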