I have a dataframe that looks like the one below:
import pandas as pd

df = pd.DataFrame([{"id": "A1", "words": "a,b,d,d,e,f,f"},
                   {"id": "A2", "words": "m,b,t,d,e,t,s"},
                   {"id": "A3", "words": "s,b,d,e,e,m,m"}])
Note that if a letter appears more than once in a row, it should be counted only once. How do I apply get_dummies()
to turn it into the final data frame below?
id a b d e f m s t
A1 1 1 1 1 1 0 0 0
A2 0 1 1 1 0 1 1 1
A3 0 1 1 1 0 1 1 0
I used the code below, but it did not work as expected, likely because of the duplicated values in the column.
df = df.assign(words = df.words.str.split(',')).explode('words')
df = pd.get_dummies(df, prefix=['words'], columns=['words'])
df
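Try Series.str.get_dummies, which splits each string on the separator and flags each distinct value once per row, so repeated letters are not double-counted: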
out = df.set_index('id')['words'].str.get_dummies(',').reset_index()
Out[171]:
id a b d e f m s t
0 A1 1 1 1 1 1 0 0 0
1 A2 0 1 1 1 0 1 1 1
2 A3 0 1 1 1 0 1 1 0
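If you would rather keep the explode approach from the question, here is a minimal sketch of one way it could be made to work. After explode, each id spans several rows (one per word), so the dummy columns have to be collapsed back to one row per id; taking the max per group does that and also absorbs duplicates. The names `long_df` and `dummies` are only illustrative, and `dtype=int` is passed on the assumption that you want 0/1 rather than the boolean columns newer pandas versions return by default.

```python
import pandas as pd

df = pd.DataFrame([{"id": "A1", "words": "a,b,d,d,e,f,f"},
                   {"id": "A2", "words": "m,b,t,d,e,t,s"},
                   {"id": "A3", "words": "s,b,d,e,e,m,m"}])

# One row per (id, word); repeated words produce repeated rows.
long_df = df.assign(words=df["words"].str.split(",")).explode("words")

# One indicator column per distinct word, named after the word itself.
dummies = pd.get_dummies(long_df, columns=["words"],
                         prefix="", prefix_sep="", dtype=int)

# Collapse back to one row per id; max keeps a 1 if the word occurred at all.
out = dummies.groupby("id", as_index=False).max()
print(out)
```

For this input the result should match the str.get_dummies output above; the one-liner is shorter, while this version may be easier to adapt if the words column already holds lists.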