I have a problem with a DataFrame that looks like this:
It contains a "ClusterLabel" column (values 0-44), and I want to group the "Document" column by the ClusterLabel value. The lists in "Document" should be combined into one list per cluster (duplicate words should be kept).
I tried the ".groupby" method, but it raises the error "sequence item 0: expected str instance, list found".
Can someone help?
Don't use sum to concatenate lists. It looks fancy but it's quadratic and should be considered bad practice.
A better option is a list comprehension that flattens the lists:
df1 = (df.groupby('ClusterLabel')['Document']
         .agg(lambda x: [z for y in x for z in y])
         .reset_index())
Or flatten with itertools.chain:
from itertools import chain
df1 = (df.groupby('ClusterLabel')['Document']
         .agg(lambda x: list(chain(*x)))
         .reset_index())
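To check both approaches end to end, here is a minimal sketch on made-up data. The original DataFrame is not shown in the question, so the sample rows below are an assumption; only the column names 'ClusterLabel' and 'Document' come from the post. The second variant uses chain.from_iterable, which should behave like chain(*x) here but does not unpack the whole group into arguments.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data; the real DataFrame from the question is not shown.
df = pd.DataFrame({
    "ClusterLabel": [0, 1, 0, 1],
    "Document": [["a", "b"], ["c"], ["b", "d"], ["e", "c"]],
})

grouped = df.groupby("ClusterLabel")["Document"]

# Variant 1: nested list comprehension flattens the lists of each group.
by_comprehension = grouped.agg(lambda x: [z for y in x for z in y]).reset_index()

# Variant 2: itertools.chain.from_iterable does the same flattening lazily.
by_chain = grouped.agg(lambda x: list(chain.from_iterable(x))).reset_index()

print(by_comprehension)
```

Both results should hold one row per cluster, with duplicate words kept in their original order.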