I have a text dataset that looks like this.
import pandas as pd

df = pd.DataFrame({'Sentence': ['Hello World',
                                'The quick brown fox jumps over the lazy dog.',
                                'Just some text to make third sentence!'
                                ],
                   'label': ['greetings',
                             'dog,fox',
                             'some_class,someother_class'
                             ]})
I want to transform this data into something like this: one row per sentence, with a separate 0/1 column for each label.
Is there a Pythonic way to make this transformation for multilabel classification?
You can split the label column on commas, use pandas.DataFrame.explode to give each label its own row, then cross the label column with the Sentence column by using pandas.crosstab.
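To see what the split-and-explode step does on its own, here is a minimal sketch on a two-row subset of the data. The name long_df is only for illustration and is not part of the solution below.

```python
import pandas as pd

df = pd.DataFrame({
    "Sentence": ["Hello World", "The quick brown fox jumps over the lazy dog."],
    "label": ["greetings", "dog,fox"],
})

# Turn each comma-separated string into a list, then emit one row per list element.
# Rows that came from the same sentence keep the same index value.
long_df = df.assign(label=df["label"].str.split(",")).explode("label")
print(long_df)
```

The second sentence now appears twice, once with dog and once with fox, which is the long format that pandas.crosstab expects.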
Try this:
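def cross_labels(df):
    # Count how often each label occurs for each sentence.
    return pd.crosstab(df["Sentence"], df["label"])

out = (
    df.assign(label=df["label"].str.split(","))  # "dog,fox" -> ["dog", "fox"]
      .explode("label")                          # one row per (sentence, label) pair
      .pipe(cross_labels)
      .rename_axis(None, axis=1)                 # drop the "label" name from the columns
      .reset_index()                             # make Sentence a regular column again
)

print(out)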

                                       Sentence  dog  fox  greetings  some_class  someother_class
0                                   Hello World    0    0          1           0                0
1        Just some text to make third sentence!    0    0          0           1                1
2  The quick brown fox jumps over the lazy dog.    1    1          0           0                0
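
Note that pandas.crosstab sorts the sentences and would merge any duplicated sentences into a single row of counts. If you would rather keep one row per input row in the original order, a possible alternative (not the approach used above, just a sketch) is Series.str.get_dummies, which splits on a separator and builds the indicator columns directly. The name out_alt is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Sentence": [
        "Hello World",
        "The quick brown fox jumps over the lazy dog.",
        "Just some text to make third sentence!",
    ],
    "label": ["greetings", "dog,fox", "some_class,someother_class"],
})

# Split each label string on "," and build one indicator column per distinct label.
indicators = df["label"].str.get_dummies(sep=",")

# Join the indicators back onto the sentences; row order and index are unchanged.
out_alt = df[["Sentence"]].join(indicators)
print(out_alt)
```

This skips the explode step entirely, at the cost of producing indicators rather than counts if a label is repeated within one row.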