I have a dataframe with the following values:
sentence_id words labels
3822445 ['a', 'b', 'c', ''] ['B-PER', 'I-PER', 'I-PER', 'I-PER']
3822446 ['d', 'e', ''] ['B-PER', 'I-PER', 'I-PER']
3822447 ['f', 'g', 'h'] ['B-PER', 'I-PER', 'I-PER']
Expected output:
sentence_id words labels
3822445 'a' 'B-PER'
3822445 'b' 'I-PER'
3822445 'c' 'I-PER'
3822445 '' 'I-PER'
3822446 'd' 'B-PER'
3822446 'e' 'I-PER'
3822446 '' 'I-PER'
3822447 'f' 'B-PER'
3822447 'g' 'I-PER'
3822447 'h' 'I-PER'
I have tried:
dataframe.set_index(['sentence_id']).apply(pd.Series.explode).reset_index()
but it gives the same output as the input. I don't know what is going wrong.
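If you want a simple one-liner, you can use explode.
With pandas>=0.25.0, explode one column and assign the other:
df.explode('words').assign(labels=df['labels'].explode())
With pandas>=1.3.0, explode accepts a list of columns:
df.explode(['words','labels'], ignore_index=True)
Output (from the second form; the first form keeps the original, repeated index):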
   sentence_id words labels
0      3822445     a  B-PER
1      3822445     b  I-PER
2      3822445     c  I-PER
3      3822445        I-PER
4      3822446     d  B-PER
5      3822446     e  I-PER
6      3822446        I-PER
7      3822447     f  B-PER
8      3822447     g  I-PER
9      3822447     h  I-PER
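For anyone who wants to check the two one-liners side by side, here is a minimal self-contained sketch. The frame is rebuilt from the question's sample data; the names `out_single` and `out_multi` are only illustrative, and the multi-column call assumes pandas>=1.3.0.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'sentence_id': [3822445, 3822446, 3822447],
    'words': [['a', 'b', 'c', ''], ['d', 'e', ''], ['f', 'g', 'h']],
    'labels': [['B-PER', 'I-PER', 'I-PER', 'I-PER'],
               ['B-PER', 'I-PER', 'I-PER'],
               ['B-PER', 'I-PER', 'I-PER']],
})

# pandas>=0.25.0: explode one column, then attach the other exploded column.
# Both results share the same repeated index, so the assignment lines up.
out_single = (df.explode('words')
                .assign(labels=df['labels'].explode())
                .reset_index(drop=True))

# pandas>=1.3.0: explode both columns in one call.
# The lists in each row must have matching lengths.
out_multi = df.explode(['words', 'labels'], ignore_index=True)

print(out_multi)
```

Both forms should give one row per word/label pair. The first needs the extra `reset_index(drop=True)` to get a clean 0..n-1 index.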
This works fine for me. What unexpected results are you getting?
import pandas as pd

df = pd.DataFrame({'sentence_id': [3822445, 3822446, 3822447],
                   'words': [['a', 'b', 'c', ''],
                             ['d', 'e', ''],
                             ['f', 'g', 'h']],
                   'labels': [['B-PER', 'I-PER', 'I-PER', 'I-PER'],
                              ['B-PER', 'I-PER', 'I-PER'],
                              ['B-PER', 'I-PER', 'I-PER']]})
# Explode every list column in parallel, keeping sentence_id as the key
df.set_index('sentence_id').apply(pd.Series.explode).reset_index()
Output:
   sentence_id words labels
0      3822445     a  B-PER
1      3822445     b  I-PER
2      3822445     c  I-PER
3      3822445        I-PER
4      3822446     d  B-PER
5      3822446     e  I-PER
6      3822446        I-PER
7      3822447     f  B-PER
8      3822447     g  I-PER
9      3822447     h  I-PER
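One possible explanation for getting the input back unchanged, and this is only a guess because the question does not show the column dtypes: the cells may hold strings that look like lists (for example after reading the data from a CSV file) rather than real Python lists. explode leaves scalar values as they are, so nothing happens. The sketch below uses a hypothetical `raw` frame to illustrate this and parses the strings with `ast.literal_eval` before exploding.

```python
import ast
import pandas as pd

# Hypothetical input: the list columns are stored as strings, not lists
raw = pd.DataFrame({
    'sentence_id': [3822445, 3822446],
    'words': ["['a', 'b', 'c', '']", "['d', 'e', '']"],
    'labels': ["['B-PER', 'I-PER', 'I-PER', 'I-PER']",
               "['B-PER', 'I-PER', 'I-PER']"],
})

# Each cell is a single string, so explode has nothing to expand
unchanged = raw.set_index('sentence_id').apply(pd.Series.explode).reset_index()

# Convert the string representations into real lists
parsed = raw.copy()
for col in ['words', 'labels']:
    parsed[col] = parsed[col].apply(ast.literal_eval)

# Now the same call expands to one row per word/label pair
exploded = parsed.set_index('sentence_id').apply(pd.Series.explode).reset_index()
print(exploded)
```

A quick way to check whether this applies is `type(dataframe['words'].iloc[0])`: if it reports `str` instead of `list`, the columns need parsing first.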