Explode the list values in dataframe columns

Question

I have a DataFrame with the following values:

sentence_id  words                    labels
3822445      ['a', 'b', 'c', '']      ['B-PER', 'I-PER', 'I-PER', 'I-PER']
3822446      ['d', 'e', '']           ['B-PER', 'I-PER', 'I-PER']
3822447      ['f', 'g', 'h']          ['B-PER', 'I-PER', 'I-PER']

Expected output:

sentence_id  words    labels    
3822445       'a'     'B-PER'
3822445       'b'     'I-PER'
3822445       'c'     'I-PER'
3822445       ''      'I-PER'
3822446       'd'     'B-PER'
3822446       'e'     'I-PER'
3822446       ''      'I-PER'
3822447       'f'     'B-PER'
3822447       'g'     'I-PER'
3822447       'h'     'I-PER'

I have tried:

dataframe.set_index(['sentence_id']).apply(pd.Series.explode).reset_index()

but it gives the same output as the input. I don't know what is going wrong.

Answer 1

If you want a simple one-liner, you can use explode, which is available in pandas >= 0.25.0:

df.explode('words').assign(labels=df['labels'].explode())
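A self-contained sketch of the one-liner above, using the sample data from the question (the name `result` is only illustrative). It relies on both exploded objects carrying an identical index, which holds here because the lists in each row have the same length; that matching-length condition is an assumption about the data.

```python
import pandas as pd

df = pd.DataFrame({
    'sentence_id': [3822445, 3822446, 3822447],
    'words': [['a', 'b', 'c', ''], ['d', 'e', ''], ['f', 'g', 'h']],
    'labels': [['B-PER', 'I-PER', 'I-PER', 'I-PER'],
               ['B-PER', 'I-PER', 'I-PER'],
               ['B-PER', 'I-PER', 'I-PER']],
})

# Explode 'words', then overwrite 'labels' with its own exploded values.
# Both results share the same repeated index (0, 0, 0, 0, 1, 1, 1, 2, 2, 2),
# so the assignment lines up row by row.
result = (
    df.explode('words')
      .assign(labels=df['labels'].explode())
      .reset_index(drop=True)  # replace the repeated index with 0..n-1
)
print(result)
```

The final `reset_index(drop=True)` is optional; it only replaces the repeated row labels with a fresh 0-based index.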

Answer 2

Update for pandas 1.3.0

pandas.DataFrame.explode now accepts a list of column names, so several columns can be exploded together:

df.explode(['words','labels'], ignore_index=True)

Output:

   sentence_id words labels
0      3822445     a  B-PER
1      3822445     b  I-PER
2      3822445     c  I-PER
3      3822445        I-PER
4      3822446     d  B-PER
5      3822446     e  I-PER
6      3822446        I-PER
7      3822447     f  B-PER
8      3822447     g  I-PER
9      3822447     h  I-PER
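Exploding several columns together requires the lists in each row to have the same number of elements; pandas raises a ValueError otherwise. A minimal sketch of checking this before exploding, assuming pandas >= 1.3.0 and the question's sample data (the names `same_length` and `exploded` are illustrative, not from the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'sentence_id': [3822445, 3822446, 3822447],
    'words': [['a', 'b', 'c', ''], ['d', 'e', ''], ['f', 'g', 'h']],
    'labels': [['B-PER', 'I-PER', 'I-PER', 'I-PER'],
               ['B-PER', 'I-PER', 'I-PER'],
               ['B-PER', 'I-PER', 'I-PER']],
})

# .str.len() also works on list-valued cells and returns each list's length.
same_length = df['words'].str.len().eq(df['labels'].str.len())

if same_length.all():
    # Every row has matching list lengths, so both columns explode in step.
    exploded = df.explode(['words', 'labels'], ignore_index=True)
    print(exploded)
else:
    # These rows would make a multi-column explode fail.
    print(df.loc[~same_length, 'sentence_id'])
```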

The code from the question works fine for me. What unexpected results are you getting?

import pandas as pd

df = pd.DataFrame({'sentence_id': [3822445, 3822446, 3822447],
                   'words': [['a', 'b', 'c', ''],
                             ['d', 'e', ''],
                             ['f', 'g', 'h']],
                   'labels': [['B-PER', 'I-PER', 'I-PER', 'I-PER'],
                              ['B-PER', 'I-PER', 'I-PER'],
                              ['B-PER', 'I-PER', 'I-PER']]})

df.set_index('sentence_id').apply(pd.Series.explode).reset_index()

Output:

   sentence_id words labels
0      3822445     a  B-PER
1      3822445     b  I-PER
2      3822445     c  I-PER
3      3822445        I-PER
4      3822446     d  B-PER
5      3822446     e  I-PER
6      3822446        I-PER
7      3822447     f  B-PER
8      3822447     g  I-PER
9      3822447     h  I-PER
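
One possible explanation for getting the input back unchanged, which the question does not confirm, is that the cells hold string representations of lists (for example after reading a CSV file) rather than actual lists; explode leaves scalar values such as strings as they are. A hedged sketch of that case, assuming the strings are valid Python list literals (the data and the names `unchanged`, `parsed`, and `exploded` are hypothetical):

```python
import ast
import pandas as pd

# Hypothetical data: each cell is a string that only looks like a list.
df = pd.DataFrame({
    'sentence_id': [3822445, 3822446],
    'words': ["['a', 'b', 'c', '']", "['d', 'e', '']"],
    'labels': ["['B-PER', 'I-PER', 'I-PER', 'I-PER']",
               "['B-PER', 'I-PER', 'I-PER']"],
})

# Strings are scalars, so exploding leaves the rows as they were.
unchanged = df.set_index('sentence_id').apply(pd.Series.explode).reset_index()

# Parse each string into a real list, then explode as in the question.
parsed = df.copy()
for col in ['words', 'labels']:
    parsed[col] = parsed[col].apply(ast.literal_eval)

exploded = parsed.set_index('sentence_id').apply(pd.Series.explode).reset_index()
print(exploded)
```

Checking `type(df['words'].iloc[0])` on the real data would show whether this case applies.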

solution1
3 ACCEPTED 2021-02-22 17:59:53

solution2
1 2021-02-22 19:06:30
