
How to flatten a list in a pandas DataFrame column?

I have a pandas DataFrame in which one column, RESULT, holds lists that contain nested lists.

  ID                     RESULT
0  A  [nan, ['PASS'], nan, nan]
1  B  [['FAIL'], nan, nan, nan]
2  C  [['PASS'], nan, nan, nan]
3  D       [nan, nan, nan, nan]
4  E  [nan, ['FAIL'], nan, nan]

I want each list in the RESULT column to be flat. For example, the first row should become [nan, 'PASS', nan, nan]. The final result should look like this:

  ID                     RESULT
0  A  [nan, 'PASS', nan, nan]
1  B  ['FAIL', nan, nan, nan]
2  C  ['PASS', nan, nan, nan]
3  D  [nan, nan, nan, nan]
4  E  [nan, 'FAIL', nan, nan]

I wrote a function to do this, but it does not turn the column into flat lists. Here is the code I tried:

def flatten_list(mylist):
    # print(mylist)
    for index, value in enumerate(mylist):
        if type(value) is list:
            mylist[index] = value[0]
        # print(mylist)
        return mylist

df_bin['RESULT'] = df_bin['RESULT'].apply(flatten_list)

But a simple example like the one below works, and I don't understand what the difference is. I would appreciate any guidance. Also, is it possible to use a lambda function to achieve the same result?

mylist = [nan, ['PASS'], nan, nan]
for n, i in enumerate(mylist):
    if type(i) is list:
        mylist[n] = i[0]
print(mylist)

One option is the flatten helper in pandas.core.common. Note that it is internal to pandas rather than part of the public API, so it may change between versions.

import pandas as pd
from pandas.core.common import flatten

df = pd.DataFrame({'ID':['A','B'],
                   'Result':[['nan', ['PASS'], 'nan', 'nan'], [['FAIL'], 'nan', 'nan', 'nan']]
                  })
df['Result'] = df['Result'].apply(lambda x: list(flatten(x)))

Output:

  ID                 Result
0  A  [nan, PASS, nan, nan]
1  B  [FAIL, nan, nan, nan]

Based on your example, I guess this should work.
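Because pandas.core.common is internal, a hand-written generator is one way to avoid depending on it. The sketch below is only an illustration, not pandas' own implementation: `iter_flat` is a hypothetical helper name, and the sample uses real float('nan') values instead of the string 'nan' used above.

```python
import pandas as pd


def iter_flat(items):
    # Hypothetical helper: yield the leaves of arbitrarily nested
    # lists/tuples. Strings are not iterated, so 'PASS' stays whole.
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from iter_flat(item)
        else:
            yield item


nan = float("nan")
df = pd.DataFrame({
    "ID": ["A", "B"],
    "Result": [[nan, ["PASS"], nan, nan], [["FAIL"], nan, nan, nan]],
})

# Materialize the generator so each cell holds a plain list again.
df["Result"] = df["Result"].apply(lambda cell: list(iter_flat(cell)))
print(df)
```

Unlike a one-level flatten, this version also handles lists nested more than one level deep.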

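You're almost there. The return statement is indented inside the for loop, so the function returns after looking at only the first element. Unindent it so that it runs after the loop has finished.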

def flatten_list(mylist):
    # print(mylist)
    for index, value in enumerate(mylist):
        if type(value) is list:
            mylist[index] = value[0]
        # print(mylist)
    return mylist  # unindented: runs after the loop, not during the first iteration
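The effect of the return placement can be checked on a plain list, without pandas. In this sketch the two function names are hypothetical and nan is taken to be float('nan'). With the early return, only a list in the first position is ever flattened, which is why rows such as A and E stay unchanged.

```python
nan = float("nan")


def flatten_early_return(mylist):
    # Version from the question: `return` sits inside the loop body.
    for index, value in enumerate(mylist):
        if type(value) is list:
            mylist[index] = value[0]
        return mylist  # exits during the first iteration


def flatten_fixed(mylist):
    # Same logic, with `return` moved out of the loop.
    for index, value in enumerate(mylist):
        if type(value) is list:
            mylist[index] = value[0]
    return mylist


early = flatten_early_return([nan, ["PASS"], nan, nan])
fixed = flatten_fixed([nan, ["PASS"], nan, nan])

print(early)  # index 1 is still the nested list ['PASS']
print(fixed)  # index 1 is now the plain string 'PASS'
```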

Here is a more general solution that also handles sublists with more than one item.

def flatten_list(cell):
    fcell = []
    for item in cell:
        if isinstance(item, list):
            fcell.extend(item)   # add every element of the sublist
        else:
            fcell.append(item)   # keep non-list items as they are
    return fcell


df_bin['RESULT'] = df_bin['RESULT'].apply(flatten_list)
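The question also asks whether a lambda can do this. One possible form, shown here as a sketch, puts a nested list comprehension inside the lambda. `df_bin` is rebuilt from a few of the question's sample rows, and non-list items are wrapped in a one-element list so that both cases can be iterated the same way.

```python
import pandas as pd

nan = float("nan")
df_bin = pd.DataFrame({
    "ID": ["A", "B", "D"],
    "RESULT": [
        [nan, ["PASS"], nan, nan],
        [["FAIL"], nan, nan, nan],
        [nan, nan, nan, nan],
    ],
})

# For each item: iterate the sublist if it is a list, else a one-item list.
df_bin["RESULT"] = df_bin["RESULT"].apply(
    lambda cell: [
        leaf
        for item in cell
        for leaf in (item if isinstance(item, list) else [item])
    ]
)
print(df_bin)
```

This flattens one level only, which matches the data in the question.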


If performance matters, another option is to avoid the explicit loop and use numpy.hstack. Note that hstack builds a single array, so the float nan values come back as the string 'nan', as the output below shows. Here is an example.

from numpy import hstack, nan

lst = [nan, ['PASS'], nan, nan]

lst2 = list(hstack(lst))

print(lst2)

Output:

['nan', 'PASS', 'nan', 'nan']
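The quoted 'nan' entries in that output are worth noting: hstack produces one array with a single dtype, so mixing floats and strings yields strings throughout. The sketch below checks this and, assuming real NaN values should be preserved, compares it with a plain list comprehension. The variable names are illustrative.

```python
from numpy import hstack, nan

lst = [nan, ["PASS"], nan, nan]

# hstack coerces everything to one dtype, so NaN becomes the text 'nan'.
stacked = hstack(lst).tolist()
print([type(x).__name__ for x in stacked])

# A plain comprehension keeps the original float NaN objects.
kept = [
    leaf
    for item in lst
    for leaf in (item if isinstance(item, list) else [item])
]
print([type(x).__name__ for x in kept])
```

If later code relies on checks such as pandas.isna, the string 'nan' will not be treated as missing, so the comprehension may be the safer choice there.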
