Dataframe from list of lists with different length

How can I turn a list like the following into a DataFrame with 5 columns?

[[['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
 [['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
  ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
  ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
  ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
 [['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '9'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'WEGON EJ NM H', '30,88', '99']],
 [['16/12/2019', 'C', 'IRBBRASIL REON NM', '36,72', '100'],
  ['16/12/2019', 'C', 'ITAUUNIBANCOON EJ N1', '31,45', '200']]]

Flatten the nested list by one level so that each inner list becomes a row, then create the DataFrame:

import pandas as pd

data = [[['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
        [['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
         ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
         ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
         ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
        [['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '9'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'WEGON EJ NM H', '30,88', '99']],
        [['16/12/2019', 'C', 'IRBBRASIL REON NM', '36,72', '100'],
         ['16/12/2019', 'C', 'ITAUUNIBANCOON EJ N1', '31,45', '200']]]
# Flatten one level: each inner list becomes one row
lst = []
for entry in data:
    for sub in entry:
        lst.append(sub)
# Build the DataFrame with five named columns
df = pd.DataFrame(data=lst, columns=['A', 'B', 'C', 'D', 'E'])
print(df)

Output:

             A  B                     C      D    E
0   30/09/2015  C        ETERNITON NM H   1,73  400
1   05/08/2019  C            CIELOON NM   7,75  500
2   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
3   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
4   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
5   25/03/2015  C          CETIPON NM H  31,17   10
6   25/03/2015  C          CETIPON NM H  31,17    9
7   25/03/2015  C          CETIPON NM H  31,17   10
8   25/03/2015  C          CETIPON NM H  31,17   10
9   25/03/2015  C          CETIPON NM H  31,17   10
10  25/03/2015  C          CETIPON NM H  31,17   10
11  25/03/2015  C          CETIPON NM H  31,17   10
12  25/03/2015  C          CETIPON NM H  31,17   10
13  25/03/2015  C          CETIPON NM H  31,17   10
14  25/03/2015  C          CETIPON NM H  31,17   10
15  25/03/2015  C         WEGON EJ NM H  30,88   99
16  16/12/2019  C     IRBBRASIL REON NM  36,72  100
17  16/12/2019  C  ITAUUNIBANCOON EJ N1  31,45  200

Simply flatten the list to get the rows, then convert them to a DataFrame:

import pandas as pd

# l is the nested list from the question
flat = [row for item in l for row in item]
df = pd.DataFrame(flat, columns=['A','B','C','D','E'])
print(df)

Output:

             A  B                     C      D    E
0   30/09/2015  C        ETERNITON NM H   1,73  400
1   05/08/2019  C            CIELOON NM   7,75  500
2   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
3   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
4   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
5   25/03/2015  C          CETIPON NM H  31,17   10
6   25/03/2015  C          CETIPON NM H  31,17    9
7   25/03/2015  C          CETIPON NM H  31,17   10
8   25/03/2015  C          CETIPON NM H  31,17   10
9   25/03/2015  C          CETIPON NM H  31,17   10
10  25/03/2015  C          CETIPON NM H  31,17   10
11  25/03/2015  C          CETIPON NM H  31,17   10
12  25/03/2015  C          CETIPON NM H  31,17   10
13  25/03/2015  C          CETIPON NM H  31,17   10
14  25/03/2015  C          CETIPON NM H  31,17   10
15  25/03/2015  C         WEGON EJ NM H  30,88   99
16  16/12/2019  C     IRBBRASIL REON NM  36,72  100
17  16/12/2019  C  ITAUUNIBANCOON EJ N1  31,45  200
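
The groups in the question differ only in how many rows they hold; every row itself has five fields. If that is not guaranteed, a small check before building the frame can catch malformed rows, since pandas may otherwise pad short rows with missing values. The following is a minimal sketch, not part of the original answer: the helper name `flatten_rows`, its `width` parameter, and the shortened sample `groups` are all hypothetical.

```python
import pandas as pd


def flatten_rows(groups, width=5):
    """Flatten one nesting level and verify every row has `width` fields.

    Hypothetical helper for illustration only.
    """
    rows = [row for group in groups for row in group]
    bad = [i for i, row in enumerate(rows) if len(row) != width]
    if bad:
        raise ValueError(f"rows at positions {bad} do not have {width} fields")
    return rows


# Shortened sample taken from the question's data
groups = [
    [['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
    [['16/12/2019', 'C', 'IRBBRASIL REON NM', '36,72', '100'],
     ['16/12/2019', 'C', 'ITAUUNIBANCOON EJ N1', '31,45', '200']],
]

df_checked = pd.DataFrame(flatten_rows(groups), columns=list("ABCDE"))
print(df_checked)
```

With the question's data the check passes and the result matches the frame shown above; a row with a different number of fields raises a `ValueError` instead of producing a partly empty row.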

Flatten the records with pandas' Series.explode, then create a DataFrame:

import pandas as pd
lst = [[['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
 [['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
  ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
  ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
  ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
 [['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '9'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'WEGON EJ NM H', '30,88', '99']],
 [['16/12/2019', 'C', 'IRBBRASIL REON NM', '36,72', '100'],
  ['16/12/2019', 'C', 'ITAUUNIBANCOON EJ N1', '31,45', '200']]]
df = pd.DataFrame(list(pd.Series(lst).explode()))
print(df)
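
The snippet above gives no column names, so the columns come out labelled 0 to 4. Here is a small sketch of the same explode approach with named columns, assuming pandas 0.25 or later (where `Series.explode` is available). The sample `nested` is a shortened, hypothetical stand-in for the full list.

```python
import pandas as pd

# Shortened sample taken from the question's data
nested = [
    [['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
    [['16/12/2019', 'C', 'IRBBRASIL REON NM', '36,72', '100'],
     ['16/12/2019', 'C', 'ITAUUNIBANCOON EJ N1', '31,45', '200']],
]

# Each Series element is one group (a list of rows);
# explode() turns every row into its own Series element.
rows = pd.Series(nested).explode().tolist()

df_explode = pd.DataFrame(rows, columns=list("ABCDE"))
print(df_explode)
```

Note that `explode()` repeats the original group index for each row it produces. Going through `tolist()` discards that index, so the resulting frame gets a fresh 0-based index.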

Here is another solution, using itertools.chain.from_iterable:

import pandas as pd
from itertools import chain

# data is the nested list from the question
pd.DataFrame(chain.from_iterable(data), columns=list("ABCDE"))

             A  B                     C      D    E
0   30/09/2015  C        ETERNITON NM H   1,73  400
1   05/08/2019  C            CIELOON NM   7,75  500
2   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
3   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
    ...
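
The explicit loop, the list comprehension, and `chain.from_iterable` all flatten exactly one nesting level, so they should produce the same frame. A quick sketch to confirm this on a shortened, hypothetical sample (`nested`) follows. The chained iterator is wrapped in `list()` here so the example does not rely on pandas consuming an iterator directly.

```python
from itertools import chain

import pandas as pd

# Shortened sample taken from the question's data
nested = [
    [['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
    [['16/12/2019', 'C', 'IRBBRASIL REON NM', '36,72', '100'],
     ['16/12/2019', 'C', 'ITAUUNIBANCOON EJ N1', '31,45', '200']],
]
cols = list("ABCDE")

# 1) explicit loop
rows_loop = []
for group in nested:
    rows_loop.extend(group)
df_loop = pd.DataFrame(rows_loop, columns=cols)

# 2) list comprehension
df_comp = pd.DataFrame([row for group in nested for row in group], columns=cols)

# 3) itertools.chain.from_iterable
df_chain = pd.DataFrame(list(chain.from_iterable(nested)), columns=cols)

same = df_loop.equals(df_comp) and df_comp.equals(df_chain)
print(same)  # True
```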
