
How to convert a list of lists into a DataFrame in Python

I have a list of lists that looks like this:

[['A'],
 ['America'],
 ['2017-39', '2017-40', '2017-41', '2017-42', '2017-43'],
 [10.0, 6.0, 6.0, 6.0, 1.0],
 [5.0, 7.0, 8.0, 9.0, 1.0],
 ['B'],
 ['Britan'],
 ['2017-38', '2017-39', '2017-40', '2017-41', '2017-42', '2017-43', '2017-44'],
 [41.0, 27.0, 38.0, 36.0, 33.0, 41.0, 8.0],
 [40.0, 38.0, 28.0, 27.0, 23.0, 65.0, 4.0]]

I want to convert this into a DataFrame that looks like this:

A America     2017-39   10.0  5.0
na   na       2017-40    6.0  7.0
na   na       2017-41    6.0  8.0
na   na       2017-42    6.0  9.0
na   na       2017-43    1.0 10.0
B Britan      2017-38   41.0 40.0
na   na       2017-39   27.0 38.0
na   na       2017-40   38.0 28.0
na   na       2017-41   36.0 27.0
na   na       2017-42   33.0 23.0
na   na       2017-43   41.0 65.0
na   na       2017-44    8.0  4.0

How can I write code to do this? I am pretty new to Python and am having a hard time.

I would really appreciate your time and effort in helping me with this.

import pandas as pd
data = [['A'],
 ['America'],
 ['2017-39', '2017-40', '2017-41', '2017-42', '2017-43'],
 [10.0, 6.0, 6.0, 6.0, 1.0],
 [5.0,7.0,8.0,9.0,1.0],
 ['B'],
 ['Britan'],
 ['2017-38', '2017-39', '2017-40', '2017-41', '2017-42', '2017-43', '2017-44'],
 [41.0, 27.0, 38.0, 36.0, 33.0, 41.0, 8.0],
 [40.0, 38.0, 28.0, 27.0, 23.0, 65.0, 4.0]]

result = {}
for letters, countries, dates, val1, val2 in zip(*[iter(data)]*5):
    result[tuple(letters+countries)] = pd.DataFrame({'date':dates, 'val1':val1, 'val2':val2})
result = pd.concat(result)
print(result)

yields

                date  val1  val2
A America 0  2017-39  10.0   5.0
          1  2017-40   6.0   7.0
          2  2017-41   6.0   8.0
          3  2017-42   6.0   9.0
          4  2017-43   1.0   1.0
B Britan  0  2017-38  41.0  40.0
          1  2017-39  27.0  38.0
          2  2017-40  38.0  28.0
          3  2017-41  36.0  27.0
          4  2017-42  33.0  23.0
          5  2017-43  41.0  65.0
          6  2017-44   8.0   4.0

The main idea above is to use the "grouper idiom" zip(*[iter(data)]*5) to split the items in data into groups of 5. That way, you can use

for letters, countries, dates, val1, val2 in zip(*[iter(data)]*5):

to loop through 5 items of data at a time.
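To see why the idiom works, here is a minimal sketch on a short made-up list (the name letters is only for illustration). The list [iter(x)]*n holds n references to the same iterator, so each tuple that zip builds pulls n consecutive items from it. Note that zip silently drops a trailing group that is incomplete.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# Three references to ONE iterator: zip consumes three consecutive
# items for every tuple it builds.
groups = list(zip(*[iter(letters)] * 3))
print(groups)   # [('a', 'b', 'c'), ('d', 'e', 'f')]

# 'g' is dropped because it does not fill a complete group of 3.
```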


pd.concat can accept a dict of DataFrames as input and concatenate them into a single DataFrame with a MultiIndex built from the keys of the dict. So the for-loop is used to compose the dict of DataFrames,

for letters, countries, dates, val1, val2 in zip(*[iter(data)]*5):
    result[tuple(letters+countries)] = pd.DataFrame({'date':dates, 'val1':val1, 'val2':val2})

and then

result = pd.concat(result)

produces the desired DataFrame.
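As a smaller illustration of the same behaviour, the sketch below concatenates two tiny made-up frames (the names frames and combined and the column val are assumptions for this example only). The tuple keys become the outer levels of the MultiIndex, and each frame's own index becomes the innermost level.

```python
import pandas as pd

# Two small frames of different lengths, keyed by (letter, country) tuples.
frames = {
    ('A', 'America'): pd.DataFrame({'val': [1.0, 2.0]}),
    ('B', 'Britan'): pd.DataFrame({'val': [3.0, 4.0, 5.0]}),
}

combined = pd.concat(frames)

# Two levels come from the tuple keys, one from the original row index.
print(combined.index.nlevels)
# Selecting by the outer key returns just that group's rows.
print(combined.loc[('B', 'Britan')])
```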


Note that you could drop the last level of the MultiIndex:

In [91]: result.index = result.index.droplevel(level=-1)

In [92]: result
Out[92]: 
              date  val1  val2
A America  2017-39  10.0   5.0
  America  2017-40   6.0   7.0
  America  2017-41   6.0   8.0
  America  2017-42   6.0   9.0
  America  2017-43   1.0   1.0
B Britan   2017-38  41.0  40.0
  Britan   2017-39  27.0  38.0
  Britan   2017-40  38.0  28.0
  Britan   2017-41  36.0  27.0
  Britan   2017-42  33.0  23.0
  Britan   2017-43  41.0  65.0
  Britan   2017-44   8.0   4.0

but I wouldn't recommend this since it makes the index non-unique:

In [96]: result.index.is_unique
Out[96]: False

and this can cause future difficulties since some Pandas operations only work on DataFrames with unique indexes.
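If the goal is simply a flat table, one alternative is to move the key levels into ordinary columns and then renumber the rows, which keeps the index unique. This is only a sketch on made-up data, not part of the answer above; the level names letter and country are hypothetical labels chosen for this example.

```python
import pandas as pd

frames = {
    ('A', 'America'): pd.DataFrame({'date': ['2017-39', '2017-40'], 'val1': [10.0, 6.0]}),
    ('B', 'Britan'): pd.DataFrame({'date': ['2017-38'], 'val1': [41.0]}),
}

# Name the two key levels so they can be referred to by name.
stacked = pd.concat(frames, names=['letter', 'country'])

# Move the key levels into columns, then replace the repeating
# per-group row numbers with a fresh 0..n-1 index.
flat = stacked.reset_index(level=['letter', 'country']).reset_index(drop=True)

print(flat)
print(flat.index.is_unique)
```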

One solution is to use itertools to chain the sublists together.

There are 2 essential idioms we use:

  1. For identifier columns, zip the lengths of the data lists together with the identifiers, then pad each identifier with NaN up to that length.
  2. For data columns, use chain.from_iterable (assigned to chainer) to combine every 5th sublist.

In both cases, we use islice to avoid creating unnecessary intermediate lists.

data is defined as per @unutbu's post.
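The two idioms can be tried in isolation with the standard library alone. The sketch below uses a shortened, made-up list with groups of 3 sublists instead of 5, and pads with None rather than NaN so that no extra import is needed; the names sample, lens, ids and dates are illustrative.

```python
from itertools import chain, islice

# Groups of 3: [identifier], [dates], [numbers]
sample = [['A'], ['x', 'y'], [1, 2],
          ['B'], ['p', 'q', 'r'], [3, 4, 5]]

# Idiom 2: take every 3rd sublist starting at offset 1 and flatten it.
dates = list(chain.from_iterable(islice(sample, 1, None, 3)))
print(dates)   # ['x', 'y', 'p', 'q', 'r']

# Idiom 1: pad each identifier so it lines up with its group's rows.
lens = [len(s) for s in islice(sample, 1, None, 3)]
ids = list(chain.from_iterable(
    list(j) + [None] * (n - 1)
    for n, j in zip(lens, islice(sample, 0, None, 3))))
print(ids)     # ['A', None, 'B', None, None]
```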

Solution

import numpy as np
import pandas as pd
from itertools import chain, islice

chainer = chain.from_iterable

lens = list(map(len, islice(data, 2, None, 5)))

res = pd.DataFrame({'id1': list(chainer(list(j)+[np.nan]*(i-1) for i, j in
                                zip(lens, islice(data, 0, None, 5)))),
                    'id2': list(chainer(list(j)+[np.nan]*(i-1) for i, j in 
                                zip(lens, islice(data, 1, None, 5)))),
                    'date': list(chainer(islice(data, 2, None, 5))),
                    'num1': list(chainer(islice(data, 3, None, 5))),
                    'num2': list(chainer(islice(data, 4, None, 5)))})

res = res[['id1', 'id2', 'date', 'num1', 'num2']]

Result

print(res)

    id1      id2     date  num1  num2
0     A  America  2017-39  10.0   5.0
1   NaN      NaN  2017-40   6.0   7.0
2   NaN      NaN  2017-41   6.0   8.0
3   NaN      NaN  2017-42   6.0   9.0
4   NaN      NaN  2017-43   1.0   1.0
5     B   Britan  2017-38  41.0  40.0
6   NaN      NaN  2017-39  27.0  38.0
7   NaN      NaN  2017-40  38.0  28.0
8   NaN      NaN  2017-41  36.0  27.0
9   NaN      NaN  2017-42  33.0  23.0
10  NaN      NaN  2017-43  41.0  65.0
11  NaN      NaN  2017-44   8.0   4.0

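I use groupby on the columns and then rebuild each group as its own block of rows:

# lst is the list of lists from the question (called data in the other answers)
s=pd.DataFrame(lst).T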
s.columns=s.columns//5
pd.concat([pd.DataFrame(x.values) for _,x in s.groupby(level=0,axis=1)]).dropna(axis=0,thresh=1)
Out[146]: 
      0        1        2   3   4
0     A  America  2017-39  10   5
1  None     None  2017-40   6   7
2  None     None  2017-41   6   8
3  None     None  2017-42   6   9
4  None     None  2017-43   1   1
0     B   Britan  2017-38  41  40
1  None     None  2017-39  27  38
2  None     None  2017-40  38  28
3  None     None  2017-41  36  27
4  None     None  2017-42  33  23
5  None     None  2017-43  41  65
6  None     None  2017-44   8   4
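The steps above can be followed on a smaller, made-up list with groups of 2 sublists. This sketch is a variation rather than the answer's exact code: it selects each group's columns with a boolean mask instead of groupby(axis=1), on the assumption that recent pandas releases deprecate grouping along axis=1. The names small, labels and pieces are illustrative.

```python
import pandas as pd

small = [['A'], ['x', 'y'],
         ['B'], ['p', 'q', 'r']]

# Each sublist becomes a column; shorter ones are padded with missing values.
s = pd.DataFrame(small).T

# Integer division labels every pair of columns with its group number.
labels = s.columns // 2
print(list(labels))   # [0, 0, 1, 1]

pieces = []
for g in sorted(set(labels)):
    block = s.loc[:, labels == g]
    # Renumber the columns 0..1 and drop the all-missing padding rows.
    pieces.append(pd.DataFrame(block.values).dropna(how='all'))

out = pd.concat(pieces)
print(out)
```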
