I have a list of list which looks like:
[['A'],
['America'],
['2017-39', '2017-40', '2017-41', '2017-42', '2017-43'],
[10.0, 6.0, 6.0, 6.0, 1.0],
[5.0,7.0,8.0,9.0,1.0],
,
['B'],
['Britan'],
['2017-38', '2017-39', '2017-40', '2017-41', '2017-42', '2017-43', '2017-44'],
[41.0, 27.0, 38.0, 36.0, 33.0, 41.0, 8.0],
[40.0, 38.0, 28.0, 27.0, 23.0, 65.0, 4.0]]
I want to convert this into a dataframe which should look like
A America 2017-39 10.0 5.0
na na 2017-40 6.0 7.0
na na 2017-41 6.0 8.0
na na 2017-42 6.0 9.0
na na 2017-43 1.0 10.0
B Britan 2017-38 41.0 40.0
na na 2017-39 27.0 38.0
na na 2017-40 38.0 28.0
na na 2017-41 36.0 27.0
na na 2017-42 33.0 23.0
na na 2017-43 41.0 65.0
na na 2017-44 8.0 4.0
How can I code to make it possible , as I am pretty new to python, I am having a hard time.
I will really appreciate your time and effort to help me in this regards
import pandas as pd
data = [['A'],
['America'],
['2017-39', '2017-40', '2017-41', '2017-42', '2017-43'],
[10.0, 6.0, 6.0, 6.0, 1.0],
[5.0,7.0,8.0,9.0,1.0],
['B'],
['Britan'],
['2017-38', '2017-39', '2017-40', '2017-41', '2017-42', '2017-43', '2017-44'],
[41.0, 27.0, 38.0, 36.0, 33.0, 41.0, 8.0],
[40.0, 38.0, 28.0, 27.0, 23.0, 65.0, 4.0]]
result = {}
for letters, countries, dates, val1, val2 in zip(*[iter(data)]*5):
result[tuple(letters+countries)] = pd.DataFrame({'date':dates, 'val1':val1, 'val2':val2})
result = pd.concat(result)
print(result)
yields
date val1 val2
A America 0 2017-39 10.0 5.0
1 2017-40 6.0 7.0
2 2017-41 6.0 8.0
3 2017-42 6.0 9.0
4 2017-43 1.0 1.0
B Britan 0 2017-38 41.0 40.0
1 2017-39 27.0 38.0
2 2017-40 38.0 28.0
3 2017-41 36.0 27.0
4 2017-42 33.0 23.0
5 2017-43 41.0 65.0
6 2017-44 8.0 4.0
The main idea above is to use the "grouper idiom" zip(*[iter(data)]*5)
to group the items in data
in groups of 5. That way, you can use
for letters, countries, dates, val1, val2 in zip(*[iter(data)]*5):
to loop through 5 items of data
at a time.
pd.concat
can accept a dict
of DataFrames as input and concatenate them into a single DataFrame with a MultiIndex composed of the keys of the dict
. So the for-loop
is used to compose the dict
of DataFrames,
for letters, countries, dates, val1, val2 in zip(*[iter(data)]*5):
result[tuple(letters+countries)] = pd.DataFrame({'date':dates, 'val1':val1, 'val2':val2})
and then
result = pd.concat(result)
produces the desired DataFrame.
Not that you could drop the last level of the MultiIndex:
In [91]: result.index = result.index.droplevel(level=-1)
In [92]: result
Out[92]:
date val1 val2
A America 2017-39 10.0 5.0
America 2017-40 6.0 7.0
America 2017-41 6.0 8.0
America 2017-42 6.0 9.0
America 2017-43 1.0 1.0
B Britan 2017-38 41.0 40.0
Britan 2017-39 27.0 38.0
Britan 2017-40 38.0 28.0
Britan 2017-41 36.0 27.0
Britan 2017-42 33.0 23.0
Britan 2017-43 41.0 65.0
Britan 2017-44 8.0 4.0
but I wouldn't recommend this since it makes the index non-unique:
In [96]: result.index.is_unique
Out[96]: False
and this can cause future difficulties since some Pandas operations only work on DataFrames with unique indexes.
One solution is to use itertools
to perform some chaining magic.
There are 2 essential idioms we use:
zip
the lengths of data lists together with identifers. chain.from_iterable
(assigned to chainer
) to combine every 5th sublist. In both cases, we utilise islice
to avoid creating lists unnecessarily as intermediate steps.
data
is defined as per @unutbu's post.
Solution
import pandas as pd
from itertools import chain, islice
chainer = chain.from_iterable
lens = list(map(len, islice(data, 2, None, 5)))
res = pd.DataFrame({'id1': list(chainer(list(j)+[np.nan]*(i-1) for i, j in
zip(lens, islice(data, 0, None, 5)))),
'id2': list(chainer(list(j)+[np.nan]*(i-1) for i, j in
zip(lens, islice(data, 1, None, 5)))),
'date': list(chainer(islice(data, 2, None, 5))),
'num1': list(chainer(islice(data, 3, None, 5))),
'num2': list(chainer(islice(data, 4, None, 5)))})
res = res[['id1', 'id2', 'date', 'num1', 'num2']]
Result
print(res)
id1 id2 date num1 num2
0 A America 2017-39 10.0 5.0
1 NaN NaN 2017-40 6.0 7.0
2 NaN NaN 2017-41 6.0 8.0
3 NaN NaN 2017-42 6.0 9.0
4 NaN NaN 2017-43 1.0 1.0
5 B Britan 2017-38 41.0 40.0
6 NaN NaN 2017-39 27.0 38.0
7 NaN NaN 2017-40 38.0 28.0
8 NaN NaN 2017-41 36.0 27.0
9 NaN NaN 2017-42 33.0 23.0
10 NaN NaN 2017-43 41.0 65.0
11 NaN NaN 2017-44 8.0 4.0
I am using groupby
and re-create the column
s=pd.DataFrame(lst).T
s.columns=s.columns//5
pd.concat([pd.DataFrame(x.values) for _,x in s.groupby(level=0,axis=1)]).dropna(axis=0,thresh=1)
Out[146]:
0 1 2 3 4
0 A America 2017-39 10 5
1 None None 2017-40 6 7
2 None None 2017-41 6 8
3 None None 2017-42 6 9
4 None None 2017-43 1 1
0 B Britan 2017-38 41 40
1 None None 2017-39 27 38
2 None None 2017-40 38 28
3 None None 2017-41 36 27
4 None None 2017-42 33 23
5 None None 2017-43 41 65
6 None None 2017-44 8 4
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.