
Transform lists of tuples into a pandas.DataFrame

I have three lists of tuples. The first element of each tuple is a year (stored as a string), as shown below.

list1 = [
    ('2010', 1783675.0), ('2011', 1815815.0), ('2012', 1633258.0), ('2013', 1694062.0), ('2014', 1906527.0), 
    ('2015', 1908661.0), ('2016', 2492979.0), ('2017', 2846997.0), ('2018', 2930313.0), ('2019', 2654724.0)
]

list2 = [
    ('2010', 302816.0), ('2011', 229549.0), ('2012', 323063.0), ('2013', 285066.0), ('2014', 282003.0), 
    ('2015', 354500.0), ('2016', 275383.0), ('2017', 322074.0), ('2018', 366909.0), ('2019', 297942.0)
]

list3 = [
    ('2010', 149036.0), ('2011', 144112.0), ('2012', 173944.0), ('2013', 205724.0), ('2014', 214019.0), 
    ('2015', 261462.0), ('2016', 260646.0), ('2017', 279267.0), ('2018', 288120.0), ('2019', 277106.0)
]

I want to create a pandas.DataFrame from these lists, using the year as the row index:

          list1     list2     list3
2010  1783675.0  302816.0  149036.0
2011  1815815.0  229549.0  144112.0
2012  1633258.0  323063.0  173944.0
2013  1694062.0  285066.0  205724.0
2014  1906527.0  282003.0  214019.0
2015  1908661.0  354500.0  261462.0
2016  2492979.0  275383.0  260646.0
2017  2846997.0  322074.0  279267.0
2018  2930313.0  366909.0  288120.0
2019  2654724.0  297942.0  277106.0

You can create a DataFrame for each list and combine them on the year column with DataFrame.merge.

import pandas as pd

list1 = [('2010', 1783675.0), ('2011', 1815815.0), ('2012', 1633258.0), ('2013', 1694062.0),
         ('2014', 1906527.0), ('2015', 1908661.0), ('2016', 2492979.0), ('2017', 2846997.0),
         ('2018', 2930313.0), ('2019', 2654724.0)]

list2 = [('2010', 302816.0), ('2011', 229549.0), ('2012', 323063.0), ('2013', 285066.0),
         ('2014', 282003.0), ('2015', 354500.0), ('2016', 275383.0), ('2017', 322074.0),
         ('2018', 366909.0), ('2019', 297942.0)]

list3 = [('2010', 149036.0), ('2011', 144112.0), ('2012', 173944.0), ('2013', 205724.0),
         ('2014', 214019.0), ('2015', 261462.0), ('2016', 260646.0), ('2017', 279267.0),
         ('2018', 288120.0), ('2019', 277106.0)]

# Build one two-column frame per list, join them on "year",
# then make "year" the row index as the question asks.
df = (pd.DataFrame(data=list1, columns=["year", "list1"])
        .merge(pd.DataFrame(data=list2, columns=["year", "list2"]), on="year")
        .merge(pd.DataFrame(data=list3, columns=["year", "list3"]), on="year")
        .set_index("year"))
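Note that merge performs an inner join by default (how="inner"), so a year missing from any one list silently drops that whole row. A minimal sketch, using a shortened copy of the question's data in which list3 deliberately lacks 2012 (an assumption for illustration only), shows how="outer" keeping the row with NaN instead:

```python
import pandas as pd

# Shortened sample data: list3 deliberately omits 2012 to show the join behaviour.
list1 = [('2010', 1783675.0), ('2011', 1815815.0), ('2012', 1633258.0)]
list2 = [('2010', 302816.0), ('2011', 229549.0), ('2012', 323063.0)]
list3 = [('2010', 149036.0), ('2011', 144112.0)]

frames = [
    pd.DataFrame(data=lst, columns=["year", name])
    for lst, name in [(list1, "list1"), (list2, "list2"), (list3, "list3")]
]

# Inner join (the default): 2012 is dropped because list3 has no entry for it.
inner = (frames[0]
         .merge(frames[1], on="year")
         .merge(frames[2], on="year")
         .set_index("year"))

# Outer join: 2012 is kept and the missing list3 value becomes NaN.
outer = (frames[0]
         .merge(frames[1], on="year", how="outer")
         .merge(frames[2], on="year", how="outer")
         .set_index("year"))

print(inner)
print(outer)
```

Which join to use depends on whether incomplete years should be kept or discarded; the question's data has the same years in every list, so both give the same result there.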

Another option: Python's collections.defaultdict can collect all values for each year into one dictionary before loading it into a DataFrame:

import pandas as pd
from collections import defaultdict
from itertools import chain

# Chain the three lists into one iterable, then collect
# all values that share a year into a single list.
d = defaultdict(list)

for k, v in chain(list1, list2, list3):
    d[k].append(v)

# Each key (year) becomes a row; its values fill the columns in list order.
df = pd.DataFrame.from_dict(d, orient='index', columns=['list1', 'list2', 'list3'])

          list1     list2     list3
2010  1783675.0  302816.0  149036.0
2011  1815815.0  229549.0  144112.0
2012  1633258.0  323063.0  173944.0
2013  1694062.0  285066.0  205724.0
2014  1906527.0  282003.0  214019.0
2015  1908661.0  354500.0  261462.0
2016  2492979.0  275383.0  260646.0
2017  2846997.0  322074.0  279267.0
2018  2930313.0  366909.0  288120.0
2019  2654724.0  297942.0  277106.0
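The defaultdict version relies on every year appearing exactly once in each list: if, say, list2 lacked a year, that year's list3 value would land in the list2 column. A sketch of a variant that keys each value by both list name and year avoids this. It passes a dict of dicts to the DataFrame constructor, where the outer keys become columns and the inner keys become the index. Only the first three years of the question's data are used here, for brevity:

```python
import pandas as pd

list1 = [('2010', 1783675.0), ('2011', 1815815.0), ('2012', 1633258.0)]
list2 = [('2010', 302816.0), ('2011', 229549.0), ('2012', 323063.0)]
list3 = [('2010', 149036.0), ('2011', 144112.0), ('2012', 173944.0)]

# Outer keys become column names; dict(lst) maps each year to its value,
# so pandas aligns the values on year rather than on position.
df = pd.DataFrame({name: dict(lst) for name, lst in
                   [("list1", list1), ("list2", list2), ("list3", list3)]})
print(df)
```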

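You can iterate over the lists, build a dictionary that maps each column name to its values, and pass it to the DataFrame constructor with the years as the index. Note that this assumes the lists are in the same order and contain the same years, because the index is taken from list1 alone.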

import pandas as pd

list1 = [('2010', 1783675.0), ('2011', 1815815.0), ('2012', 1633258.0),
    ('2013', 1694062.0), ('2014', 1906527.0), ('2015', 1908661.0),
    ('2016', 2492979.0), ('2017', 2846997.0), ('2018', 2930313.0),
    ('2019', 2654724.0)]

list2 = [('2010', 302816.0), ('2011', 229549.0), ('2012', 323063.0),
    ('2013', 285066.0), ('2014', 282003.0), ('2015', 354500.0),
    ('2016', 275383.0), ('2017', 322074.0), ('2018', 366909.0),
    ('2019', 297942.0)]

list3 = [('2010', 149036.0), ('2011', 144112.0), ('2012', 173944.0),
    ('2013', 205724.0), ('2014', 214019.0), ('2015', 261462.0),
    ('2016', 260646.0), ('2017', 279267.0), ('2018', 288120.0),
    ('2019', 277106.0)]

df_dict = {}
years = [el[0] for el in list1]

df_dict["list1"] = [el[1] for el in list1]
df_dict["list2"] = [el[1] for el in list2]
df_dict["list3"] = [el[1] for el in list3]

df = pd.DataFrame(df_dict, index=years)
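Because this approach takes the index from list1 alone, a cheap guard that checks the stated assumption (every list has the same years in the same order) catches misaligned input before it produces a wrong frame. A minimal sketch; the helper name build_frame is hypothetical, and only two years of the question's data are used:

```python
import pandas as pd


def build_frame(named_lists):
    """Build a DataFrame from (name, [(year, value), ...]) pairs.

    Hypothetical helper: raises ValueError unless every list has the
    same years in the same order, which the positional approach requires.
    """
    names = [name for name, _ in named_lists]
    year_seqs = [[year for year, _ in lst] for _, lst in named_lists]
    years = year_seqs[0]
    for name, seq in zip(names, year_seqs):
        if seq != years:
            raise ValueError(f"{name} does not have the same years as {names[0]}")
    data = {name: [value for _, value in lst] for name, lst in named_lists}
    return pd.DataFrame(data, index=years)


list1 = [('2010', 1783675.0), ('2011', 1815815.0)]
list2 = [('2010', 302816.0), ('2011', 229549.0)]
list3 = [('2010', 149036.0), ('2011', 144112.0)]

df = build_frame([("list1", list1), ("list2", list2), ("list3", list3)])
print(df)
```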

Another solution is to build a pandas.Series for each list in a for loop and combine them with pandas.concat:

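series = []

# One Series per list: the years become the index and `name` becomes the column label.
for lst, name in [(list1, 'list1'), (list2, 'list2'), (list3, 'list3')]:
    series.append(pd.Series({tup[0]: tup[1] for tup in lst}, name=name))

# axis=1 places the Series side by side as columns, aligned on the year index.
df = pd.concat(series, axis=1)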

And the result looks like this:

>>> print(df)
          list1     list2     list3
2010  1783675.0  302816.0  149036.0
2011  1815815.0  229549.0  144112.0
2012  1633258.0  323063.0  173944.0
2013  1694062.0  285066.0  205724.0
2014  1906527.0  282003.0  214019.0
2015  1908661.0  354500.0  261462.0
2016  2492979.0  275383.0  260646.0
2017  2846997.0  322074.0  279267.0
2018  2930313.0  366909.0  288120.0
2019  2654724.0  297942.0  277106.0
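Two small refinements to this approach, offered as a sketch: dict(lst) builds the same year-to-value mapping as the dict comprehension, and because the years arrive as strings, df.index.astype(int) converts them to integers if numeric years are preferred (the question does not say which it wants). Only the first three years of the question's data are used here:

```python
import pandas as pd

list1 = [('2010', 1783675.0), ('2011', 1815815.0), ('2012', 1633258.0)]
list2 = [('2010', 302816.0), ('2011', 229549.0), ('2012', 323063.0)]
list3 = [('2010', 149036.0), ('2011', 144112.0), ('2012', 173944.0)]

# dict(lst) turns [(year, value), ...] into {year: value, ...}.
series = [pd.Series(dict(lst), name=name)
          for lst, name in [(list1, 'list1'), (list2, 'list2'), (list3, 'list3')]]

df = pd.concat(series, axis=1)

# Optional: store the years as integers instead of strings.
df.index = df.index.astype(int)
print(df)
```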
