简体   繁体   中英

Converting dictionary with values in List to Pandas DataFrame

I have a dictionary with City names as keys and corresponding to each city there is a list of dates. For Example:

{
'A':['2017-01-02','2017-01-03'],
'B':['2017-02-02','2017-02-03','2017-02-04','2017-02-05'],
'C':['2016-02-02']
}

And I want to convert this to the following dataframe with 2 columns.

City_Name  Date
A          2017-01-02
A          2017-01-03
B          2017-02-02
B          2017-02-03
B          2017-02-04
B          2017-02-05
C          2016-02-02

Or we can using melt

pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in d.items() ])).melt().dropna()
Out[51]: 
  variable       value
0        A  2017-01-02
1        A  2017-01-03
4        B  2017-02-02
5        B  2017-02-03
6        B  2017-02-04
7        B  2017-02-05
8        C  2016-02-02

A way inspired by piR

pd.Series(d).apply(pd.Series).melt().dropna()
Out[142]: 
    variable       value
0          0  2017-01-02
1          0  2017-02-02
2          0  2016-02-02
3          1  2017-01-03
4          1  2017-02-03
7          2  2017-02-04
10         3  2017-02-05

Use numpy.repeat for repeat keys :

#get lens of lists
a = [len(x) for x in d.values()]
#flattening values
b = [i for s in d.values() for i in s]
df = pd.DataFrame({'City_Name':np.repeat(list(d.keys()), a), 'Date':b})
print (df)

  City_Name        Date
0         C  2016-02-02
1         B  2017-02-02
2         B  2017-02-03
3         B  2017-02-04
4         B  2017-02-05
5         A  2017-01-02
6         A  2017-01-03

Another similar like Danh Pham' solution , credit to him:

df = pd.DataFrame([(i, day) for i,j in d.items() for day in j], 
                  columns=['City_Name','Date'])
print(df)

  City_Name        Date
0         C  2016-02-02
1         B  2017-02-02
2         B  2017-02-03
3         B  2017-02-04
4         B  2017-02-05
5         A  2017-01-02
6         A  2017-01-03

You can use DataFrame.from_dict (only if lists are all of the same lenght)

import pandas as pd

import pandas as pd

d = {
'A':['2017-01-02','2017-01-03'],
'B':['2017-02-02','2017-02-03','2017-02-04','2017-02-05'],
'C':['2016-02-02']
}

df = pd.DataFrame.from_dict(d, orient='index').stack().reset_index()

df.columns = ["City_Name", "A", "Date"]

del df["A"]

print(df)

res:

  City_Name        Date
0         B  2017-02-02
1         B  2017-02-03
2         B  2017-02-04
3         B  2017-02-05
4         A  2017-01-02
5         A  2017-01-03
6         C  2016-02-02

You can reprocess your data into list of tuple of name and date, ex: ('A', '2017-01-01') before make the DataFrame .

Try this:

import pandas as pd

data = {
'A':['2017-01-02','2017-01-03'],
'B':['2017-02-02','2017-02-03','2017-02-04','2017-02-05'],
'C':['2016-02-02']
}

pd.DataFrame([(i[0], day) for i in data.items() for day in i[1]])

Output:

   0           1
0  A  2017-01-02
1  A  2017-01-03
2  C  2016-02-02
3  B  2017-02-02
4  B  2017-02-03
5  B  2017-02-04
6  B  2017-02-05

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM