Using multiple dictionaries to populate a pandas dataframe
I currently have a DataFrame and want to fill in its values using pairs of dictionaries.
import pandas as pd

# create count dataframe
range_of_years = range(2012, 2017)
topics = ['ecology', 'evolution', 'mathematics', 'biogeography', 'neutral theory']
topic_count_timeline = pd.DataFrame(index=topics, columns=range_of_years)
# dictionary pair
count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}
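I want to iterate over the dictionary keys, select the DataFrame cell whose index and column correspond to those keys, and add the product of the two dictionary values to that cell. That would give me this resulting DataFrame: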
                2012  2013  2014  2015  2016
ecology            7    14   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      NaN   NaN   NaN   NaN   NaN
biogeography     NaN   NaN   NaN   NaN   NaN
neutral theory     3     6   3.6   2.4   2.7
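I plan to use many pairs of dictionaries (such as count_dict and paper_topics_dict) to update the topic_count_timeline DataFrame, so each new input should be added to the cell's previous value rather than overwriting it.
For example, if the DataFrame is updated with another pair...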
# Additional dictionaries
count_dict2 = {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13}
paper_topics_dict2 = {'mathematics': 0.6, 'neutral theory': 0.4}
the DataFrame would look like this:
                2012  2013  2014  2015  2016
ecology            7    14   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      1.8   1.2     9   9.6   7.8
biogeography     NaN   NaN   NaN   NaN   NaN
neutral theory   4.2   6.8   9.6   8.8   7.9
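Thanks.
I believe you need:
# fill each topic row with its weight
for k, v in paper_topics_dict.items():
    topic_count_timeline.loc[k] = v
# scale each year column by that year's count
for k, v in count_dict.items():
    topic_count_timeline[k] *= v
print(topic_count_timeline)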
                2012  2013  2014  2015  2016
ecology            7    14   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      NaN   NaN   NaN   NaN   NaN
biogeography     NaN   NaN   NaN   NaN   NaN
neutral theory     3     6   3.6   2.4   2.7
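But if you have several pairs of dictionaries, it works better to update a defaultdict for each pair, then convert it to a Series, reshape it into a DataFrame with unstack, and use reindex to add the missing columns and index values: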
from collections import defaultdict
import pandas as pd

count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}
count_dict2 = {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13}
paper_topics_dict2 = {'mathematics': 0.6, 'neutral theory': 0.4}

L = [(count_dict, paper_topics_dict), (count_dict2, paper_topics_dict2)]
# accumulate weight * count per (topic, year) key across all pairs
d = defaultdict(float)
for a, b in L:
    for k, v in b.items():
        for k2, v2 in a.items():
            d[(k, k2)] += v*v2

# topics and range_of_years are the lists defined in the question
df = pd.Series(d).unstack().reindex(index=topics, columns=range_of_years)
print(df)
                2012  2013  2014  2015  2016
ecology          7.0  14.0   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      1.8   1.2   9.0   9.6   7.8
biogeography     NaN   NaN   NaN   NaN   NaN
neutral theory   4.2   6.8   9.6   8.8   7.9
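To make the Series / unstack / reindex step above easier to follow, here is a small sketch on a reduced example. The three-entry dictionary and the shortened topic and year lists are made up for illustration; only pd.Series, unstack and reindex come from the answer.

```python
import pandas as pd

# Hypothetical, shortened accumulator: keys are (topic, year) tuples
d = {('ecology', 2012): 7.0, ('ecology', 2013): 14.0, ('neutral theory', 2012): 3.0}

# Tuple keys become a two-level MultiIndex (topic, year)
s = pd.Series(d)

# unstack moves the inner level (year) into the columns;
# (topic, year) combinations that were never set become NaN
wide = s.unstack()

# reindex adds topics and years that never appeared, filled with NaN
wide = wide.reindex(index=['ecology', 'evolution', 'neutral theory'],
                    columns=[2012, 2013, 2014])
print(wide)
```

Because reindex only adds or reorders labels, the values accumulated in the dictionary are left unchanged.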
You can use combine_first and create a new DataFrame from your dicts:
import numpy as np

topic_count_timeline.combine_first(
    pd.DataFrame(
        data=np.array(list(count_dict.values())) * np.array(list(paper_topics_dict.values()))[:, None],
        columns=count_dict.keys(),
        index=paper_topics_dict.keys(),
    )
)
Out[683]:
                2012  2013  2014  2015  2016
biogeography     NaN   NaN   NaN   NaN   NaN
ecology          7.0  14.0   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      NaN   NaN   NaN   NaN   NaN
neutral theory   3.0   6.0   3.6   2.4   2.7
More info: the new DataFrame built from the two dicts is
pd.DataFrame(
    data=np.array(list(count_dict.values())) * np.array(list(paper_topics_dict.values()))[:, None],
    columns=count_dict.keys(),
    index=paper_topics_dict.keys(),
)
Out[684]:
                2012  2013  2014  2015  2016
ecology          7.0  14.0   8.4   5.6   6.3
neutral theory   3.0   6.0   3.6   2.4   2.7
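combine_first only fills cells that are still NaN, so on its own it does not sum a second pair into cells that already hold a value. One possible way to get the additive behaviour asked for in the question is DataFrame.add with fill_value=0. The following is a sketch, not part of the answer above; the helper name pair_to_frame is made up for illustration.

```python
import numpy as np
import pandas as pd

topics = ['ecology', 'evolution', 'mathematics', 'biogeography', 'neutral theory']
range_of_years = range(2012, 2017)

def pair_to_frame(counts, weights):
    """Hypothetical helper: outer product of topic weights (rows) and yearly counts (columns)."""
    return pd.DataFrame(
        np.outer(list(weights.values()), list(counts.values())),
        index=list(weights.keys()),
        columns=list(counts.keys()),
    )

pairs = [
    ({2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9},
     {'ecology': 0.7, 'neutral theory': 0.3}),
    ({2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13},
     {'mathematics': 0.6, 'neutral theory': 0.4}),
]

frames = [pair_to_frame(counts, weights) for counts, weights in pairs]

# Start from the first pair and add the rest; fill_value=0 treats a cell
# missing on one side as 0, so values accumulate instead of turning into NaN
total = frames[0]
for frame in frames[1:]:
    total = total.add(frame, fill_value=0)

# Topics that never appeared in any pair are added back as all-NaN rows
result = total.reindex(index=topics, columns=range_of_years)
print(result)
```

Cells that are missing from every pair stay NaN, which matches the layout shown in the question.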
I would use a function together with pd.DataFrame.pipe for this.
You can then use the pipe syntax for subsequent dictionaries.
def update_data(df, counts, topics):
    # assign weight * count to every (topic, year) cell named by the two dicts
    for k, v in topics.items():
        for k2, v2 in counts.items():
            df.loc[k, k2] = v*v2
    return df

count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}

# start from the empty frame created in the question
df = topic_count_timeline.pipe(update_data, count_dict, paper_topics_dict)
print(df)
#                 2012  2013  2014  2015  2016
# ecology            7    14   8.4   5.6   6.3
# evolution        NaN   NaN   NaN   NaN   NaN
# mathematics      NaN   NaN   NaN   NaN   NaN
# biogeography     NaN   NaN   NaN   NaN   NaN
# neutral theory     3     6   3.6   2.4   2.7
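Note that update_data assigns to each cell, so piping a second pair through it would overwrite 'neutral theory' instead of adding to it. Here is a sketch of an accumulating variant that can be chained with pipe. The function name add_pair and the NaN-as-zero handling are my assumptions, not part of the answer above.

```python
import pandas as pd

def add_pair(df, counts, topic_weights):
    """Hypothetical variant: add weight * count to each cell, treating NaN as 0.

    Modifies df in place and returns it so calls can be chained with pipe.
    """
    for topic, weight in topic_weights.items():
        for year, count in counts.items():
            previous = df.loc[topic, year]
            if pd.isna(previous):
                previous = 0.0
            df.loc[topic, year] = previous + weight * count
    return df

topics = ['ecology', 'evolution', 'mathematics', 'biogeography', 'neutral theory']
df = pd.DataFrame(index=topics, columns=range(2012, 2017), dtype=float)

count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}
count_dict2 = {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13}
paper_topics_dict2 = {'mathematics': 0.6, 'neutral theory': 0.4}

df = (df.pipe(add_pair, count_dict, paper_topics_dict)
        .pipe(add_pair, count_dict2, paper_topics_dict2))
print(df)
```

The cell-by-cell loop is simple to read but slow for large frames; for many pairs, an approach that builds whole frames and sums them is likely to scale better.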