Using multiple dictionaries to populate a pandas dataframe

I currently have a dataframe that I want to fill with values using pairs of dictionaries.

import pandas as pd

# create count dataframe
range_of_years = range(2012, 2017)
topics = ['ecology','evolution','mathematics','biogeography','neutral theory']
topic_count_timeline = pd.DataFrame(index = topics, columns = range_of_years)


# dictionary pair
count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}

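I want to iterate over the dictionary keys, select the dataframe cell whose column and index correspond to those keys, and add the product of the two dictionary values to that cell. That should give me this resulting dataframe: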

               2012 2013 2014 2015 2016
ecology           7   14  8.4  5.6  6.3
evolution       NaN  NaN  NaN  NaN  NaN
mathematics     NaN  NaN  NaN  NaN  NaN
biogeography    NaN  NaN  NaN  NaN  NaN
neutral theory    3    6  3.6  2.4  2.7

I intend to use many pairs of dictionaries (such as count_dict and paper_topics_dict) to update the topic_count_timeline dataframe, with each new input added to the cell's previous value rather than overwriting it.

For example, if the dataframe is then updated with another pair...

# Additional dictionaries
count_dict2 = {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13}
paper_topics_dict2 = {'mathematics': 0.6, 'neutral theory': 0.4}

the dataframe would look like this:

               2012 2013 2014 2015 2016
ecology           7   14  8.4  5.6  6.3
evolution       NaN  NaN  NaN  NaN  NaN
mathematics     1.8  1.2    9  9.6  7.8
biogeography    NaN  NaN  NaN  NaN  NaN
neutral theory  4.2  6.8  9.6  8.8  7.9

Thanks.

I believe you need:

for k, v in paper_topics_dict.items():
    topic_count_timeline.loc[k] = v

for k, v in count_dict.items():
    topic_count_timeline[k] *= v

print (topic_count_timeline)
               2012 2013 2014 2015 2016
ecology           7   14  8.4  5.6  6.3
evolution       NaN  NaN  NaN  NaN  NaN
mathematics     NaN  NaN  NaN  NaN  NaN
biogeography    NaN  NaN  NaN  NaN  NaN
neutral theory    3    6  3.6  2.4  2.7

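But if you have several pairs of dictionaries, it works better to update a defaultdict for each pair, then convert it to a Series, unstack it into a DataFrame, and reindex to restore the missing columns and index values: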

from collections import defaultdict

import pandas as pd

count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}

count_dict2 = {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13}
paper_topics_dict2 = {'mathematics': 0.6, 'neutral theory': 0.4}

L = [(count_dict, paper_topics_dict), (count_dict2, paper_topics_dict2)]

d = defaultdict(float)
for a, b in L:
    for k, v in b.items():
        for k2, v2 in a.items():
            d[(k, k2)] += v*v2

df = pd.Series(d).unstack().reindex(index=topics, columns=range_of_years)
print (df)
                2012  2013  2014  2015  2016
ecology          7.0  14.0   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      1.8   1.2   9.0   9.6   7.8
biogeography     NaN   NaN   NaN   NaN   NaN
neutral theory   4.2   6.8   9.6   8.8   7.9
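
As a cross-check on the defaultdict approach, here is a minimal vectorized sketch that should produce the same table. It builds one small frame per dictionary pair and sums them with DataFrame.add(fill_value=0), so a topic missing from one pair counts as zero rather than NaN. The helper name pair_to_frame and the pairs list are illustrative assumptions, not part of the original answer.

```python
from functools import reduce

import numpy as np
import pandas as pd

topics = ['ecology', 'evolution', 'mathematics', 'biogeography', 'neutral theory']
range_of_years = range(2012, 2017)

# (count_dict, paper_topics_dict) pairs taken from the question
pairs = [
    ({2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9},
     {'ecology': 0.7, 'neutral theory': 0.3}),
    ({2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13},
     {'mathematics': 0.6, 'neutral theory': 0.4}),
]

def pair_to_frame(counts, weights):
    """Hypothetical helper: topics x years table of weight * count."""
    w = pd.Series(weights, dtype=float)
    c = pd.Series(counts, dtype=float)
    return pd.DataFrame(np.outer(w, c), index=w.index, columns=c.index)

frames = [pair_to_frame(counts, weights) for counts, weights in pairs]

# fill_value=0 treats a cell present in only one frame as 0 in the other,
# so contributions accumulate instead of turning into NaN.
total = reduce(lambda left, right: left.add(right, fill_value=0), frames)

# Restore the topics and years that no pair touched (they stay NaN).
result = total.reindex(index=topics, columns=list(range_of_years))
print(result)
```

Whether this is preferable to the nested loops probably depends on how many pairs there are; for a handful of small dictionaries either should be fine.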

You can use combine_first and create a new df from your dicts:

topic_count_timeline.combine_first(pd.DataFrame(data=np.array(list(count_dict.values()))*np.array(list(paper_topics_dict.values()))[:,None],columns=count_dict.keys(),index=paper_topics_dict.keys()))
Out[683]: 
                2012  2013  2014  2015  2016
biogeography     NaN   NaN   NaN   NaN   NaN
ecology          7.0  14.0   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      NaN   NaN   NaN   NaN   NaN
neutral theory   3.0   6.0   3.6   2.4   2.7

More info:

pd.DataFrame(data=np.array(list(count_dict.values()))*np.array(list(paper_topics_dict.values()))[:,None],columns=count_dict.keys(),index=paper_topics_dict.keys())
Out[684]: 
                2012  2013  2014  2015  2016
ecology          7.0  14.0   8.4   5.6   6.3
neutral theory   3.0   6.0   3.6   2.4   2.7
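
The one-liner above leans on NumPy broadcasting: indexing with [:, None] turns the topic weights into a column, and multiplying that column by the row of counts yields the full topics-by-years grid. A small sketch of just that step, with the dictionary values typed in by hand (the variable names are illustrative):

```python
import numpy as np

counts = np.array([10, 20, 12, 8, 9])   # values of count_dict, in key order
weights = np.array([0.7, 0.3])          # values of paper_topics_dict, in key order

column = weights[:, None]               # shape (2, 1): one row per topic
grid = counts * column                  # broadcasts to shape (2, 5)

print(column.shape, grid.shape)         # (2, 1) (2, 5)

# The same grid can be written as an outer product.
print(np.allclose(grid, np.outer(weights, counts)))  # True
```

Note that this relies on the dictionaries keeping their insertion order (guaranteed since Python 3.7), because the values are matched to the keys purely by position.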

I would use a function for this, together with pd.DataFrame.pipe.

You can then use the pipe syntax for subsequent dictionaries.

def update_data(df, counts, topics):
    for k, v in topics.items():
        for k2, v2 in counts.items():
            df.loc[k, k2] = v*v2
    return df

count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}

# start from the empty topic_count_timeline dataframe defined in the question
df = topic_count_timeline.pipe(update_data, count_dict, paper_topics_dict)

print(df)

#                2012 2013 2014 2015 2016
# ecology           7   14  8.4  5.6  6.3
# evolution       NaN  NaN  NaN  NaN  NaN
# mathematics     NaN  NaN  NaN  NaN  NaN
# biogeography    NaN  NaN  NaN  NaN  NaN
# neutral theory    3    6  3.6  2.4  2.7
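
One caveat: update_data assigns the product to the cell, so piping a second pair through it would overwrite earlier values for shared topics instead of adding to them, which is what the question asks for. A minimal sketch of an accumulating variant follows; add_data is a hypothetical name, and treating an empty (NaN) cell as zero is an assumption about the intended behavior.

```python
import pandas as pd

def add_data(df, counts, topics):
    """Hypothetical variant of update_data that adds to each cell in place."""
    for topic, weight in topics.items():
        for year, count in counts.items():
            previous = df.loc[topic, year]
            # Treat a still-empty (NaN) cell as zero so the first update sticks.
            base = 0.0 if pd.isna(previous) else previous
            df.loc[topic, year] = base + weight * count
    return df

topics = ['ecology', 'evolution', 'mathematics', 'biogeography', 'neutral theory']
df = pd.DataFrame(index=topics, columns=range(2012, 2017))

count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}
count_dict2 = {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13}
paper_topics_dict2 = {'mathematics': 0.6, 'neutral theory': 0.4}

# Each pipe call feeds the updated frame into the next update.
df = (df.pipe(add_data, count_dict, paper_topics_dict)
        .pipe(add_data, count_dict2, paper_topics_dict2))
print(df)
```

Like update_data, this modifies the dataframe it receives and returns the same object, so pass a copy if the original needs to stay untouched.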
