How to loop through elements from a Python pandas DataFrame to a new nested dictionary?

I am currently using the pandas library to read data from a CSV file. The data includes a "data" column consisting of 1s and 0s, and a "published_at" column with unique date and time stamps (I have converted it to the index of the DataFrame). I deleted the core_id data as it is irrelevant.

In the data, a "1" means yes and a "0" means no. I would like to analyze the data by looping through the DataFrame from a certain start date to an end date (e.g. 2020-11-26 to 2020-11-27) and count how many times "1" (yes_data) and how many times "0" (no_data) occurred on each day. From there, I would like to create a new CSV file or DataFrame that contains those counts so I can analyze them further.

I tried to approach this by creating a nested dictionary and populating it by looping through the main DataFrame, counting how many times "yes" and "no" occurred per day.

I would like to end up with a dictionary (or DataFrame, CSV file, whatever) that has 3 columns: date (e.g. 2020-11-26), "yes" count, and "no" count.

Below is the code I came up with:

yes_data = 0
no_data = 0
date_id = '2020-11-26'

# Create a dictionary to populate a new dataframe
#new_data = {
#  date_id: {'yes': yes_data, 'no': no_data},
#  "2020-11-27": {'yes': 2, 'no': 2}}
new_data = {"":{}}

# I tried to convert the csv data to a dictionary but I don't know
# if this is necessary so I commented it out
# csv_dict = csv_data.to_dict

csv_dict = csv_data

for day in csv_dict['2020-11-26':'2020-11-27']:
    new_data[day] = csv_dict[day]
    for state in csv_dict['2020-11-26':'2020-11-27'].data:
        if state == 1:
            yes_data += 1
            new_data[day][state] == yes_data
        elif state == 0:
            no_data += 1
            new_data[day][state] == no_data

However, the code does not work (I keep getting errors everywhere). How can I fix it to do what I'm trying to do? Any help is appreciated. Thank you!

P.S. I'm fairly new to Python, trying my best here!

Hope you're doing well. This snippet will help you do the job!

result = {}
for index, row in df.iterrows():  # Iterate over the rows; index holds the published_at timestamp
    # published_at is the DataFrame index in the question, so read it from index.
    date = str(index).split(' ')[0]  # Keep only the date part (drop hours, minutes, ...)
    ans = row['data']  # The value is either 0 or 1
    if date not in result:  # Create an entry for this date if it doesn't exist yet
        result[date] = {'yes': 0, 'no': 0}
    if ans:  # Increase the "yes" count if ans == 1
        result[date]['yes'] += 1
    else:  # Increase the "no" count if ans == 0
        result[date]['no'] += 1

Keep in mind that df is your DataFrame and row['data'] is the 0 or 1 value, so if your names differ, change them accordingly. At the end of this code, you'll have a dictionary with the following structure:

result = {'2020-12-01': {'yes': 2, 'no': 4}, '2020-12-04': {'yes': n, 'no': m}}
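Since the question asks for a DataFrame or CSV file at the end, a dictionary of this shape can be converted in a couple of lines. The sketch below is only an illustration: the counts in `result` are made-up values, and the names `summary` and `csv_text` are assumptions, not part of the original code.

```python
import pandas as pd

# Hypothetical counts in the same nested shape as `result` above.
result = {
    '2020-11-26': {'yes': 2, 'no': 4},
    '2020-11-27': {'yes': 1, 'no': 3},
}

# Each outer key (a date) becomes a row; the inner keys become columns.
summary = pd.DataFrame.from_dict(result, orient='index')
summary.index.name = 'date'

# With no path argument, to_csv returns the CSV text as a string.
# Passing a file name instead would write the file to disk.
csv_text = summary.to_csv()
print(csv_text)
```

This gives the three columns described in the question: date, "yes" count, and "no" count.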

Have a good day!
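As an alternative to the explicit loop, the same per-day counts can probably be obtained with pandas' own slicing and grouping. This is a minimal sketch, assuming published_at has been parsed into a sorted DatetimeIndex as the question describes; the sample rows are invented for illustration, and `subset`, `day`, and `counts` are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV; published_at is the index.
df = pd.DataFrame(
    {'data': [1, 0, 1, 1, 0, 0]},
    index=pd.to_datetime([
        '2020-11-26 08:00:00', '2020-11-26 09:30:00', '2020-11-26 17:45:00',
        '2020-11-27 07:15:00', '2020-11-27 12:00:00', '2020-11-28 10:00:00',
    ]),
)
df.index.name = 'published_at'

# Label-based slicing on a sorted DatetimeIndex keeps the whole end day.
subset = df.loc['2020-11-26':'2020-11-27']

# Normalize each timestamp to midnight so rows from the same day share a key.
day = subset.index.normalize()

# Summing a boolean Series counts its True values within each day.
counts = pd.DataFrame({
    'yes': (subset['data'] == 1).groupby(day).sum(),
    'no': (subset['data'] == 0).groupby(day).sum(),
})
counts.index.name = 'date'
print(counts)
```

The row for 2020-11-28 falls outside the requested range and is excluded by the slice. Compared with iterrows, this avoids a Python-level loop, which tends to matter on larger files.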
