Adding a column to a pandas DataFrame conditionally
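I'm working on a personal project that collects data on Covid-19 cases. The dataset only shows the cumulative total of Covid-19 cases per state. I want to add a column containing the new cases added that day. Here's what I have so far: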
import pandas as pd
from datetime import date
from datetime import timedelta
import numpy as np
#read the CSV from github
hist_US_State = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")
#some code to get yesterday's date and the day before which is needed later.
today = date.today()
yesterday = today - timedelta(days = 1)
yesterday = str(yesterday)
day_before_yesterday = today - timedelta(days = 2)
day_before_yesterday = str(day_before_yesterday)
#Extract yesterday's and the day before's cases and combine them into one dataframe
yesterday_cases = hist_US_State[hist_US_State["date"] == yesterday]
day_before_yesterday_cases = hist_US_State[hist_US_State["date"] == day_before_yesterday]
total_cases = pd.DataFrame()
total_cases = day_before_yesterday_cases.append(yesterday_cases)
#Adding a new column called "new_cases" and this is where I get into trouble.
total_cases["new_cases"] = yesterday_cases["cases"] - day_before_yesterday_cases["cases"]
Can you point out what I'm doing wrong?
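Because you define total_cases as the concatenation (via append) of yesterday_cases and day_before_yesterday_cases, it has as many rows as the other two dataframes combined. It looks like yesterday_cases and day_before_yesterday_cases each have 55 rows, so total_cases has 110 rows. Your last line is therefore trying to assign values computed from two 55-row frames to a column of a 110-row frame. It also does not compute what you expect: pandas aligns Series on their index labels rather than by position, and the rows for the two days keep different labels from hist_US_State, so the subtraction yields NaN for every row instead of one difference per state.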
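A minimal sketch of the problem, using a tiny made-up sample in place of the real NYT CSV (the dates, states, and case counts below are hypothetical). It uses pd.concat instead of DataFrame.append, which was removed in pandas 2.0; both stack rows the same way.

```python
import pandas as pd

# Hypothetical stand-in for us-states.csv: two states over two days (made-up numbers).
hist_US_State = pd.DataFrame({
    "date": ["2020-06-01", "2020-06-01", "2020-06-02", "2020-06-02"],
    "state": ["Alabama", "Alaska", "Alabama", "Alaska"],
    "cases": [100, 10, 120, 15],
})

day_before_yesterday_cases = hist_US_State[hist_US_State["date"] == "2020-06-01"]
yesterday_cases = hist_US_State[hist_US_State["date"] == "2020-06-02"]

# Concatenation stacks the rows, so total_cases has as many rows as both frames combined.
total_cases = pd.concat([day_before_yesterday_cases, yesterday_cases])
print(len(day_before_yesterday_cases), len(yesterday_cases), len(total_cases))  # 2 2 4

# The daily frames keep their original row labels (0, 1 and 2, 3).
# Series arithmetic aligns on those labels, not on position, so no row
# finds a partner in the other Series and every difference is NaN.
diff = yesterday_cases["cases"] - day_before_yesterday_cases["cases"]
print(diff)

# Assigning aligns on labels again, so the new column is NaN everywhere.
total_cases["new_cases"] = diff
print(total_cases["new_cases"].isna().all())  # True
```

With the real 55-state data the effect is the same: no error is raised, but the new column ends up empty.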
You probably want to either reshape the data so that each date is its own column, or work with arrays instead of dataframes.
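Both suggestions can be sketched as follows, again on a hypothetical two-state, two-day sample rather than the real CSV. Option 1 uses DataFrame.pivot to put each date in its own column; option 2 strips the index with to_numpy() and subtracts by position, which only works under the stated assumption that both days list exactly the same states.

```python
import pandas as pd

# Hypothetical stand-in for us-states.csv (made-up numbers).
hist_US_State = pd.DataFrame({
    "date": ["2020-06-01", "2020-06-01", "2020-06-02", "2020-06-02"],
    "state": ["Alabama", "Alaska", "Alabama", "Alaska"],
    "cases": [100, 10, 120, 15],
})
day_before_yesterday = "2020-06-01"
yesterday = "2020-06-02"

# Option 1: one row per state, one column per date.
two_days = hist_US_State[hist_US_State["date"].isin([day_before_yesterday, yesterday])]
wide = two_days.pivot(index="state", columns="date", values="cases")
# Both columns share the "state" index, so the subtraction lines up per state.
wide["new_cases"] = wide[yesterday] - wide[day_before_yesterday]
print(wide)

# Option 2: subtract plain NumPy arrays by position.
# Assumes both days contain the same states; sorting by state aligns the positions.
y = hist_US_State[hist_US_State["date"] == yesterday].sort_values("state").copy()
d = hist_US_State[hist_US_State["date"] == day_before_yesterday].sort_values("state")
y["new_cases"] = y["cases"].to_numpy() - d["cases"].to_numpy()
print(y)
```

Option 1 is the safer choice: a state missing on one of the days simply gets NaN in new_cases, whereas option 2 would raise a shape error or silently pair the wrong states.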