Fill column with value of a column from another dataframe, depending on conditions
I have a dataframe that looks like this (my input database of COVID cases)
data:
       date state  cases
0  20200625    NY    300
1  20200625    CA    250
2  20200625    TX    200
3  20200625    FL    100
5  20200624    NY    290
6  20200624    CA    240
7  20200624    TX    100
8  20200624    FL     80
...
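Note that the "date" column in the data above is a number (not a datetime).
I want to turn it into a time series like this (desired output), with the dates as the index and each state's COVID cases as the columns
           NY   CA   TX   FL
20200625  300  250  200  100
20200626  290  240  100   80
...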
So far, I have only managed to create the schema of the output, using the following code
import pandas as pd

states = ['NY', 'CA', 'TX', 'FL']
days = [20200625, 20200626]
columns = states
positives = pd.DataFrame(columns=columns)
i = 0
for day in days:
    positives.loc[i, "date"] = day
    i = i + 1
positives.set_index('date', inplace=True)
positives = positives.rename_axis(None)
print(positives)
which returns:
             NY   CA   TX   FL
20200625.0  NaN  NaN  NaN  NaN
20200626.0  NaN  NaN  NaN  NaN
How can I fetch the values of the "cases" column from the "data" dataframe when:
(i) the value in data["state"] equals the column header of "positives",
(ii) the value in data["date"] equals the row index of "positives"
You can do:
df = df.set_index(['date', 'state']).unstack().reset_index()
# fix column names
df.columns = df.columns.get_level_values(1)
state            CA     FL     NY     TX
0      20200624  240.0    NaN  290.0    NaN
1      20200625  250.0  100.0  300.0  200.0
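Later, to set the index again, note that the date column's label is now an empty string, so we set the index on "" and then name it explicitly: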
df = df.set_index("")
df.index.name = "date"
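A variation on the approach above, as a minimal sketch: selecting the `cases` column before calling `unstack()` yields single-level column labels, so neither the column-name fix nor the empty-string index is needed. The sample frame is an assumption rebuilt from the rows shown in the question, and `wide` is only an illustrative name.

```python
import pandas as pd

# Sample rows copied from the question (assumed to be representative).
data = pd.DataFrame({
    "date": [20200625, 20200625, 20200625, 20200625,
             20200624, 20200624, 20200624, 20200624],
    "state": ["NY", "CA", "TX", "FL", "NY", "CA", "TX", "FL"],
    "cases": [300, 250, 200, 100, 290, 240, 100, 80],
})

# Index by (date, state), keep only the cases Series, then move the
# state level of the index into the columns.
wide = data.set_index(["date", "state"])["cases"].unstack()

# Drop the axis names ("date" / "state") to match the layout in the question.
wide = wide.rename_axis(index=None, columns=None)

print(wide)
```

The columns come out sorted alphabetically (CA, FL, NY, TX); `wide[['NY', 'CA', 'TX', 'FL']]` would restore the order used in the question.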
The transformation you are interested in is called a pivot. You can do it in pandas as follows:
import pandas as pd

# Reproduce part of the data
data = pd.DataFrame({'date': [20200625, 20200625, 20200624, 20200624],
                     'state': ['NY', 'CA', 'NY', 'CA'],
                     'cases': [300, 250, 290, 240]})
data
# date state cases
# 0 20200625 NY 300
# 1 20200625 CA 250
# 2 20200624 NY 290
# 3 20200624 CA 240
# Pivot
data.pivot(index='date', columns='state', values='cases')
# state CA NY
# date
# 20200624 240 290
# 20200625 250 300
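To get closer to the exact layout from the question, the pivoted frame can be reindexed to a chosen list of states and days, and the integer dates can be parsed into a real `DatetimeIndex`. This is a minimal sketch, assuming the `states` and `days` lists are supplied by hand as in the question and that every date is an eight-digit `YYYYMMDD` integer; `positives` simply reuses the question's name.

```python
import pandas as pd

# Same partial sample as above.
data = pd.DataFrame({'date': [20200625, 20200625, 20200624, 20200624],
                     'state': ['NY', 'CA', 'NY', 'CA'],
                     'cases': [300, 250, 290, 240]})

states = ['NY', 'CA', 'TX', 'FL']   # desired column order (assumed)
days = [20200624, 20200625]         # desired row order (assumed)

# Pivot, then align to the requested rows and columns.
# Combinations with no data (here TX and FL) become NaN.
positives = (data.pivot(index='date', columns='state', values='cases')
                 .reindex(index=days, columns=states))

# Parse the YYYYMMDD integers into timestamps for time-series work.
positives.index = pd.to_datetime(positives.index.astype(str), format='%Y%m%d')

print(positives)
```

Note that `pivot` raises a `ValueError` if the same (date, state) pair appears more than once; in that case `pivot_table` with an aggregation function such as `aggfunc='sum'` may be a better fit, depending on what the duplicates mean.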