Creating new variables based on two columns as index, one column as new variable names (Python pandas or R)
If you can think of better wording after reading the question, please help me edit the title.
My data looks like this:
Location Date Item Price
12 1 A 1
12 2 A 2
12 3 A 4
13 1 A 1
13 2 A 4
12 1 B 1
12 2 B 8
13 1 B 1
13 2 B 2
13 3 B 11
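Using Location and Date as the key, I want to create a new variable for each item that holds that item's price. For example, my desired output is: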
Location Date PriceA PriceB
12 1 1 1
12 2 2 8
12 3 4 NaN
13 1 1 1
13 2 4 2
13 3 NaN 11
You could try reshape from base R:
reshape(df, idvar=c('Location', 'Date'), timevar='Item', direction='wide')
# Location Date Price.A Price.B
#1 12 1 1 1
#2 12 2 2 8
#3 12 3 4 NA
#4 13 1 1 1
#5 13 2 4 2
#10 13 3 NA 11
Or
library(reshape2)
dcast(df, Location+Date~paste0('Price',Item), value.var='Price')
# Location Date PriceA PriceB
#1 12 1 1 1
#2 12 2 2 8
#3 12 3 4 NA
#4 13 1 1 1
#5 13 2 4 2
#6 13 3 NA 11
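Or you could convert to a data.table and use dcast.data.table (it will be faster):
library(data.table)
# Note: setDT() converts df in place, and := overwrites Item by reference
dcast.data.table(setDT(df)[, Item := paste0('Price', Item)],
                 ... ~ Item, value.var='Price')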
Or
library(tidyr)
library(dplyr)
spread(df, Item, Price) %>%
rename(PriceA=A, PriceB=B)
# Location Date PriceA PriceB
#1 12 1 1 1
#2 12 2 2 8
#3 12 3 4 NA
#4 13 1 1 1
#5 13 2 4 2
#6 13 3 NA 11
If you don't need Price as a prefix, just do:
dcast.data.table(setDT(df), ...~Item, value.var='Price')
and the reshape2 option would be:
dcast(df,...~Item, value.var='Price')
df <- structure(list(Location = c(12L, 12L, 12L, 13L, 13L, 12L, 12L,
13L, 13L, 13L), Date = c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L,
3L), Item = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"
), Price = c(1L, 2L, 4L, 1L, 4L, 1L, 8L, 1L, 2L, 11L)), .Names = c("Location",
"Date", "Item", "Price"), class = "data.frame", row.names = c(NA,
-10L))
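Since the question also mentions pandas, here is a rough Python counterpart to the R options above. It is only a sketch: the DataFrame is rebuilt by hand from the sample data in the question, and the name wide is chosen here for illustration. It moves Location, Date and Item into the index, unstacks Item into columns, and then prefixes the new column names with Price.

```python
import pandas as pd

# Sample data copied from the question (rebuilt by hand for this sketch)
df = pd.DataFrame({
    "Location": [12, 12, 12, 13, 13, 12, 12, 13, 13, 13],
    "Date":     [1, 2, 3, 1, 2, 1, 2, 1, 2, 3],
    "Item":     ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "Price":    [1, 2, 4, 1, 4, 1, 8, 1, 2, 11],
})

wide = (
    df.set_index(["Location", "Date", "Item"])["Price"]
      .unstack("Item")              # one column per Item; missing pairs become NaN
      .add_prefix("Price")          # A -> PriceA, B -> PriceB
      .rename_axis(columns=None)    # drop the leftover "Item" label on the columns
      .reset_index()                # Location and Date back to ordinary columns
)

print(wide)
```

Dropping the add_prefix step should leave the columns named A and B, which corresponds to the unprefixed dcast variants. Note that the price columns come back as floats because NaN marks the missing Location/Date combinations.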