R adding a column based on the value of another column
I am trying to transform a data.frame that holds data like this:
COMPANY COUNTRY CURRENCY IND1 WGT1 IND2 WGT2 IND3 WGT3
COMP1 USA USD HEALTH .58 RETAIL .42 <NA> 0
COMP2 USA USD AUTO .78 RETAIL .12 TRANSPRT .1
COMP3 CAN CAD SOFTWARE 1 <NA> 0 <NA> 0
I would like to turn it into:
COMPANY COUNTRY CURRENCY HEALTH AUTO SOFTWARE RETAIL TRANSPRT
COMP1 USA USD .58 0 0 .42 0
COMP2 USA USD 0 .78 0 .12 .1
COMP3 CAN CAD 0 0 1 0 0
What is the best way to approach this? Thanks in advance, BE
We can use melt/dcast from the devel version of data.table, i.e. v1.9.5.
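Convert the 'data.frame' to a 'data.table' with setDT(df1). The melt from data.table can take multiple measure columns. We specify the column names that start with the prefixes 'IND' and 'WGT' as patterns and reshape from 'wide' to 'long' format, which gives one column of industry names ('value1') and one of weights ('value2'). We remove the 'variable' column by assigning it to NULL, then dcast from 'long' back to 'wide', spreading 'value1' into columns and specifying 'value2' as the value.var.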
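library(data.table) # v1.9.5+
# wide -> long: the IND* columns become value1, the WGT* columns become value2
DT <- melt(setDT(df1), measure.vars = patterns('^IND', '^WGT'),
           na.rm = TRUE)[, variable := NULL]
# long -> wide: one column per industry (value1), filled with its weight (value2)
dcast(DT, ... ~ value1, value.var = 'value2', fill = 0)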
# COMPANY COUNTRY CURRENCY AUTO HEALTH RETAIL SOFTWARE TRANSPRT
#1: COMP1 USA USD 0.00 0.58 0.42 0 0.0
#2: COMP2 USA USD 0.78 0.00 0.12 0 0.1
#3: COMP3 CAN CAD 0.00 0.00 0.00 1 0.0
df1 <- structure(list(COMPANY = c("COMP1", "COMP2", "COMP3"),
COUNTRY = c("USA",
"USA", "CAN"), CURRENCY = c("USD", "USD", "CAD"), IND1 = c("HEALTH",
"AUTO", "SOFTWARE"), WGT1 = c(0.58, 0.78, 1), IND2 = c("RETAIL",
"RETAIL", NA), WGT2 = c(0.42, 0.12, 0), IND3 = c(NA, "TRANSPRT",
NA), WGT3 = c(0, 0.1, 0)), .Names = c("COMPANY", "COUNTRY", "CURRENCY",
"IND1", "WGT1", "IND2", "WGT2", "IND3", "WGT3"), row.names = c(NA,
-3L), class = "data.frame")
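The two steps described above can also be written out with explicit column names, which makes the intermediate long table easier to inspect. This is only a sketch: the names long, wide, IND and WGT (passed through value.name) are choices made here for illustration, not part of the original answer, and the sample data is simply retyped from the question. The explicit !is.na(IND) filter is a precaution that does not rely on how na.rm treats rows where only one of the paired columns is missing.

```r
library(data.table)

# Sample data retyped from the question.
df1 <- data.table(
  COMPANY  = c("COMP1", "COMP2", "COMP3"),
  COUNTRY  = c("USA", "USA", "CAN"),
  CURRENCY = c("USD", "USD", "CAD"),
  IND1 = c("HEALTH", "AUTO", "SOFTWARE"), WGT1 = c(0.58, 0.78, 1),
  IND2 = c("RETAIL", "RETAIL", NA),       WGT2 = c(0.42, 0.12, 0),
  IND3 = c(NA, "TRANSPRT", NA),           WGT3 = c(0, 0.1, 0)
)

# Wide -> long: each IND/WGT pair becomes one row.
# value.name gives the two melted columns readable names (illustrative choice).
long <- melt(df1,
             measure.vars = patterns("^IND", "^WGT"),
             value.name   = c("IND", "WGT"))

# Drop the slots that had no industry, and the helper index column.
long <- long[!is.na(IND)]
long[, variable := NULL]

# Long -> wide: one column per industry, missing combinations filled with 0.
wide <- dcast(long, COMPANY + COUNTRY + CURRENCY ~ IND,
              value.var = "WGT", fill = 0)
print(wide)
```

Printing long before the dcast call shows one row per company and industry, which is a quick way to check that the IND and WGT columns were paired correctly.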
Here is a solution that uses only the default libraries, with reshape and xtabs:
# df1 is the data.frame from the question
long <- reshape(df1, sep = "", varying = 4:9, direction = "long")
cbind(df1[, 1:3], as.data.frame.matrix(xtabs(WGT ~ COMPANY + IND, long)))
      COMPANY COUNTRY CURRENCY AUTO HEALTH SOFTWARE RETAIL TRANSPRT
COMP1   COMP1     USA      USD 0.00   0.58        0   0.42      0.0
COMP2   COMP2     USA      USD 0.78   0.00        0   0.12      0.1
COMP3   COMP3     CAN      CAD 0.00   0.00        1   0.00      0.0
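For anyone who wants to run the reshape/xtabs approach step by step, here is a self-contained sketch in base R. The sample data is retyped from the question, and the names long, wide and result are just illustrative. One change from the one-liner above, which is an addition and not part of the original answer: the rows are aligned with match() on COMPANY rather than by position, since xtabs orders its rows by company name and that may not match the row order of the input in general.

```r
# Sample data retyped from the question.
df1 <- data.frame(
  COMPANY  = c("COMP1", "COMP2", "COMP3"),
  COUNTRY  = c("USA", "USA", "CAN"),
  CURRENCY = c("USD", "USD", "CAD"),
  IND1 = c("HEALTH", "AUTO", "SOFTWARE"), WGT1 = c(0.58, 0.78, 1),
  IND2 = c("RETAIL", "RETAIL", NA),       WGT2 = c(0.42, 0.12, 0),
  IND3 = c(NA, "TRANSPRT", NA),           WGT3 = c(0, 0.1, 0),
  stringsAsFactors = FALSE
)

# Step 1: wide -> long. With sep = "", reshape splits "IND1" into the
# variable name "IND" and the time index 1, and likewise for WGT.
long <- reshape(df1, varying = 4:9, sep = "", direction = "long")

# Step 2: sum WGT for every COMPANY x IND combination.
# Rows whose IND is NA do not contribute to any column.
wide <- as.data.frame.matrix(xtabs(WGT ~ COMPANY + IND, data = long))

# Step 3: attach the id columns, matching rows by company name.
result <- cbind(df1[, 1:3], wide[match(df1$COMPANY, rownames(wide)), ])
print(result)
```

Note that this assumes each company appears on exactly one row of the input; with repeated companies, xtabs would add their weights together.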