How can I create new data frame columns corresponding to levels of a given column using the plyr package in R?
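I'm looking for an "elegant" way to split a data frame by the levels of one column variable and then build a reshaped output data frame in which the factor variable is dropped and a new column is added for each of its levels. I can do this with something like split(), but that feels messy to me. I've been trying to do it with the melt() and cast() functions from the plyr package, but I haven't managed to get exactly the output I need.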
Here is my data:
> jumbo.df = read.csv(...)
> head(jumbo.df)
PricingDate Name Rate
186 2012-03-05 Type A 2.875
187 2012-03-05 Type B 3.250
188 2012-03-05 Type C 3.750
189 2012-03-05 Type D 3.750
190 2012-03-05 Type E 4.500
191 2012-03-06 Type A 2.875
What I'd like to do is split by the Name variable, drop the Name and Rate columns, and then output columns for Type A, Type B, Type C, Type D, and Type E holding the corresponding Rate series, with PricingDate as the ID:
> head(output.df)
PricingDate Type A Type B Type C Type D Type E
2012-03-05 2.875 3.250 3.750 3.750 4.500
2012-03-06 2.875 ...
Thanks!
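For comparison, here is a minimal sketch of the split()-based route the question describes as messy, rebuilt on the six rows shown in the head() output above (only those rows; the full CSV is not shown). The Map()/Reduce()/merge() combination is one possible way to do it, not necessarily what the asker wrote:

```r
# The six rows from the question's head() output (not the full CSV)
jumbo.df <- data.frame(
  PricingDate = c(rep("2012-03-05", 5), "2012-03-06"),
  Name        = c("Type A", "Type B", "Type C", "Type D", "Type E", "Type A"),
  Rate        = c(2.875, 3.250, 3.750, 3.750, 4.500, 2.875),
  stringsAsFactors = FALSE
)

# One small data frame (PricingDate, Rate) per level of Name
pieces <- split(jumbo.df[, c("PricingDate", "Rate")], jumbo.df$Name)

# Rename each piece's Rate column after its type, e.g. "Type A"
pieces <- Map(function(piece, type) {
  names(piece)[names(piece) == "Rate"] <- type
  piece
}, pieces, names(pieces))

# Join the pieces on PricingDate; all = TRUE keeps dates that some types
# lack and fills those cells with NA
output.df <- Reduce(function(a, b) merge(a, b, by = "PricingDate", all = TRUE),
                    pieces)
output.df
```

It works, but the split/rename/merge bookkeeping is exactly what the reshaping one-liners in the answers avoid.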
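Not sure if I understood you correctly, but maybe you just want to reshape your data into wide format? If so, you need the melt and cast functions from the reshape (!) package, not plyr; reshape2 is basically the same. Since your data is already in molten (i.e. long) format, a one-liner does what you want: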
df <- read.table(textConnection("PricingDate Name Rate
2012-03-05 TypeA 2.875
2012-03-05 TypeB 3.250
2012-03-05 TypeC 3.750
2012-03-05 TypeD 3.750
2012-03-05 TypeE 4.500
2012-03-06 TypeA 2.875"), header=TRUE, row.names=NULL)
library(reshape2)
dcast(df, PricingDate ~ Name)
Using Rate as value column: use value.var to override.
PricingDate TypeA TypeB TypeC TypeD TypeE
1 2012-03-05 2.875 3.25 3.75 3.75 4.5
2 2012-03-06 2.875 NA NA NA NA
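If you would rather not depend on reshape2, base R's tapply() can produce the same wide table. This is a swapped-in technique, not part of the answer above; it is a minimal sketch that assumes each date/type pair occurs at most once (with duplicates, mean() would silently average them):

```r
# The six-row sample from this answer
df <- data.frame(
  PricingDate = c(rep("2012-03-05", 5), "2012-03-06"),
  Name        = c("TypeA", "TypeB", "TypeC", "TypeD", "TypeE", "TypeA"),
  Rate        = c(2.875, 3.250, 3.750, 3.750, 4.500, 2.875),
  stringsAsFactors = FALSE
)

# Matrix with one row per date and one column per type; date/type pairs
# that never occur come back as NA (tapply's default fill value)
wide <- tapply(df$Rate, list(df$PricingDate, df$Name), mean)

# Turn the matrix back into a data frame with PricingDate as a column
output <- data.frame(PricingDate = rownames(wide), wide, row.names = NULL)
output
```

Unlike dcast(), this goes through a plain matrix, so it only suits a single value column such as Rate.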
library(plyr)
library(reshape2)
data <- structure(list(PricingDate = c("2012-03-05", "2012-03-05", "2012-03-05",
"2012-03-05", "2012-03-05", "2012-03-06", "2012-03-06", "2012-03-06",
"2012-03-06", "2012-03-06"), Name = c("Type A", "Type B", "Type C",
"Type D", "Type E", "Type A", "Type B", "Type C", "Type D", "Type E"
), Rate = c(2.875, 3.25, 3.75, 3.75, 4.5, 4.875, 5.25, 6.75,
7.75, 8.5)), .Names = c("PricingDate", "Name", "Rate"), class = "data.frame", row.names = c("186",
"187", "188", "189", "190", "191", "192", "193", "194", "195"
))
> data
PricingDate Name Rate
186 2012-03-05 Type A 2.875
187 2012-03-05 Type B 3.250
188 2012-03-05 Type C 3.750
189 2012-03-05 Type D 3.750
190 2012-03-05 Type E 4.500
191 2012-03-06 Type A 4.875
192 2012-03-06 Type B 5.250
193 2012-03-06 Type C 6.750
194 2012-03-06 Type D 7.750
195 2012-03-06 Type E 8.500
ddply(data, .(PricingDate), function(x) reshape(x, idvar="PricingDate", timevar="Name", direction="wide"))
  PricingDate Rate.Type A Rate.Type B Rate.Type C Rate.Type D Rate.Type E
1  2012-03-05       2.875        3.25        3.75        3.75         4.5
2  2012-03-06       4.875        5.25        6.75        7.75         8.5
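Because reshape() already groups rows by idvar, wrapping it in ddply() appears to be unnecessary; a plain stats::reshape() call on the whole data frame should give the same result. A minimal sketch, assuming you also want the "Rate." prefix stripped so the column names match the question's desired output:

```r
# The ten-row sample from this answer, rebuilt without structure()
data <- data.frame(
  PricingDate = rep(c("2012-03-05", "2012-03-06"), each = 5),
  Name        = rep(c("Type A", "Type B", "Type C", "Type D", "Type E"), times = 2),
  Rate        = c(2.875, 3.25, 3.75, 3.75, 4.5, 4.875, 5.25, 6.75, 7.75, 8.5),
  stringsAsFactors = FALSE
)

# reshape() already groups rows by idvar, so no ddply() split is needed
wide <- reshape(data, idvar = "PricingDate", timevar = "Name",
                direction = "wide")

# Optional cleanup: drop the "Rate." prefix and reset the row names
names(wide) <- sub("^Rate\\.", "", names(wide))
rownames(wide) <- NULL
wide
```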