How to replace a for loop to make processing faster?
I have a lot of files to process. The data look like this:
V2 V3 V4
1 ID_0071817 1 1
2 1 201912312200+00 0.36
3 2 201912312300+00 0.36
4 3 202001010000+00 0.38
5 ID_0089011 1 1.00
6 1 202001010200+00 0.36
What I am doing now is:
for (j in 1:nrow(data)) {
  if (data[j, 2] == "1") {
    ID <- data[j, 1]
  }
  data[j, 4] <- ID
}
which produces:
V2 V3 V4 V4.1
1 ID_0071817 1 1 ID_0071817
2 1 201912312200+00 0.36 ID_0071817
3 2 201912312300+00 0.36 ID_0071817
4 3 202001010000+00 0.38 ID_0071817
5 ID_0089011 1 1.00 ID_0089011
6 1 202001010200+00 0.36 ID_0089011
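The problem is that processing the whole dataset this way is far too slow. A single file takes 5 minutes, and I have several thousand of them.
We can try cumsum like below: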
transform(
df,
V4.1 = V2[V3 == 1][cumsum(V3 == 1)]
)
which gives
V2 V3 V4 V4.1
1 ID_0071817 1 1.00 ID_0071817
2 1 201912312200+00 0.36 ID_0071817
3 2 201912312300+00 0.36 ID_0071817
4 3 202001010000+00 0.38 ID_0071817
5 ID_0089011 1 1.00 ID_0089011
6 1 202001010200+00 0.36 ID_0089011
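To see why this works, it may help to split the one-liner into its pieces. The following is a minimal sketch that rebuilds the sample data and steps through it; the intermediate names is_header, ids and group are introduced here for illustration only and are not part of the original answer. It assumes the first row of the data is always an ID row, otherwise cumsum would start at 0 and the lookup would come out shorter than the data.

```r
# Sample data, matching the dput() output given in this answer
df <- data.frame(
  V2 = c("ID_0071817", "1", "2", "3", "ID_0089011", "1"),
  V3 = c("1", "201912312200+00", "201912312300+00", "202001010000+00",
         "1", "202001010200+00"),
  V4 = c(1, 0.36, 0.36, 0.38, 1, 0.36),
  stringsAsFactors = FALSE
)

# TRUE on every row that starts a new block (the rows carrying an ID in V2)
is_header <- df$V3 == "1"

# The IDs themselves, one per block, in order of appearance
ids <- df$V2[is_header]

# Running count of header rows seen so far: the block number of each row
group <- cumsum(is_header)

# Look up each row's ID by its block number, all in one vectorised step
df$V4.1 <- ids[group]
print(df)
```

Because every step operates on whole vectors, no per-row assignment into the data frame takes place, and that per-row assignment is usually what makes the explicit loop slow.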
> dput(df)
structure(list(V2 = c("ID_0071817", "1", "2", "3", "ID_0089011", "1"),
    V3 = c("1", "201912312200+00", "201912312300+00", "202001010000+00",
    "1", "202001010200+00"), V4 = c(1, 0.36, 0.36, 0.38, 1, 0.36)),
    class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))
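Since the question mentions several thousand files, the same idea could be wrapped in a small function and applied to each file's data frame in turn. This is only a sketch under stated assumptions: add_id is a hypothetical helper name, and the list frames stands in for data frames that would be read from disk in whatever way the real files require (here it simply holds the sample data split into its two blocks).

```r
# Hypothetical helper: adds the V4.1 column using the cumsum approach.
# Assumes columns V2 and V3 exist and that the first row is an ID row.
add_id <- function(d) {
  is_header <- d$V3 == "1"
  d$V4.1 <- d$V2[is_header][cumsum(is_header)]
  d
}

# Stand-in for data frames loaded from individual files (assumption)
frames <- list(
  a = data.frame(V2 = c("ID_0071817", "1", "2"),
                 V3 = c("1", "201912312200+00", "201912312300+00"),
                 V4 = c(1, 0.36, 0.36),
                 stringsAsFactors = FALSE),
  b = data.frame(V2 = c("ID_0089011", "1"),
                 V3 = c("1", "202001010200+00"),
                 V4 = c(1, 0.36),
                 stringsAsFactors = FALSE)
)

# Apply the helper to every data frame; the result is a list of data frames
results <- lapply(frames, add_id)
```

Keeping the transformation in one function means the file-reading step can change independently of the ID-filling logic.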