How to apply/loop a function, in this case chull(), to groups within a data frame in R?
I have a data frame like this:
x2 <- c(12,-10,-3,-5,-3, 18,-14,-3,-13,14,12,-10,-3,-5,-3, 18,-14,-3,-13,14)
y2 <- c(-3,-4,-11,-12,-13,-4,5,-10,-3,6,-3,-4,-11,-12,-13,-4,5,-10,-3,6)
ID2 <- c(5088,5088,5088,5088,5088,5088,5088,5088,5088,5088,6000,6000,6000,6000,6000,6000,6000,6000,6000,6000)
D2 <- c(59,49,70,40,74,78,90,55,65,73,59,49,70,40,74,78,90,55,65,73)
Code2 <- c(110,110,110,130,110,110,110,110,110,100,110,110,110,130,110,110,110,110,110,100)
df2 <- data.frame(x2,y2,ID2,D2,Code2)
df2
x2 y2 ID2 D2 Code2
1 12 -3 5088 59 110
2 -10 -4 5088 49 110
3 -3 -11 5088 70 110
4 -5 -12 5088 40 130
5 -3 -13 5088 74 110
6 18 -4 5088 78 110
7 -14 5 5088 90 110
8 -3 -10 5088 55 110
9 -13 -3 5088 65 110
10 14 6 5088 73 100
11 12 -3 6000 59 110
12 -10 -4 6000 49 110
13 -3 -11 6000 70 110
14 -5 -12 6000 40 130
15 -3 -13 6000 74 110
16 18 -4 6000 78 110
17 -14 5 6000 90 110
18 -3 -10 6000 55 110
19 -13 -3 6000 65 110
20 14 6 6000 73 100
...
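x and y are the Cartesian coordinates of the trees in ensembles of trees. ID is the individual identifier of each ensemble. Code and D are parameters that are not relevant yet.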
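Now I am trying to apply the function chull() to each ensemble, to get a data.frame consisting only of the trees that form the boundary of their ensemble. For a single ID it looks like this: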
x1 <- c(12,-10,-3,-5,-3, 18,-14,-3,-13,14)
y1 <- c(-3,-4,-11,-12,-13,-4,5,-10,-3,6)
ID1 <- c(5088,5088,5088,5088,5088,5088,5088,5088,5088,5088)
D1 <- c(59,49,70,40,74,78,90,55,65,73)
Code1 <- c(110,110,110,130,110,110,110,110,110,100)
df1 <- data.frame(x1,y1,ID1,D1,Code1)
# chull() returns the row indices of the points on the convex hull of (x1, y1)
hullpts <- chull(df1$x1, df1$y1)
df1[hullpts,]
x1 y1 ID1 D1 Code1
6 18 -4 5088 78 110
5 -3 -13 5088 74 110
4 -5 -12 5088 40 130
9 -13 -3 5088 65 110
7 -14 5 5088 90 110
10 14 6 5088 73 100
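To see what chull() returns on its own, here is a minimal sketch on made-up points (a unit square plus one interior point, not data from the question). chull() gives the indices of the points on the convex hull, in clockwise order, and leaves interior points out, which is why its result is used to pick rows with df[idx, ].

```r
# Made-up example: the four corners of a unit square plus one interior point
pts <- data.frame(x = c(0, 1, 1, 0, 0.5),
                  y = c(0, 0, 1, 1, 0.5))

# Indices of the hull vertices; row 5 (the interior point) is not included
idx <- chull(pts$x, pts$y)
idx

# The comma after idx selects whole rows of the data frame
pts[idx, ]
```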
I have been trying to build a loop with for() and nlme::gapply(), without success.
I would be very grateful for any help.
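Since a for() loop was one of the approaches tried, here is a minimal sketch of how it could look, assuming the df2 columns shown above. The names hull_list and hull_df are illustrative only; the loop subsets each ensemble, keeps the rows chull() returns, and stacks the results at the end.

```r
# Rebuild df2 from the question (same values, written more compactly)
x2 <- c(12,-10,-3,-5,-3, 18,-14,-3,-13,14,12,-10,-3,-5,-3, 18,-14,-3,-13,14)
y2 <- c(-3,-4,-11,-12,-13,-4,5,-10,-3,6,-3,-4,-11,-12,-13,-4,5,-10,-3,6)
ID2 <- rep(c(5088, 6000), each = 10)
D2 <- rep(c(59,49,70,40,74,78,90,55,65,73), 2)
Code2 <- rep(c(110,110,110,130,110,110,110,110,110,100), 2)
df2 <- data.frame(x2, y2, ID2, D2, Code2)

# One list element per ensemble, holding only its boundary trees
hull_list <- list()
for (id in unique(df2$ID2)) {
  grp <- df2[df2$ID2 == id, ]   # all trees of this ensemble
  hull_list[[as.character(id)]] <- grp[chull(grp$x2, grp$y2), ]
}

# Stack the per-ensemble results into a single data frame
hull_df <- do.call(rbind, hull_list)
rownames(hull_df) <- NULL
hull_df
```

Growing a list inside the loop and calling rbind() once at the end avoids repeatedly copying a growing data frame.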
I don't know how familiar you are with external packages, but with data.table this takes a single call, like this:
library(data.table)
# setDT() converts df2 to a data.table in place (by reference)
# group by ID2, then apply chull() to x2 and y2 within each group
# .SD is the subset of the data belonging to the current ID2 group
setDT(df2)[, .SD[chull(x2, y2),], by = 'ID2']
# ID2 x2 y2 D2 Code2
#1: 5088 18 -4 78 110
#2: 5088 -3 -13 74 110
#3: 5088 -5 -12 40 130
#4: 5088 -13 -3 65 110
#5: 5088 -14 5 90 110
#6: 5088 14 6 73 100
#7: 6000 18 -4 78 110
#8: 6000 -3 -13 74 110
#9: 6000 -5 -12 40 130
#10: 6000 -13 -3 65 110
#11: 6000 -14 5 90 110
#12: 6000 14 6 73 100
Alternatively, if you want to use base R, you could do something like this:
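# split df2 into one piece per ID2
splits <- split(df2, df2$ID2)
# keep the hull rows of each piece; the comma selects rows, not columns
chulls <- lapply(splits, function(x) {
  x[chull(x$x2, x$y2), ]
})
# stack the pieces back into one table
do.call(rbind, chulls)
# (printed in data.table style because setDT() above converted df2 in place)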
# x2 y2 ID2 D2 Code2
#1: 18 -4 5088 78 110
#2: -3 -13 5088 74 110
#3: -5 -12 5088 40 130
#4: -13 -3 5088 65 110
#5: -14 5 5088 90 110
#6: 14 6 5088 73 100
#7: 18 -4 6000 78 110
#8: -3 -13 6000 74 110
#9: -5 -12 6000 40 130
#10: -13 -3 6000 65 110
#11: -14 5 6000 90 110
#12: 14 6 6000 73 100
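If you would rather keep every tree and just mark the boundary ones, a base R sketch under the same column names is to compute a logical flag per ensemble and put it back in the original row order with unsplit(). The column name on_hull is a hypothetical choice; subsetting with the flag gives the same boundary trees as above, in their original order.

```r
# Rebuild df2 as a plain data.frame (values from the question)
x2 <- c(12,-10,-3,-5,-3, 18,-14,-3,-13,14,12,-10,-3,-5,-3, 18,-14,-3,-13,14)
y2 <- c(-3,-4,-11,-12,-13,-4,5,-10,-3,6,-3,-4,-11,-12,-13,-4,5,-10,-3,6)
ID2 <- rep(c(5088, 6000), each = 10)
D2 <- rep(c(59,49,70,40,74,78,90,55,65,73), 2)
Code2 <- rep(c(110,110,110,130,110,110,110,110,110,100), 2)
df2 <- data.frame(x2, y2, ID2, D2, Code2)

# For each ensemble: TRUE for rows returned by chull(), FALSE otherwise
flags_by_id <- lapply(split(df2, df2$ID2), function(g) {
  seq_len(nrow(g)) %in% chull(g$x2, g$y2)
})

# unsplit() reassembles the per-group flags in df2's original row order
df2$on_hull <- unsplit(flags_by_id, df2$ID2)

# Boundary trees only, still in their original order
df2[df2$on_hull, ]
```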