Quick function for Carrying Subsets in R
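I want to count the number of collaborations for each author in my dataset. My data looks like
The first column is the author and the second column is the article ID, so each article is written by one or several authors.
The code I am using is basically a loop: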
degree1 <- rep(NA, length(Name))
for(i in 1:length(Name)){
temp <- subset(mydata, mydata$data == Name[i])
temp <- subset(mydata, mydata[, 2] %in% temp$artid)
CC <- unique(temp$data)
degree1[i] <- length(CC) - 1
print(i)
}
where Name is the vector of unique authors:
Name <- unique(mydata$data)
But this loop is very slow because I have more than 1 million authors. Is there a faster way?
library(data.table)
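# Make a toy dataset: 20 authors, each assigned a random article ID
n = 20
set.seed(123)
x = data.table(
  author = LETTERS[1:n],
  artid = sample.int(n, replace = TRUE)
)
x = x[order(artid)]
# Number of distinct authors on each article, stored in column n;
# an author has n - 1 collaborators on that article
x[, n := uniqueN(author), by = artid]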
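The per-article count above can be taken one step further to get the number of distinct co-authors per author, which is what the loop in the question computes. The following is only a sketch under assumptions: the toy data, the solo author "Kiran", and the names `pubs`, `coauthor_pairs`, `res`, and `n_coauthors` are all hypothetical, and it has not been benchmarked on a dataset with a million authors.

```r
library(data.table)

# Hypothetical toy data: one row per (author, article) pair
pubs <- data.table(
  author = c("Ajay", "Vijay", "Shyam", "Ajay", "Tarun", "Kiran"),
  artid  = c(11, 11, 11, 10, 10, 12)
)

# Self-join on article ID: one row for every pair of authors sharing an article
coauthor_pairs <- merge(pubs, pubs, by = "artid", allow.cartesian = TRUE)

# Drop self-pairs, then count distinct co-authors for each author
res <- coauthor_pairs[author.x != author.y,
                      .(n_coauthors = uniqueN(author.y)),
                      by = .(author = author.x)]

# Authors who never share an article drop out above; add them back with 0
res <- merge(unique(pubs[, .(author)]), res, by = "author", all.x = TRUE)
res[is.na(n_coauthors), n_coauthors := 0L]
res
```

The self-join grows with the square of the number of authors on each article, so it should stay manageable when articles have few authors, but a handful of articles with very long author lists could make it large.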
I read through the comments and I think I understand what you are trying to achieve. I created a dummy example that mimics your situation.
library(dplyr)
art_id <- c(11, 11, 11, 10, 10)
author <- c("Ajay","Vijay","Shyam",
"Ajay","Tarun")
uniq_art <- unique(art_id) # get unique article id
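So in this case, Ajay has collaborated with three authors ("Shyam", "Vijay", and "Tarun").
Shyam and Vijay have each collaborated with two authors, and Tarun has collaborated with only one. My solution to your problem is not very good. Hopefully someone can provide a more elegant one.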
# Make the data frame
publish <- data.frame(art_id, author)
# subset for a particular article ID,
# then group by author and get the number of co-authors
# each author has on that article
b <- publish %>% filter(art_id == uniq_art[1])
c <- b %>% group_by(author) %>% summarise(ans = dim(b)[1]-1)
# Repeat the process and join results to above data frame
# for the remaining article IDs
for(i in 2:length(uniq_art)) {
b <- publish %>% filter(art_id == uniq_art[i])
d <- b %>% group_by(author) %>% summarise(ans = dim(b)[1]-1)
c <- full_join(c, d, by = "author")
}
# get the number of columns
nc <- ncol(c)
# sample output after running loop in my dummy case
# A tibble: 4 x 3
author ans.x ans.y
<fctr> <dbl> <dbl>
1 Ajay 2 1
2 Shyam 2 NA
3 Vijay 2 NA
4 Tarun NA 1
# Add all numeric values in each row to get total collaborated authors
total_collab <- rowSums(c[,2:nc], na.rm = TRUE)
final_ans <- c %>% mutate(total = total_collab)
final_ans
# A tibble: 4 x 4
author ans.x ans.y total
<fctr> <dbl> <dbl> <dbl>
1 Ajay 2 1 3
2 Shyam 2 NA 2
3 Vijay 2 NA 2
4 Tarun NA 1 1
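As a cross-check on the totals above, the same numbers can be reproduced in base R from an author-by-article incidence matrix. This is a minimal sketch on the dummy data only; the names `inc`, `co`, and `n_collab` are made up for illustration, and a dense matrix like this would not fit in memory for a million authors (a sparse matrix would be needed there).

```r
# Dummy data from the example above
art_id <- c(11, 11, 11, 10, 10)
author <- c("Ajay", "Vijay", "Shyam", "Ajay", "Tarun")

# Author-by-article incidence matrix: 1 if the author wrote the article
inc <- (unclass(table(author, art_id)) > 0) * 1

# co[i, j] = number of articles shared by authors i and j
co <- tcrossprod(inc)

# Distinct collaborators: non-zero entries in each row, minus the author themself
n_collab <- rowSums(co > 0) - 1
n_collab
```

Unlike adding up the per-article counts, this counts each collaborator once even if two authors share several articles. On the dummy data the two approaches agree because no pair of authors appears together more than once.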
Hope this helps.