
R - Conditional lagging - How to lag a certain amount of cells until a condition is met?

I have been trying to solve this for a few weeks, but can't seem to get it.

I have the following data frame:

    post_id user_id
1    post-1   user1
2    post-2   user2
3 comment-1   user1
4 comment-2   user3
5 comment-3   user4
6    post-3   user2
7 comment-4   user2

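I would like to create a new variable, parent_id, so that for each observation the following steps are performed:

  1. Check whether post_id is a post or a comment.
  2. If post_id is a post, then parent_id should equal the earliest post_id in the entire data frame.
  3. If post_id is the first post, then parent_id should equal NA.
  4. If post_id is a comment, then parent_id should equal the first post_id it encounters when looking back up the data frame, i.e. the nearest preceding post.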

The output should look like this:

    post_id user_id parent_id_man
1    post-1   user1            NA
2    post-2   user2        post-1
3 comment-1   user1        post-2
4 comment-2   user3        post-2
5 comment-3   user4        post-2
6    post-3   user2        post-1
7 comment-4   user2        post-3

I have tried the following:

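library(dplyr)
library(tidyr)

# Prepare data: split post_id into its type ("post"/"comment") and its number
df <- df %>% separate(post_id, into = c("type", "number"), sep = "-", remove = FALSE)
df$number <- as.numeric(df$number)
# 99999 is a sentinel so that min(post_number) ignores comment rows
df <- df %>% mutate(comment_number = ifelse(type == "comment", number, 99999))
df <- df %>% mutate(post_number = ifelse(type == "post", number, 99999))

# Create parent_id column: posts get the earliest post, comments get a placeholder 0
df <- df %>% mutate(parent_id = ifelse(type == "post", paste("post-", min(post_number), sep = ""), 0))
# The first post has no parent (note: this stores the string "NA", not a missing value)
df <- df %>% mutate(parent_id = ifelse(parent_id == post_id, "NA", parent_id))
df <- df %>% select(-comment_number, -post_number)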

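With that code I can perform steps 1, 2 and 3, but step 4 is beyond me. I feel like some type of conditional lag should be able to solve it, but I can't figure out how to do it.

Any ideas would be much appreciated!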

Building on your solution:

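# Row indices of the posts and of the comments
x <- which(df$type == 'post')
z <- which(df$type == 'comment')
# findInterval() is vectorised: for each comment row it returns the position in x
# of the last post at or before that row, i.e. the nearest preceding post
df$parent_id[df$parent_id == 0] <- df$post_id[x[findInterval(z, x)]]
df$parent_id
#[1] "NA"     "post-1" "post-2" "post-2" "post-2" "post-1" "post-3"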
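
For reference, here is a minimal self-contained sketch of the same idea in base R, without the dplyr preparation step. It rests on two assumptions that are not stated in the question: the rows are in chronological order, so the first post row is also the earliest post, and the data frame starts with a post, so every comment has a preceding post. The helper names (is_post, post_rows, nearest_post, carried) are made up for illustration. Unlike the code above, it stores a real missing value rather than the string "NA".

```r
# Sample data copied from the question
df <- data.frame(
  post_id = c("post-1", "post-2", "comment-1", "comment-2",
              "comment-3", "post-3", "comment-4"),
  user_id = c("user1", "user2", "user1", "user3", "user4", "user2", "user2"),
  stringsAsFactors = FALSE
)

is_post   <- startsWith(df$post_id, "post-")  # step 1: post or comment?
post_rows <- which(is_post)                   # row indices of all posts

# For every row, findInterval() counts how many post rows lie at or before it,
# so indexing post_rows with that count gives the row of the nearest preceding post.
# Assumption: row 1 is a post, otherwise the count would be 0 for leading comments.
nearest_post <- post_rows[findInterval(seq_len(nrow(df)), post_rows)]

# The same thing seen as a "conditional lag": keep the row index on post rows,
# 0 elsewhere, and carry the last post row forward with a running maximum.
carried <- cummax(ifelse(is_post, seq_len(nrow(df)), 0L))

# Step 2: posts point to the first post. Step 4: comments point to the nearest post.
first_post <- df$post_id[post_rows[1]]
parent_id  <- ifelse(is_post, first_post, df$post_id[nearest_post])

# Step 3: the first post itself has no parent
parent_id[post_rows[1]] <- NA_character_

df$parent_id <- parent_id
print(df)
```

Both nearest_post and carried hold the row of the most recent post for every row, so either one can be used for step 4. The running-maximum version is probably the closest match to the "conditional lag" the question asks about.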

