
How to count observations above a certain observation?

Below is a part of my dataset. I have three columns (itemid, username and Bidding_Time); the last column (# of prior users) is the target variable I want to compute.
For each Bidding_Time observation within each itemid, I want the number of prior users. In other words, I want to count the usernames that appear directly above each Bidding_Time value. How should I do that?
(Some values of the # of prior users variable were counted by hand; I want to fill in the rest.) Please help me.

 itemid       username           Bidding_Time          # of prior users
 109930                         03FEB15:23:45:02             0
 109930                         04FEB15:21:33:57             0
 109930                         04FEB15:21:42:45             0
 109930       steves22                       
 109930       rubber_c                 
 109930                         04FEB15:22:00:05             2
 109930                         04FEB15:22:00:05             0
 109930                         04FEB15:22:00:05             0
 109930                         04FEB15:22:00:05             0
 109930                         04FEB15:22:00:05             0
 109930                         04FEB15:22:00:05             0
 109930                         04FEB15:22:00:05             0
 109930                         04FEB15:22:00:05             0
 109931                         03FEB15:23:45:22             0
 109931       bacardir                 
 109931                         04FEB15:21:34:30             1
 109931       steves22          04FEB15:21:53:11            ...
 109931       rubber_c                 
 109931                         04FEB15:22:00:35
 109932         ljbinc                 
 109932         ljbinc          04FEB15:00:35:46
 109932         shan              
   ...

dput(head(aa))

    structure(list(itemid = c(109930L, 109930L, 109930L, 109930L, 
109930L, 109930L), username = structure(c(1L, 1L, 1L, 96L, 83L, 
1L), .Label = c("", "734723", "7362", "abcarter", "adnerb", "alikira", 
"allkirk", "ardub", "auctione", "bacardir", "barb70", "beasley", 
"belanger", "beluga", "billygol", "bobwyatt", "buffalo1", "butterfl", 
"bytemong", "camille", "carikas", "carpaw", "cbialobz", "cbx4evr", 
"cdavis", "chiquita", "cinner", "daddygee", "dandelio", "dlt2", 
"doubleea", "e970333", "edinga", "eglass", "fschuld", "gonegolf", 
"lightnin", "lionreen", "ljbinc", "lorac", "lorigala", "mec", 
), class = "factor"), Bidding_Time = structure(c(8L, 145L, 154L, 
1L, 1L, 169L), .Label = c("", "03FEB15:23:19:55", "03FEB15:23:22:13", 
"03FEB15:23:38:48", "03FEB15:23:40:26", "03FEB15:23:43:19", "03FEB15:23:43:39", 
"03FEB15:23:45:02", "03FEB15:23:45:22", "03FEB15:23:46:16", "03FEB15:23:47:43", 
"03FEB15:23:47:57", "03FEB15:23:48:39", "03FEB15:23:52:55", "04FEB15:00:00:09", 
"04FEB15:00:02:41", "04FEB15:00:04:54", "04FEB15:00:06:43", "04FEB15:00:07:27", 
"04FEB15:00:07:54", "04FEB15:00:25:10", "04FEB15:00:25:31", "04FEB15:00:26:48", 
"04FEB15:00:35:46", "04FEB15:00:36:20", "04FEB15:00:36:42", "04FEB15:00:37:32", 
"04FEB15:00:39:01", "04FEB15:00:39:30", "04FEB15:00:39:45", "04FEB15:00:40:17", 
"04FEB15:00:40:42", "04FEB15:00:47:07", "04FEB15:00:47:55", "04FEB15:00:54:04", 
"04FEB15:01:15:37", "04FEB15:09:08:44", "04FEB15:09:43:21", "04FEB15:10:18:51", 
"04FEB15:10:20:44", "04FEB15:10:21:50", "04FEB15:11:11:39", "04FEB15:11:13:54", 
"04FEB15:11:14:41", "04FEB15:11:15:51", "04FEB15:12:04:41", "04FEB15:12:24:11", 
"04FEB15:12:25:24", "04FEB15:12:32:02", "04FEB15:12:33:13", "04FEB15:12:35:42", 
"13FEB15:22:03:55", "13FEB15:22:04:16", "13FEB15:22:04:40", "13FEB15:22:04:57", 
"13FEB15:22:05:29", "13FEB15:22:07:00", "13FEB15:22:07:12", "13FEB15:22:07:34", 
), class = "factor")), .Names = c("itemid", 
"username", "Bidding_Time"), row.names = c(NA, 6L), class = "data.frame")

This is not very efficient:

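install.packages("dplyr")  # only needed once
library(dplyr)
bb <- aa

# temp1 is 1 on rows with no Bidding_Time (the username rows), 0 on bid rows
bb$temp1 <- (bb$Bidding_Time == "") * 1
# temp2 will label consecutive runs of equal temp1 values
bb$temp2 <- 1

# Walk down the rows and start a new run label whenever temp1 changes
for (i in 2:nrow(bb)) {
  if (bb$temp1[i] == bb$temp1[i - 1]) {
    bb$temp2[i] <- bb$temp2[i - 1]
  } else {
    bb$temp2[i] <- bb$temp2[i - 1] + 1
  }
}

# Count username rows within each run, then shift the count down one row
# so that it lands on the first bid that follows the run
bb <- bb %>%
  group_by(itemid, temp2) %>%
  mutate(Count = cumsum(temp1)) %>%
  ungroup() %>%
  mutate(Count = lag(Count)) %>%
  select(itemid, username, Bidding_Time, Count)

# The first row has no previous row, so lag() leaves NA there
bb$Count[is.na(bb$Count)] <- 0

View(bb)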
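
The row-by-row loop above can probably be replaced by a vectorized run counter. Below is a minimal sketch in base R, not a tested drop-in replacement: the data frame `toy` and the helper vectors `is_user`, `new_run`, `run_id`, `users_so_far`, `prev_count` and `same_item` are made-up names for illustration, and the sketch assumes every row carries either a username or a Bidding_Time. Unlike the code above, it also resets the count when itemid changes.

```r
# Hypothetical toy data in the question's layout: each row holds either
# a username (a user arriving) or a Bidding_Time (a bid).
toy <- data.frame(
  itemid       = c(1, 1, 1, 1, 1, 2, 2, 2),
  username     = c("", "u1", "u2", "", "", "u3", "", ""),
  Bidding_Time = c("t1", "", "", "t2", "t3", "", "t4", "t5"),
  stringsAsFactors = FALSE
)
n <- nrow(toy)

is_user <- toy$Bidding_Time == ""   # TRUE on username-only rows

# A new run starts whenever is_user flips or itemid changes
new_run <- c(TRUE, is_user[-1] != is_user[-n] | toy$itemid[-1] != toy$itemid[-n])
run_id  <- cumsum(new_run)

# Running count of username rows inside each run (stays 0 inside bid runs)
users_so_far <- ave(as.integer(is_user), run_id, FUN = cumsum)

# Shift the count down one row, but never carry it across items
prev_count <- c(0L, users_so_far[-n])
same_item  <- c(FALSE, toy$itemid[-1] == toy$itemid[-n])
toy$Count  <- ifelse(same_item, prev_count, 0L)

print(toy)
```

As in the dplyr version, username rows also receive a value in `Count`; only the values on the bid rows are meaningful for the question.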

I am getting really close here using rle, but I can't finish it. Maybe somebody can pick it up for me...

a <- c("", "", "", "A", "A", "", "", "B", "A", "C", "", "")
b <- a!=""
c <- rep(rle(b)$lengths, rle(b)$lengths)
c2 <- c(NA, c[-length(c)])
> cbind(a,c2)
      a   c2 
 [1,] ""  NA 
 [2,] ""  "3"
 [3,] ""  "3"
 [4,] "A" "3"
 [5,] "A" "2"
 [6,] ""  "2"
 [7,] ""  "2"
 [8,] "B" "2"
 [9,] "A" "3"
[10,] "C" "3"
[11,] ""  "3"
[12,] ""  "2"
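
One possible way to finish the rle idea is sketched below. It works from the run lengths directly: each run of blanks gets the length of the non-blank run just before it on its first row, and 0 on its remaining rows. The names `r`, `prev_len`, `starts` and `prior` are introduced here for illustration. The sketch handles a single vector only, so for the real data it would presumably have to be applied per itemid, and it counts username rows rather than distinct users.

```r
a <- c("", "", "", "A", "A", "", "", "B", "A", "C", "", "")

r <- rle(a != "")   # runs alternate between blank (FALSE) and named (TRUE)
k <- length(r$lengths)

# Length of the run immediately before each run (0 for the first run)
prev_len <- c(0L, r$lengths[-k])

# Row index at which each run starts
starts <- cumsum(c(1L, r$lengths[-k]))

# Blank rows default to 0, named rows to NA
prior <- ifelse(a == "", 0L, NA_integer_)

# The first row of each blank run receives the size of the preceding named run
prior[starts[!r$values]] <- prev_len[!r$values]

print(cbind(a, prior))
```

Because the runs alternate, the run before a blank run is always a run of names (or nothing, for a leading blank run), which is why `prev_len` can be used without further checks.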
