
R coding: How to keep records with 4 complete quarters of data

I have a data frame of quarterly company data and the following question:

How can I keep records for only those companies that have all 4 quarters of data? Companies sometimes appear with only 1, 2 or 3 quarters, but I need 4 complete quarters for each company across the entire data frame.

I've included example R code below:

company<-c("xray", "xray", "xray",  "xray", "foxrot", "foxrot", "delta", "kilo", "kilo", "kilo", "kilo", "kilo", "kilo" )  

year <-c("1984","1984","1984","1984", "1985", "1985","1986", "1987","1988","1989","1989","1989","1989" )

qtr <-c("1","2","3","4", "1", "2","3", "4","1", "1","2","3","4")

IQ <- rnorm(13,0,10)  
REVQ <- rnorm(13,0,10)  
AssetQ <- rnorm(13,0,10)  
CashQ  <- rnorm(13,0,10)  

# Build the data frame
data <- data.frame(year, qtr, company, IQ, REVQ, AssetQ, CashQ)

In this example, only 'xray' in 1984 and 'kilo' in 1989 should remain in the new data frame. Note that the quarterly sequence 1-2-3-4 appears three times, but only two of those sequences are valid; the third (rows 5-8) is a coincidence, because it spans three companies and three years. For the clean-up to make sense, each 1-2-3-4 sequence must belong to a single company and a single year.

This condition makes the task fairly tricky (at least for me). I've spent nearly a day searching the web and trying different methods, but nothing seems to work properly.

So I'm reaching out for some help.

Thank you~ M

Here is code that can help:

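library(data.table)
data <- data.table(year, qtr, company, IQ, REVQ, AssetQ, CashQ)
# Count rows per company and year; keep the pairs with exactly 4 quarters
fullyr <- data[, .(len = .N), by = .(company, year)][len == 4]
# Keep only the records that belong to those company/year pairs
data <- data[fullyr[, .(company, year)], on = .(company, year)]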
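One thing to watch with a count-based filter like this is which columns the count is grouped on. The sketch below uses a hypothetical toy table (the companies `alpha`, `bravo` and `charlie` are made up for illustration) and base R's `ave`, chosen only to keep the example free of package dependencies. It suggests that counting per year alone can make a year look complete when the four rows actually come from two different companies.

```r
# Hypothetical toy data: two companies report two quarters each in 1990,
# and a third company reports all four quarters in 1991.
toy <- data.frame(
  company = c("alpha", "alpha", "bravo", "bravo", rep("charlie", 4)),
  year    = c(rep("1990", 4), rep("1991", 4)),
  qtr     = c("1", "2", "3", "4", "1", "2", "3", "4"),
  stringsAsFactors = FALSE
)

# Rows per year only: 1990 has four rows, so it passes a "== 4" test
# even though no single company has four quarters in that year.
n_year <- ave(seq_len(nrow(toy)), toy$year, FUN = length)

# Rows per company and year: only charlie/1991 reaches four.
n_both <- ave(seq_len(nrow(toy)), toy$company, toy$year, FUN = length)

complete <- toy[n_both == 4, ]
print(complete)
```

In the question's sample data every year belongs to a single company, so the two groupings happen to agree there; the difference only shows up once two companies share a year.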

Next time, please include the code you have tried :)

The following code can help:

final <- data.frame()
for (i in unique(data$company)) {
  temp <- data[data$company == i, ]
  for (j in unique(temp$year)) {
    if (nrow(temp[temp$year == j, ]) == 4)
      final <- rbind(final, data.frame(company = i, Year = j))
  }
}

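The 'final' data frame will then list each company/year pair that has exactly four rows. Note that it holds only those two key columns (company and Year), not the full records.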
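Because `final` carries only the `company` and `Year` keys, one more step is needed to get back to the quarterly records. A minimal sketch using base R's `merge` follows; `final` is written out by hand here to match what the loop yields for the question's sample data, and the fixed seed and the single `IQ` column are assumptions made only to keep the example short and reproducible.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

company <- c("xray", "xray", "xray", "xray", "foxrot", "foxrot", "delta",
             "kilo", "kilo", "kilo", "kilo", "kilo", "kilo")
year <- c("1984", "1984", "1984", "1984", "1985", "1985", "1986",
          "1987", "1988", "1989", "1989", "1989", "1989")
qtr <- c("1", "2", "3", "4", "1", "2", "3", "4", "1", "1", "2", "3", "4")
data <- data.frame(year, qtr, company, IQ = rnorm(13, 0, 10),
                   stringsAsFactors = FALSE)

# Key pairs with four quarters (note the capitalised "Year" column).
final <- data.frame(company = c("xray", "kilo"),
                    Year    = c("1984", "1989"),
                    stringsAsFactors = FALSE)

# Inner join: keep only the rows of data whose (company, year) is in final.
complete <- merge(data, final,
                  by.x = c("company", "year"),
                  by.y = c("company", "Year"))

# merge() sorts by the key columns; restore chronological order.
complete <- complete[order(complete$year, complete$qtr), ]
print(complete)
```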

We can use data.table:

library(data.table)
setDT(data)[data[, .I[uniqueN(qtr) == 4], by = .(company, year)]$V1]

Or

setDT(data)[, if (uniqueN(qtr) == 4) .SD, by = .(company, year)]
#   company year qtr         IQ       REVQ      AssetQ       CashQ
#1:    xray 1984   1  -5.827832   8.221870   9.6688477 -10.6321121
#2:    xray 1984   2   3.521643  -1.096940  -4.5014798  -0.9196087
#3:    xray 1984   3  -7.526160  -4.155428 -10.6556271   7.6872401
#4:    xray 1984   4  -7.255974   3.717738  -1.7913910   9.6325437
#5:    kilo 1989   1   2.252885 -19.238773   9.7476758   4.0115274
#6:    kilo 1989   2   9.018055 -12.411381  -0.3772812   6.8339812
#7:    kilo 1989   3 -12.221085 -13.040805   7.3529403   9.1510647
#8:    kilo 1989   4   2.088668  -7.753041   1.5701738 -11.2252986
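If the data might contain duplicated or mislabelled quarters, a stricter test than a row count or a distinct count may be worth considering. The sketch below is a base R variant, not part of the answers above: the helper `has_full_year` and the extra company `lima` are hypothetical, added to show a group that has four rows but repeats quarter 1 and never reports quarter 4. A group is kept only when it has exactly four rows and those rows cover quarters 1 to 4.

```r
# Question's key columns plus one hypothetical company ("lima") with
# four rows in 1990 that repeat quarter 1 and omit quarter 4.
company <- c("xray", "xray", "xray", "xray", "foxrot", "foxrot", "delta",
             "kilo", "kilo", "kilo", "kilo", "kilo", "kilo",
             "lima", "lima", "lima", "lima")
year <- c("1984", "1984", "1984", "1984", "1985", "1985", "1986",
          "1987", "1988", "1989", "1989", "1989", "1989",
          "1990", "1990", "1990", "1990")
qtr <- c("1", "2", "3", "4", "1", "2", "3",
         "4", "1", "1", "2", "3", "4",
         "1", "1", "2", "3")
data <- data.frame(year, qtr, company, stringsAsFactors = FALSE)

# TRUE only for exactly four rows covering quarters 1, 2, 3 and 4.
has_full_year <- function(q) {
  length(q) == 4 && setequal(q, c("1", "2", "3", "4"))
}

# One piece per company/year pair; drop = TRUE skips empty combinations.
groups <- split(data, list(data$company, data$year), drop = TRUE)

# Keep the complete pieces and stack them back into one data frame.
complete <- do.call(rbind, Filter(function(g) has_full_year(g$qtr), groups))
rownames(complete) <- NULL
print(complete)
```

A plain four-row count would keep `lima` in 1990; the set comparison is what rejects it.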
