
R coding: How to keep records with only 4 complete quarters of data and how to take a conditional sum with multiple conditions

I have a dataframe with company quarterly data, and I have two questions:

1: How can I keep records for only those companies with four quarters of data? Companies sometimes appear with 1, 2 or 3 quarters of data, but I need 4 quarters for each company across the entire dataframe.

2: Because I have quarterly data, I would like to take the annual average or sum (depending on the variable type) across all four quarters, given two conditions: year and company.

For instance, Company i in 1984 would have an average inventory value and a total revenue, say I1984 and REV1984 respectively, each based on four distinct quarterly values. I am currently using these lines of code for the mean and the sum, but R keeps returning NA. I've searched for alternatives, but nothing seems to work:

I1984 <- with(R, mean(I[FY == "1984" & Co == "AAR CORP"]))
REV1984 <- with(R, sum(REVQ[FY == "1984" & Co == "AAR CORP"]))

Here R is my dataframe, I is quarterly inventory, and REVQ is quarterly revenue.

Clearly, the values in quotes will be made dynamic as I compute each new average/sum and place it in a new data.frame.

Any help would be highly appreciated. Thank you.

I've included example code below:

company <- c("xray", "xray", "xray", "xray", "foxrot", "foxrot", "delta", "kilo", "kilo")
qtr <- c("1", "2", "3", "4", "1", "2", "4", "2", "3")

IQ <- rnorm(9, 0, 10)
REVQ <- rnorm(9, 0, 10)
AssetQ <- rnorm(9, 0, 10)
CashQ <- rnorm(9, 0, 10)

# Modified dataframe
data <- data.frame(company, qtr, IQ, REVQ, AssetQ, CashQ)

In this example, 'xray' should be the only company for which we take a mean/sum.

For your first question (with your df structure from the comments below):

company <- c("xray", "xray", "xray", "xray", "foxrot", "foxrot", "delta", "kilo", "kilo")
qtr <- c("1", "2", "3", "4", "1", "2", "4", "2", "3")
IQ <- rnorm(9, 0, 10)
REVQ <- rnorm(9, 0, 10)
AssetQ <- rnorm(9, 0, 10)
CashQ <- rnorm(9, 0, 10)
# Modified dataframe
data <- data.frame(company, qtr, IQ, REVQ, AssetQ, CashQ)


# Using the dplyr package: keep only companies that have exactly 4 rows
library(dplyr)
data.complete <- data.frame(data %>% group_by(company) %>% filter(n() == 4))
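
Note that n() counts rows per company, so the filter above assumes each quarter appears at most once per company. A minimal base R sketch of a stricter check, which keeps only companies where all four distinct quarters are present, might look like the following. The column names follow the example above; the numeric values and the names n_quarters, complete_companies and data_complete are made up for illustration.

```r
# Toy data mirroring the example above (numeric values are made up)
data <- data.frame(
  company = c("xray", "xray", "xray", "xray", "foxrot", "foxrot", "delta", "kilo", "kilo"),
  qtr     = c("1", "2", "3", "4", "1", "2", "4", "2", "3"),
  IQ      = c(5, 7, 6, 8, 3, 4, 9, 2, 1),
  REVQ    = c(10, 12, 11, 13, 6, 7, 15, 4, 5)
)

# Number of distinct quarters observed for each company
n_quarters <- tapply(data$qtr, data$company, function(q) length(unique(q)))

# Companies for which all four quarters are present
complete_companies <- names(n_quarters)[n_quarters == 4]

# Keep only the rows belonging to those companies
data_complete <- data[data$company %in% complete_companies, ]
print(data_complete)
```

If the data span several years, the same idea would presumably need to count quarters per company and year rather than per company alone.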

For your second question:

# Get your sums and means (note that the 'by' argument separates the results by company when more than one company has complete data)
aggregate(data.complete[, 3:6], by = list(company = data.complete$company), FUN = sum)
aggregate(data.complete[, 3:6], by = list(company = data.complete$company), FUN = mean)
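
The question also asks for results per year and mentions NA values coming back. One common cause of NA from mean() or sum() is a missing value in the input, which na.rm = TRUE skips. The sketch below groups by both company and year and uses the mean for inventory and the sum for revenue. The year column, the data values and the names df, groups, avg_inv, tot_rev and annual are assumptions for illustration, since the example data above has no year column.

```r
# Hypothetical quarterly data for one company over two years (values are made up)
df <- data.frame(
  company = rep("xray", 8),
  year    = rep(c(1984, 1985), each = 4),
  qtr     = rep(1:4, times = 2),
  IQ      = c(10, 20, NA, 30, 40, 40, 40, 40),  # one missing inventory value
  REVQ    = c(1, 2, 3, 4, 5, 5, 5, 5)
)

# Grouping variables: one group per company-year combination
groups <- list(company = df$company, year = df$year)

# Annual average inventory; na.rm = TRUE is passed through to mean()
avg_inv <- aggregate(df["IQ"], by = groups, FUN = mean, na.rm = TRUE)
names(avg_inv)[3] <- "IQ_mean"

# Annual total revenue; na.rm = TRUE is passed through to sum()
tot_rev <- aggregate(df["REVQ"], by = groups, FUN = sum, na.rm = TRUE)
names(tot_rev)[3] <- "REVQ_sum"

# One row per company-year holding both summaries
annual <- merge(avg_inv, tot_rev, by = c("company", "year"))
print(annual)
```

Whether it is acceptable to drop a missing quarter when computing an annual figure is a judgment call. Without na.rm = TRUE, the 1984 inventory mean in this sketch would be NA.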


 