Complex Data Frame calculations in R
I am currently importing two tables that look like this (in their most basic form):
Table 1
State Month Account Value
NY Jan Expected Sales 1.04
NY Jan Expected Expenses 1.02
Table 2
State Month Account Value
NY Jan Sales 1,000
NY Jan Customers 500
NY Jan F Expenses 1,000
NY Jan V Expenses 100
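My end goal is to create a third data frame that carries over the State and Month values and computes the Value column from these formulas: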
NextYearExpenses = (t2 F Expenses + t2 V Expenses)* t1 Expected Expenses
NextYearSales = (t2 sales) * t1 Expected Sales
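Plugging the sample values from the two tables into these formulas gives the numbers in the desired output. A quick check in base R (the variable names are only for illustration):

```r
# Values taken from Table 1 and Table 2 above (NY, Jan)
expected_sales    <- 1.04
expected_expenses <- 1.02
sales      <- 1000
f_expenses <- 1000
v_expenses <- 100

# Apply the two formulas from the question
next_year_expenses <- (f_expenses + v_expenses) * expected_expenses
next_year_sales    <- sales * expected_sales

next_year_expenses  # 1122
next_year_sales     # 1040
```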
So my desired output looks like this:
State Month New Account Value
NY Jan Sales 1,040
NY Jan Expenses 1,122
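I'm fairly new to R, and I thought an ifelse statement might be my best option. I tried merging the tables and doing the calculation with simple column functions, but made no real progress.
Any suggestions?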
You may need to do some data wrangling, but nothing fancy.
require(dplyr)
Table1<-tibble(State=c("NY","NY"), Month=c("Jan","Jan"), Account=c("Expected Sales", "Expected Expenses"), Value=c(1.04,1.02))
Table2<-tibble(State=c("NY","NY","NY","NY"), Month=c("Jan","Jan","Jan","Jan"), Account=c("Sales", "Customers", "F Expenses","V Expenses"), Value=c(1000,500,1000,100))
The first thing I did was rename the accounts to a common name, i.e. Expenses, which will help me later when merging with Table1.
Table2$Account[Table2$Account=="F Expenses"]<-"Expenses"
Table2$Account[Table2$Account=="V Expenses"]<-"Expenses"
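The two assignments above could also be collapsed into one vectorised step, which is roughly where the ifelse idea from the question fits. A minimal sketch in base R, assuming the same account labels (the `accounts` vector is a stand-in for `Table2$Account`):

```r
# Stand-in for Table2$Account
accounts <- c("Sales", "Customers", "F Expenses", "V Expenses")

# Map both expense types to a common label; leave other accounts unchanged
common <- ifelse(accounts %in% c("F Expenses", "V Expenses"), "Expenses", accounts)

common
# "Sales" "Customers" "Expenses" "Expenses"
```

Using `%in%` keeps the list of labels to merge in one place, so adding another expense type later is a one-word change.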
Then I used group_by to group by State, Month, and Account, and computed the sum.
Table2 <- Table2 %>% group_by(State, Month, Account) %>%
  summarise(Tot_Value = sum(Value)) %>% ungroup()
head(Table2)
## State Month Account Tot_Value
## <chr> <chr> <chr> <dbl>
## 1 NY Jan Customers 500
## 2 NY Jan Expenses 1100
## 3 NY Jan Sales 1000
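In more recent dplyr versions (1.0.0 and later, as far as I know), summarise accepts a `.groups` argument, so the grouping can be dropped in the same call instead of with a separate ungroup(). A sketch under that assumption, rebuilding the renamed Table2 so it runs on its own (`t2_renamed` and `t2_totals` are illustrative names):

```r
library(dplyr)

# Table2 after the two expense accounts were renamed to "Expenses"
t2_renamed <- tibble(
  State   = rep("NY", 4),
  Month   = rep("Jan", 4),
  Account = c("Sales", "Customers", "Expenses", "Expenses"),
  Value   = c(1000, 500, 1000, 100)
)

# Sum Value per State/Month/Account and return an ungrouped tibble
t2_totals <- t2_renamed %>%
  group_by(State, Month, Account) %>%
  summarise(Tot_Value = sum(Value), .groups = "drop")

t2_totals
```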
Then, similarly, rename the accounts in Table1.
Table1$Account[Table1$Account=="Expected Sales"]<-"Sales"
Table1$Account[Table1$Account=="Expected Expenses"]<-"Expenses"
Merge into a third table, Table3.
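# Name the join keys explicitly rather than relying on the automatic match
Table3 <- left_join(Table1, Table2, by = c("State", "Month", "Account"))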
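Because left_join keeps every row of Table1, a growth factor with no matching total would silently end up with NA in Tot_Value. One way to check for that is anti_join, which returns the rows of the first table that have no match in the second. A self-contained sketch (`factors` and `totals` are illustrative stand-ins for Table1 and Table2 at this point):

```r
library(dplyr)

# Stand-in for Table1 after renaming
factors <- tibble(
  State = c("NY", "NY"), Month = c("Jan", "Jan"),
  Account = c("Sales", "Expenses"), Value = c(1.04, 1.02)
)
# Stand-in for the summarised Table2
totals <- tibble(
  State = rep("NY", 3), Month = rep("Jan", 3),
  Account = c("Customers", "Expenses", "Sales"),
  Tot_Value = c(500, 1100, 1000)
)

# Rows of `factors` with no counterpart in `totals`; zero rows means a clean join
unmatched <- anti_join(factors, totals, by = c("State", "Month", "Account"))
nrow(unmatched)
```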
Use mutate to perform the required calculation.
Table3 <- Table3 %>% mutate(Value2=Value*Tot_Value)
head(Table3)
## # A tibble: 2 x 6
## State Month Account Value Tot_Value Value2
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 NY Jan Sales 1.04 1000 1040
## 2 NY Jan Expenses 1.02 1100 1122
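To match the layout asked for in the question, the helper columns can be dropped and the result renamed. A sketch, assuming Table3 looks like the output above (it is rebuilt here so the block runs on its own; `result` is an illustrative name):

```r
library(dplyr)

# Table3 as produced by the join and mutate steps
Table3 <- tibble(
  State = c("NY", "NY"), Month = c("Jan", "Jan"),
  Account = c("Sales", "Expenses"),
  Value = c(1.04, 1.02), Tot_Value = c(1000, 1100)
) %>%
  mutate(Value2 = Value * Tot_Value)

# Keep only the columns from the desired output, renaming as we select
result <- Table3 %>%
  select(State, Month, `New Account` = Account, Value = Value2)

result
```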
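Here is my take with dplyr and tidyr. First, I combined your initial tables with rbind into one long-format table. Since every Account value has a unique identifier, they don't need to be separate tables. Next I group_by State and Month, assuming you will eventually have multiple states and months. Then I summarise based on the Account values you specified, which creates two new columns. Finally, to get the long format you wanted, I used gather from tidyr to go from wide to long format. You can break these commands into smaller chunks by deleting everything after a %>% to get a better sense of what each step does.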
library(dplyr)
library(tidyr)
# df and df2 are the two input tables (Table 1 and Table 2 from the question)
rbind(df, df2) %>%
  group_by(State, Month) %>%
  summarise(Expenses = (Value[which(Account == "F Expenses")] + Value[which(Account == "V Expenses")]) * Value[which(Account == "Expected Expenses")],
            Sales = Value[which(Account == "Sales")] * Value[which(Account == "Expected Sales")]) %>%
  gather(New_Account, Value, c(Expenses, Sales))
# A tibble: 2 x 4
# Groups: State [1]
# State Month New_Account Value
# <chr> <chr> <chr> <dbl>
#1 NY Jan Expenses 1122
#2 NY Jan Sales 1040
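In tidyr 1.0.0 and later, gather is superseded by pivot_longer, though gather still works. A self-contained sketch of the same pipeline with pivot_longer, using bind_rows in place of rbind and assuming df and df2 are the two tables from the question (`long` is an illustrative name):

```r
library(dplyr)
library(tidyr)

# The two input tables from the question
df <- tibble(State = "NY", Month = "Jan",
             Account = c("Expected Sales", "Expected Expenses"),
             Value = c(1.04, 1.02))
df2 <- tibble(State = "NY", Month = "Jan",
              Account = c("Sales", "Customers", "F Expenses", "V Expenses"),
              Value = c(1000, 500, 1000, 100))

long <- bind_rows(df, df2) %>%
  group_by(State, Month) %>%
  summarise(
    Expenses = (Value[Account == "F Expenses"] + Value[Account == "V Expenses"]) *
      Value[Account == "Expected Expenses"],
    Sales = Value[Account == "Sales"] * Value[Account == "Expected Sales"],
    .groups = "drop"
  ) %>%
  # Wide (one column per account) back to long (one row per account)
  pivot_longer(c(Expenses, Sales), names_to = "New_Account", values_to = "Value")

long
```

Note that this version returns an ungrouped tibble because of `.groups = "drop"`, whereas the original output above is still grouped by State.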
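I'd suggest looking into the concept of "tidy data", because working with data in the structure you currently have poses some real challenges. For example, creating t3 should take only 2-3 lines of code; everything below is just there to work around your data schema: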
library(tidyverse)
t1 <- data.frame(State = rep("NY", 2),
                 Month = rep(as.Date("2018-01-01"), 2),
                 Account = c("Expected Sales", "Expected Expenses"),
                 Value = c(1.04, 1.02),
                 stringsAsFactors = FALSE)
t2 <- data.frame(State = rep("NY", 4),
                 Month = rep(as.Date("2018-01-01"), 4),
                 Account = c("Sales", "Customers", "F Expenses", "V Expenses"),
                 Value = c(1000, 500, 1000, 100),
                 stringsAsFactors = FALSE)
t3 <- t2 %>%
  spread(Account, Value) %>%
  inner_join({
    t1 %>%
      spread(Account, Value)
  }, by = c("State" = "State", "Month" = "Month")) %>%
  mutate(NewExpenses = (`F Expenses` + `V Expenses`) * `Expected Expenses`,
         NewSales = Sales * `Expected Sales`) %>%
  select(State, Month, Sales = NewSales, Expenses = NewExpenses) %>%
  gather(Sales, Expenses, key = `New Account`, value = Value)
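To illustrate the point about structure: if the inputs were already stored with one row per State/Month and one column per account, the calculation would shrink to a single step. The wide layout below is an assumption for illustration, not something given in the question, and `wide` and `t3_wide` are hypothetical names:

```r
library(dplyr)

# Hypothetical wide layout: one row per State/Month, one column per account
wide <- tibble(
  State = "NY", Month = as.Date("2018-01-01"),
  Sales = 1000, Customers = 500,
  `F Expenses` = 1000, `V Expenses` = 100,
  `Expected Sales` = 1.04, `Expected Expenses` = 1.02
)

# With this layout the whole calculation is one transmute call
t3_wide <- wide %>%
  transmute(State, Month,
            Sales = Sales * `Expected Sales`,
            Expenses = (`F Expenses` + `V Expenses`) * `Expected Expenses`)

t3_wide
```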