
How to "Tidy" Quickbooks Journal Data for R Analysis

Problem

If you export Quickbooks Journal data as an Excel file, you get an analyst's nightmare: summarized data without the "roll-up" information. After some data engineering I did know how to do, I'm left with this:

date,transaction_type,num,account,debit,credit
12/01/2019,Bill,4296-4301,Accounts Payable,NA,30734.37
NA,NA,NA,Warehouse:NJ Warehouse Rent,10642.79,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,7476.17,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,2337.86,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,3915.85,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,2878.78,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,3482.92,NA
12/01/2019,Bill,4953268,Accounts Payable,NA,173.8
NA,NA,NA,Warehouse:Warehouse Expense,173.8,NA
12/01/2019,Bill,198288,Accounts Payable,NA,750
NA,NA,NA,Office Expense:Accounting,750,NA

Now I'm left with data engineering I do not know how to do: how can I intelligently fill in all the NAs with the dates, transaction types, and nums they should roll up to?

The debit and credit will then get "gathered", in tidyverse-speak.

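Solution

One option is to use fill to carry the last non-missing date, transaction_type, and num down into the NA rows below them, and then reshape into 'long' format with pivot_longer: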

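library(dplyr)
library(tidyr)
df1 %>% 
   # carry each header row's values down into the NA rows that follow it
   fill(date, transaction_type, num) %>%
   # stack the debit and credit columns into name/value pairs
   pivot_longer(cols = debit:credit, 
        names_to = 'type', values_to = 'credit_debit_value')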
# A tibble: 22 x 6
#   date       transaction_type num       account                     type   credit_debit_value
#   <chr>      <chr>            <chr>     <chr>                       <chr>               <dbl>
# 1 12/01/2019 Bill             4296-4301 Accounts Payable            debit                 NA 
# 2 12/01/2019 Bill             4296-4301 Accounts Payable            credit             30734.
# 3 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent debit              10643.
# 4 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent credit                NA 
# 5 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent debit               7476.
# 6 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent credit                NA 
# 7 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent debit               2338.
# 8 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent credit                NA 
# 9 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent debit               3916.
#10 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent credit                NA 
# … with 12 more rows
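
Half of the rows in the output above are NA placeholders, because every journal line has either a debit or a credit but not both. As a minimal sketch, assuming those empty rows are not needed downstream, the values_drop_na argument of pivot_longer can drop them during the reshape. The date parsing step assumes the export uses month/day/year order, which "12/01/2019" alone does not confirm. The journal_sample name and its contents are a hypothetical cut-down copy of the question's data, used only to keep the example self-contained.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-transaction sample in the same shape as the question's data
journal_sample <- data.frame(
  date = c("12/01/2019", NA, "12/01/2019", NA),
  transaction_type = c("Bill", NA, "Bill", NA),
  num = c("4953268", NA, "198288", NA),
  account = c("Accounts Payable", "Warehouse:Warehouse Expense",
              "Accounts Payable", "Office Expense:Accounting"),
  debit = c(NA, 173.8, NA, 750),
  credit = c(173.8, NA, 750, NA)
)

tidy_sample <- journal_sample %>%
  # fill() defaults to filling downward from the last non-missing value
  fill(date, transaction_type, num) %>%
  pivot_longer(cols = debit:credit,
               names_to = "type",
               values_to = "credit_debit_value",
               values_drop_na = TRUE) %>%  # drop the empty debit/credit halves
  # ASSUMPTION: dates are month/day/year
  mutate(date = as.Date(date, format = "%m/%d/%Y"))

print(tidy_sample)
```

Each journal line then appears exactly once, tagged as either a debit or a credit.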

Data

df1 <- structure(list(date = c("12/01/2019", NA, NA, NA, NA, NA, NA, 
"12/01/2019", NA, "12/01/2019", NA), transaction_type = c("Bill", 
NA, NA, NA, NA, NA, NA, "Bill", NA, "Bill", NA), num = c("4296-4301", 
NA, NA, NA, NA, NA, NA, "4953268", NA, "198288", NA), account = 
 c("Accounts Payable", 
"Warehouse:NJ Warehouse Rent", "Warehouse:NJ Warehouse Rent", 
"Warehouse:NJ Warehouse Rent", "Warehouse:NJ Warehouse Rent", 
"Warehouse:NJ Warehouse Rent", "Warehouse:NJ Warehouse Rent", 
"Accounts Payable", "Warehouse:Warehouse Expense", "Accounts Payable", 
"Office Expense:Accounting"), debit = c(NA, 10642.79, 7476.17, 
2337.86, 3915.85, 2878.78, 3482.92, NA, 173.8, NA, 750), credit = c(30734.37, 
NA, NA, NA, NA, NA, NA, 173.8, NA, 750, NA)),
 class = "data.frame", row.names = c(NA, 
-11L))
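
Because fill simply copies values downward, it can help to confirm that the rows were attached to the right transaction. One possible check, sketched here under the assumption that every journal entry should balance, is to compare total debits and total credits per filled transaction. The journal_sample data frame, the balance_check name, and the half-cent tolerance are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-transaction sample in the same shape as df1
journal_sample <- data.frame(
  date = c("12/01/2019", NA, "12/01/2019", NA),
  transaction_type = c("Bill", NA, "Bill", NA),
  num = c("4953268", NA, "198288", NA),
  account = c("Accounts Payable", "Warehouse:Warehouse Expense",
              "Accounts Payable", "Office Expense:Accounting"),
  debit = c(NA, 173.8, NA, 750),
  credit = c(173.8, NA, 750, NA)
)

balance_check <- journal_sample %>%
  fill(date, transaction_type, num) %>%
  group_by(date, transaction_type, num) %>%
  summarise(total_debit = sum(debit, na.rm = TRUE),
            total_credit = sum(credit, na.rm = TRUE),
            .groups = "drop") %>%
  # ASSUMPTION: a half-cent tolerance absorbs floating-point noise
  mutate(balanced = abs(total_debit - total_credit) < 0.005)

print(balance_check)
```

A transaction flagged as unbalanced would suggest that a header row was missing or that rows were filled from the wrong entry.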
