
Convert several columns into a single one and keep it matched with the information in other columns

I'm working with a complex matrix (complex to me...)

It is something like this:

      Invoice.1   Invoice.2   Invoice.3               mtime
1   21605000182 21605000183          NA 2017-01-16 19:51:33
2   21605000182 21605000183          NA 2017-01-16 19:51:33
3   21605000182 21605000183          NA 2017-01-16 19:51:33
4   21605000182 21605000183          NA 2017-01-16 19:51:33
5   21510000669 21602000125 21608000366 2017-01-20 13:28:36
6   21609000856          NA          NA 2017-01-20 13:28:36
7   21606000405 21608000354 21608000356 2017-01-20 13:28:36
8   21610000133          NA          NA 2017-01-20 13:28:36
9   21604000592 21605000604 21605000608 2017-01-20 13:28:36
10  21609001012          NA          NA 2017-01-20 13:28:36

I would like to combine all the Invoice columns into one, so that I can remove the NA values and the duplicates, while keeping each invoice matched with the date in the last column, which is the date of the claim.

Something like this:

      Invoice          mtime
1   21605000182 2017-01-16 19:51:33
2   21605000182 2017-01-16 19:51:33
3   21605000182 2017-01-16 19:51:33
4   21605000182 2017-01-16 19:51:33
5   21510000669 2017-01-20 13:28:36
6   21609000856 2017-01-20 13:28:36
7   21606000405 2017-01-20 13:28:36
8   21610000133 2017-01-20 13:28:36
9   21604000592 2017-01-20 13:28:36
10  21609001012 2017-01-20 13:28:36
11  21605000183 2017-01-16 19:51:33
12  21605000183 2017-01-16 19:51:33
13  21605000183 2017-01-16 19:51:33
14  21605000183 2017-01-16 19:51:33
15  21602000125 2017-01-20 13:28:36
16  21608000354 2017-01-20 13:28:36

Example using data.table (it should be faster than the other solutions):

library(data.table)

DT <- data.table(Invoice.1 = 1:3, Invoice.2 = c(1L,4L,5L), mtime = 11:13)
DT

   Invoice.1 Invoice.2 mtime
1:         1         1    11
2:         2         4    12
3:         3         5    13

rez <- melt(DT, measure.vars = paste0("Invoice.", 1:2),
            value.name = "Invoice")
rez[, variable := NULL]
rez

   mtime Invoice
1:    11       1
2:    12       2
3:    13       3
4:    11       1
5:    12       4
6:    13       5

rez <- unique(rez)
rez

   mtime Invoice
1:    11       1
2:    12       2
3:    13       3
4:    12       4
5:    13       5
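
The same steps can be tried on a few rows taken from the question, where some invoice columns hold NA. This is only a minimal sketch under some assumptions: invoice numbers and mtime are stored as character, the invoice columns are picked by the name prefix "Invoice.", and the names DT_q and long are made up for illustration. na.rm = TRUE in melt drops the NA rows, and unique then removes repeated Invoice/mtime pairs.

```r
library(data.table)

# A few rows from the question; values kept as character (assumption)
DT_q <- data.table(
  Invoice.1 = c("21605000182", "21605000182", "21510000669", "21609000856"),
  Invoice.2 = c("21605000183", "21605000183", "21602000125", NA),
  Invoice.3 = c(NA, NA, "21608000366", NA),
  mtime     = c("2017-01-16 19:51:33", "2017-01-16 19:51:33",
                "2017-01-20 13:28:36", "2017-01-20 13:28:36")
)

# Select every column whose name starts with "Invoice."
invoice_cols <- grep("^Invoice\\.", names(DT_q), value = TRUE)

# Wide to long; na.rm = TRUE drops rows where the invoice is NA
long <- melt(DT_q, id.vars = "mtime", measure.vars = invoice_cols,
             value.name = "Invoice", na.rm = TRUE)

# The source column name is not needed, so drop it
long[, variable := NULL]

# Remove repeated (mtime, Invoice) pairs
long <- unique(long)
long
```

If every occurrence should be kept, as in the desired output shown in the question, the unique step can simply be left out.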

The gather function from the tidyr package can do what you are looking for. gather transforms a data.frame from wide format to long format.

library(tidyr)
library(readr)

# Create a temp file to store the example data
data_file <- tempfile()

cat(
"Invoice.1,Invoice.2,Invoice.3,mtime
21605000182,21605000183,NA,2017-01-16 19:51:33
21605000182,21605000183,NA,2017-01-16 19:51:33
21605000182,21605000183,NA,2017-01-16 19:51:33
21605000182,21605000183,NA,2017-01-16 19:51:33
21510000669,21602000125,21608000366,2017-01-20 13:28:36
21609000856,NA,NA,2017-01-20 13:28:36
21606000405,21608000354,21608000356,2017-01-20 13:28:36
21610000133,NA,NA,2017-01-20 13:28:36
21604000592,21605000604,21605000608,2017-01-20 13:28:36
21609001012,NA,NA,2017-01-20 13:28:36",
file = data_file,
append = FALSE)

# Read the data from the temp file into a data.frame called `invoices`
invoices <-
  readr::read_csv(file = data_file, col_types = "cccT")

# View the data
invoices
# # A tibble: 10 x 4
#      Invoice.1   Invoice.2   Invoice.3               mtime
#          <chr>       <chr>       <chr>              <dttm>
#  1 21605000182 21605000183        <NA> 2017-01-16 19:51:33
#  2 21605000182 21605000183        <NA> 2017-01-16 19:51:33
#  3 21605000182 21605000183        <NA> 2017-01-16 19:51:33
#  4 21605000182 21605000183        <NA> 2017-01-16 19:51:33
#  5 21510000669 21602000125 21608000366 2017-01-20 13:28:36
#  6 21609000856        <NA>        <NA> 2017-01-20 13:28:36
#  7 21606000405 21608000354 21608000356 2017-01-20 13:28:36
#  8 21610000133        <NA>        <NA> 2017-01-20 13:28:36
#  9 21604000592 21605000604 21605000608 2017-01-20 13:28:36
# 10 21609001012        <NA>        <NA> 2017-01-20 13:28:36

# Use the gather function from the tidyr package to transform the data from
# wide format to long format; na.rm = TRUE drops the rows where Invoice is NA.

tidyr::gather(invoices, key = key, value = Invoice, -mtime, na.rm = TRUE) %>% print(n = Inf)
# # A tibble: 20 x 3
#                  mtime       key     Invoice
#  *              <dttm>     <chr>       <chr>
#  1 2017-01-16 19:51:33 Invoice.1 21605000182
#  2 2017-01-16 19:51:33 Invoice.1 21605000182
#  3 2017-01-16 19:51:33 Invoice.1 21605000182
#  4 2017-01-16 19:51:33 Invoice.1 21605000182
#  5 2017-01-20 13:28:36 Invoice.1 21510000669
#  6 2017-01-20 13:28:36 Invoice.1 21609000856
#  7 2017-01-20 13:28:36 Invoice.1 21606000405
#  8 2017-01-20 13:28:36 Invoice.1 21610000133
#  9 2017-01-20 13:28:36 Invoice.1 21604000592
# 10 2017-01-20 13:28:36 Invoice.1 21609001012
# 11 2017-01-16 19:51:33 Invoice.2 21605000183
# 12 2017-01-16 19:51:33 Invoice.2 21605000183
# 13 2017-01-16 19:51:33 Invoice.2 21605000183
# 14 2017-01-16 19:51:33 Invoice.2 21605000183
# 15 2017-01-20 13:28:36 Invoice.2 21602000125
# 16 2017-01-20 13:28:36 Invoice.2 21608000354
# 17 2017-01-20 13:28:36 Invoice.2 21605000604
# 18 2017-01-20 13:28:36 Invoice.3 21608000366
# 19 2017-01-20 13:28:36 Invoice.3 21608000356
# 20 2017-01-20 13:28:36 Invoice.3 21605000608

