
Data table wrangling

I have some messy data representing feedback from the PO creation process:

library(data.table)

PO <- c(1, 1, 2, 2, 3, 4, 5, 6)
Rating <- c(3, 0, 0, 1, 3, 4, 5, 4)
dt <- data.table(PO, Rating)

> dt
   PO Rating
1:  1      3
2:  1      0
3:  2      0
4:  2      1
5:  3      3
6:  4      4
7:  5      5
8:  6      4

PO #1 has two ratings, 3 and 0, and PO #2 has ratings of 0 and 1. In all such cases, I want to set the rating in every row to the maximum for that PO:

   PO Rating
1:  1      3
2:  1      3 <- changed from 0
3:  2      1 <- changed from 0
4:  2      1
5:  3      3
6:  4      4
7:  5      5
8:  6      4

The first step is to detect the POs that have this issue. I have the following R code for this:

t <- dt[, .(U=length(unique(Rating))), by=.(PO)]

> t
   PO U
1:  1 2
2:  2 2
3:  3 1
4:  4 1
5:  5 1
6:  6 1

This shows that PO #1 and PO #2 each have two unique ratings. Now my task is to find the maximum of these ratings for each PO and assign it back into the data table dt.

How do I do this in R?

Using data.table functions:

# group by PO, find the max Rating in each group, and assign
# that max value back to Rating (by reference, so dt itself is updated)
dt[ , Rating := max(Rating, na.rm = TRUE), by = PO]
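
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the grouped assignment. The `copy()` call and the name `fixed` are additions for illustration only, so that the original `dt` stays untouched for comparison.

```r
library(data.table)

# Sample data from the question
dt <- data.table(PO     = c(1, 1, 2, 2, 3, 4, 5, 6),
                 Rating = c(3, 0, 0, 1, 3, 4, 5, 4))

# := modifies by reference, so work on a copy to keep dt intact
fixed <- copy(dt)
fixed[, Rating := max(Rating, na.rm = TRUE), by = PO]

print(fixed)
```

If overwriting `dt` in place is fine, the `copy()` step can be dropped and the one-liner above used directly.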

Cheers!

We can also order by descending Rating within each PO and then assign the first element of each group:

dt[order(PO, -Rating), Rating := Rating[1], by = PO]
dt
#   PO Rating
#1:  1      3
#2:  1      3
#3:  2      1
#4:  2      1
#5:  3      3
#6:  4      4
#7:  5      5
#8:  6      4
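
The ordering approach works because, once the rows of each PO are sorted by descending Rating, the first value in each group is the group maximum. Below is a small sketch comparing it with the grouped `max()` approach on the question's data. The names `by_max` and `by_order` are hypothetical, and both results are built on copies so that `dt` is not modified.

```r
library(data.table)

# Sample data from the question
dt <- data.table(PO     = c(1, 1, 2, 2, 3, 4, 5, 6),
                 Rating = c(3, 0, 0, 1, 3, 4, 5, 4))

# Approach 1: assign the grouped maximum
by_max <- copy(dt)[, Rating := max(Rating), by = PO]

# Approach 2: visit rows ordered by PO and descending Rating,
# then assign the first Rating seen in each PO group
by_order <- copy(dt)[order(PO, -Rating), Rating := Rating[1], by = PO]

# Compare the resulting Rating columns
identical(by_max$Rating, by_order$Rating)
```

Note that `-Rating` relies on Rating being numeric; for this data the grouped `max()` version is probably the more direct choice.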

