I am generating some frequencies for some survey questions and then putting some of these questions together into a data-frame. Each question has a Yes/No response, that is also reported as No %
and Yes %
.
Now, if in a given row No < 15
OR Yes < 15
then only the Total value should be visible in that row while the No, Yes, No %
and Yes %
columns are masked as NA
.
I am messing about with case_when
and other options but having little luck. I'll plug away but if an obvious solution strikes someone, I'd be grateful. I am not wedded to dplyr
for the solution. thanks in advance!
The example data frame is shown below as mytab
:
mytab <- structure(list(No = c(271L, 1395L, 1393L, 1338L, 1254L, 1355L, 1332L, 1380L, 1360L), Yes = c(1138L, 14L, 16L, 71L, 155L, 54L, 77L, 29L, 49L),
Total = c(1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409),
`No (%)` = c(19.2334989354152, 99.0063875088715, 98.8644428672818, 94.9609652235628, 88.9992902767921, 96.1674946770759, 94.5351312987935, 97.9418026969482, 96.5223562810504),
`Yes (%)` = c(80.7665010645848, 0.99361249112846, 1.13555713271824, 5.03903477643719, 11.0007097232079, 3.83250532292406, 5.46486870120653, 2.05819730305181, 3.47764371894961)),
row.names = c(NA, -9L),
class = "data.frame")
mytab
#> No Yes Total No (%) Yes (%)
#> 1 271 1138 1409 19.23350 80.7665011
#> 2 1395 14 1409 99.00639 0.9936125
#> 3 1393 16 1409 98.86444 1.1355571
#> 4 1338 71 1409 94.96097 5.0390348
#> 5 1254 155 1409 88.99929 11.0007097
#> 6 1355 54 1409 96.16749 3.8325053
#> 7 1332 77 1409 94.53513 5.4648687
#> 8 1380 29 1409 97.94180 2.0581973
#> 9 1360 49 1409 96.52236 3.4776437
The solution should yield mytab2
that can then be piped to knitr
.
mytab2 <- structure(list(No = c(271L, NA, 1393L, 1338L, 1254L, 1355L, 1332L, 1380L, 1360L),
Yes = c(1138L, NA, 16L, 71L, 155L, 54L, 77L, 29L, 49L),
Total = c(1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409),
`No (%)` = c(19.2334989354152, NA, 98.8644428672818, 94.9609652235628, 88.9992902767921, 96.1674946770759, 94.5351312987935, 97.9418026969482, 96.5223562810504),
`Yes (%)` = c(80.7665010645848, NA, 1.13555713271824, 5.03903477643719, 11.0007097232079, 3.83250532292406, 5.46486870120653, 2.05819730305181, 3.47764371894961)),
row.names = c(NA, -9L),
class = "data.frame")
mytab2
#> No Yes Total No (%) Yes (%)
#> 1 271 1138 1409 19.23350 80.766501
#> 2 NA NA 1409 NA NA
#> 3 1393 16 1409 98.86444 1.135557
#> 4 1338 71 1409 94.96097 5.039035
#> 5 1254 155 1409 88.99929 11.000710
#> 6 1355 54 1409 96.16749 3.832505
#> 7 1332 77 1409 94.53513 5.464869
#> 8 1380 29 1409 97.94180 2.058197
#> 9 1360 49 1409 96.52236 3.477644
This is the same as divibisan's answer, but with data.table syntax that reduces repetition of the table name and uses between
(since it seems to fit):
library(data.table)
mybadtab = data.table(mytab)
mymin = 15
badcols = c("No", "Yes", "No (%)", "Yes (%)")
mybadtab[!( No %between% c(mymin, Total - mymin) ), (badcols) := NA]
No Yes Total No (%) Yes (%)
1: 271 1138 1409 19.23350 80.766501
2: NA NA 1409 NA NA
3: 1393 16 1409 98.86444 1.135557
4: 1338 71 1409 94.96097 5.039035
5: 1254 155 1409 88.99929 11.000710
6: 1355 54 1409 96.16749 3.832505
7: 1332 77 1409 94.53513 5.464869
8: 1380 29 1409 97.94180 2.058197
9: 1360 49 1409 96.52236 3.477644
In pipe form...
library(magrittr)
library(knitr)
mymin = 15
badcols = c("No", "Yes", "No (%)", "Yes (%)")
data.table(mytab)[!( No %between% c(mymin, Total - mymin) ), (badcols) := NA] %>%
kable
| No| Yes| Total| No (%)| Yes (%)|
|----:|----:|-----:|--------:|---------:|
| 271| 1138| 1409| 19.23350| 80.766501|
| NA| NA| 1409| NA| NA|
| 1393| 16| 1409| 98.86444| 1.135557|
| 1338| 71| 1409| 94.96097| 5.039035|
| 1254| 155| 1409| 88.99929| 11.000710|
| 1355| 54| 1409| 96.16749| 3.832505|
| 1332| 77| 1409| 94.53513| 5.464869|
| 1380| 29| 1409| 97.94180| 2.058197|
| 1360| 49| 1409| 96.52236| 3.477644|
In base R, you can just subset with square brackets to get the appropriate rows and then assign NA
to the columns you want to change. NOTE: this will modify the value of mytab
. If you want to make the changes in a new data.frame, you need to copy mytab
and modify the copy:
mytab2 <- mytab
mytab2[mytab2$No < 15 | mytab2$Yes < 15, c('No', 'Yes', 'No (%)', 'Yes (%)')] <- NA
mytab2
No Yes Total No (%) Yes (%)
1 271 1138 1409 19.23350 80.766501
2 NA NA 1409 NA NA
3 1393 16 1409 98.86444 1.135557
4 1338 71 1409 94.96097 5.039035
5 1254 155 1409 88.99929 11.000710
6 1355 54 1409 96.16749 3.832505
7 1332 77 1409 94.53513 5.464869
8 1380 29 1409 97.94180 2.058197
9 1360 49 1409 96.52236 3.477644
Try this:
df<-as.data.frame(list(No = c(271, 1395, 1393, 1338, 1254, 1355, 1332, 1380, 1360),
Yes = c(1138, 14, 16, 71, 155, 54, 77, 29, 49),
Total = c(1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409)))
df$NoPct<-0
df$YesPct<-0
rowcalc<-function(x){
if (x[1]<15 | x[2]<15){
x[1]= x[2]= x[4]=x[5]=NA
} else {
x[4]<- round(100*x[1]/x[3],digits=2) #rounding to 2 decimal places
x[5]<- round(100*x[2]/x[3],digits=2)
}
return(x)
}
t(apply(df,1,rowcalc)) #apply rowcalc to every row & transpose it
# No Yes Total NoPct YesPct
#[1,] 271 1138 1409 19.23 80.77
#[2,] NA NA 1409 NA NA
#[3,] 1393 16 1409 98.86 1.14
#[4,] 1338 71 1409 94.96 5.04
#[5,] 1254 155 1409 89.00 11.00
#[6,] 1355 54 1409 96.17 3.83
#[7,] 1332 77 1409 94.54 5.46
#[8,] 1380 29 1409 97.94 2.06
#[9,] 1360 49 1409 96.52 3.48
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.