[英]masking specific cell values in rows that meet condition
我為一些調查問題生成一些頻率,然后將其中一些問題放到一個數據框中。 每個問題的回答為“是/否”,也報告為“ No %
和“ Yes %
。
現在,如果在給定的行No < 15
或Yes < 15
則只有Total值在該行中可見,而No,Yes, No %
和Yes %
列被掩碼為NA
。
我搞砸了case_when
和其他選項,但是運氣不好。 我會插手,但如果有一個明顯的解決方案能使某人滿意,我將不勝感激。 我不願意向dplyr
尋求解決方案。 提前致謝!
示例數據框顯示為mytab
:
mytab <- structure(list(No = c(271L, 1395L, 1393L, 1338L, 1254L, 1355L, 1332L, 1380L, 1360L), Yes = c(1138L, 14L, 16L, 71L, 155L, 54L, 77L, 29L, 49L),
Total = c(1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409),
`No (%)` = c(19.2334989354152, 99.0063875088715, 98.8644428672818, 94.9609652235628, 88.9992902767921, 96.1674946770759, 94.5351312987935, 97.9418026969482, 96.5223562810504),
`Yes (%)` = c(80.7665010645848, 0.99361249112846, 1.13555713271824, 5.03903477643719, 11.0007097232079, 3.83250532292406, 5.46486870120653, 2.05819730305181, 3.47764371894961)),
row.names = c(NA, -9L),
class = "data.frame")
mytab
#> No Yes Total No (%) Yes (%)
#> 1 271 1138 1409 19.23350 80.7665011
#> 2 1395 14 1409 99.00639 0.9936125
#> 3 1393 16 1409 98.86444 1.1355571
#> 4 1338 71 1409 94.96097 5.0390348
#> 5 1254 155 1409 88.99929 11.0007097
#> 6 1355 54 1409 96.16749 3.8325053
#> 7 1332 77 1409 94.53513 5.4648687
#> 8 1380 29 1409 97.94180 2.0581973
#> 9 1360 49 1409 96.52236 3.4776437
該解決方案應該產生mytab2
,然后可以將其通過管道傳輸到knitr
。
mytab2 <- structure(list(No = c(271L, NA, 1393L, 1338L, 1254L, 1355L, 1332L, 1380L, 1360L),
Yes = c(1138L, NA, 16L, 71L, 155L, 54L, 77L, 29L, 49L),
Total = c(1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409),
`No (%)` = c(19.2334989354152, NA, 98.8644428672818, 94.9609652235628, 88.9992902767921, 96.1674946770759, 94.5351312987935, 97.9418026969482, 96.5223562810504),
`Yes (%)` = c(80.7665010645848, NA, 1.13555713271824, 5.03903477643719, 11.0007097232079, 3.83250532292406, 5.46486870120653, 2.05819730305181, 3.47764371894961)),
row.names = c(NA, -9L),
class = "data.frame")
mytab2
#> No Yes Total No (%) Yes (%)
#> 1 271 1138 1409 19.23350 80.766501
#> 2 NA NA 1409 NA NA
#> 3 1393 16 1409 98.86444 1.135557
#> 4 1338 71 1409 94.96097 5.039035
#> 5 1254 155 1409 88.99929 11.000710
#> 6 1355 54 1409 96.16749 3.832505
#> 7 1332 77 1409 94.53513 5.464869
#> 8 1380 29 1409 97.94180 2.058197
#> 9 1360 49 1409 96.52236 3.477644
這與divibisan的答案相同,但使用data.table語法可減少表名的重復和between
使用(因為它似乎合適):
library(data.table)
mybadtab = data.table(mytab)
mymin = 15
badcols = c("No", "Yes", "No (%)", "Yes (%)")
mybadtab[!( No %between% c(mymin, Total - mymin) ), (badcols) := NA]
No Yes Total No (%) Yes (%)
1: 271 1138 1409 19.23350 80.766501
2: NA NA 1409 NA NA
3: 1393 16 1409 98.86444 1.135557
4: 1338 71 1409 94.96097 5.039035
5: 1254 155 1409 88.99929 11.000710
6: 1355 54 1409 96.16749 3.832505
7: 1332 77 1409 94.53513 5.464869
8: 1380 29 1409 97.94180 2.058197
9: 1360 49 1409 96.52236 3.477644
管道形式...
library(magrittr)
library(knitr)
mymin = 15
badcols = c("No", "Yes", "No (%)", "Yes (%)")
data.table(mytab)[!( No %between% c(mymin, Total - mymin) ), (badcols) := NA] %>%
kable
| No| Yes| Total| No (%)| Yes (%)|
|----:|----:|-----:|--------:|---------:|
| 271| 1138| 1409| 19.23350| 80.766501|
| NA| NA| 1409| NA| NA|
| 1393| 16| 1409| 98.86444| 1.135557|
| 1338| 71| 1409| 94.96097| 5.039035|
| 1254| 155| 1409| 88.99929| 11.000710|
| 1355| 54| 1409| 96.16749| 3.832505|
| 1332| 77| 1409| 94.53513| 5.464869|
| 1380| 29| 1409| 97.94180| 2.058197|
| 1360| 49| 1409| 96.52236| 3.477644|
在基數R中,您可以僅使用帶有方括號的子集來獲取適當的行,然后將NA
分配給要更改的列。 注意:這將修改 mytab
的值。 如果要在新的data.frame中進行更改,則需要復制mytab
並修改副本:
mytab2 <- mytab
mytab2[mytab2$No < 15 | mytab2$Yes < 15, c('No', 'Yes', 'No (%)', 'Yes (%)')] <- NA
mytab2
No Yes Total No (%) Yes (%)
1 271 1138 1409 19.23350 80.766501
2 NA NA 1409 NA NA
3 1393 16 1409 98.86444 1.135557
4 1338 71 1409 94.96097 5.039035
5 1254 155 1409 88.99929 11.000710
6 1355 54 1409 96.16749 3.832505
7 1332 77 1409 94.53513 5.464869
8 1380 29 1409 97.94180 2.058197
9 1360 49 1409 96.52236 3.477644
嘗試這個:
df<-as.data.frame(list(No = c(271, 1395, 1393, 1338, 1254, 1355, 1332, 1380, 1360),
Yes = c(1138, 14, 16, 71, 155, 54, 77, 29, 49),
Total = c(1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409)))
df$NoPct<-0
df$YesPct<-0
rowcalc<-function(x){
if (x[1]<15 | x[2]<15){
x[1]= x[2]= x[4]=x[5]=NA
} else {
x[4]<- round(100*x[1]/x[3],digits=2) #rounding to 2 decimal places
x[5]<- round(100*x[2]/x[3],digits=2)
}
return(x)
}
t(apply(df,1,rowcalc)) #apply rowcalc to every row & transpose it
# No Yes Total NoPct YesPct
#[1,] 271 1138 1409 19.23 80.77
#[2,] NA NA 1409 NA NA
#[3,] 1393 16 1409 98.86 1.14
#[4,] 1338 71 1409 94.96 5.04
#[5,] 1254 155 1409 89.00 11.00
#[6,] 1355 54 1409 96.17 3.83
#[7,] 1332 77 1409 94.54 5.46
#[8,] 1380 29 1409 97.94 2.06
#[9,] 1360 49 1409 96.52 3.48
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.