Find minimum value greater than 0
I have a data frame that contains the numerical values 1:4 with some NAs. For each row, I would like to calculate the frequency (as a percentage) of the value with the fewest occurrences, considering only values that occur more than 0 times.
Here is a sample data frame to work with.
df = as.data.frame(rbind(c(1,2,1,2,2,2,2,1,NA,2),c(2,3,3,2,3,3,NA,2,NA,NA),c(4,1,NA,NA,NA,1,1,1,4,4),c(3,3,3,4,4,4,NA,4,3,4)))
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  1  2  1  2  2  2  2  1 NA   2
2  2  3  3  2  3  3 NA  2 NA  NA
3  4  1 NA NA NA  1  1  1  4   4
4  3  3  3  4  4  4 NA  4  3   4
I am struggling with two points: 1) finding the lowest frequency of a value greater than 0, and 2) applying the function to each row of my data frame. When I started working on this function I implemented it using the code below, but it did not appear to be applied to every row. My results for value.1, value.2, etc. were the same for every row.
Low_Freq = function(x){
value.1 = sum(x==1, na.rm=TRUE) #count the number of 1's per row
value.2 = sum(x==2, na.rm=TRUE) #count the number of 2's per row
value.3 = sum(x==3, na.rm=TRUE) #count the number of 3's per row
value.4 = sum(x==4, na.rm=TRUE) #count the number of 4's per row
num.values = rowSums(!is.na(x), na.rm=TRUE) #count total number of non-NA values in each row
#what is the minimum frequency value greater than 0 among value.1, value.2, value.3, and value.4 for EACH row?
min.value.freq = min(cbind(value.1,value.2,value.3,value.4))
out = min.value.freq/num.values #calculate the percentage of the minimum value for each row
}
df$Low_Freq = apply(df, 1, function(x))
Then I started using rowSums() to compute value.1, value.2, value.3, and value.4. This fixed my problem of counting value.1, value.2, etc. for each row. However, I then had to call the function directly, without apply(), for it to run:
Low_Freq = function(x){
value.1 = rowSums(x==1, na.rm=TRUE) #count the number of 1's per row
value.2 = rowSums(x==2, na.rm=TRUE) #count the number of 2's per row
value.3 = rowSums(x==3, na.rm=TRUE) #count the number of 3's per row
value.4 = rowSums(x==4, na.rm=TRUE) #count the number of 4's per row
num.values = rowSums(!is.na(x), na.rm=TRUE) #count total number of non-NA values in each row
#what is the minimum frequency value greater than 0 among value.1, value.2, value.3, and value.4 for EACH row?
min.value.freq = min(cbind(value.1,value.2,value.3,value.4))
out = min.value.freq/num.values #calculate the percentage of the minimum value for each row
}
df$Low_Freq = Low_Freq(df)
So the act of applying to each row then seemed to occur within the function itself. That's all fine and dandy, but when I go to make the final calculation that will be my output, I cannot figure out how to identify which of the values 1, 2, 3, or 4 has the lowest frequency for each row. This value must be divided by the number of non-NA values for each row.
My desired result should look like this:
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10  Low_Freq
1  1  2  1  2  2  2  2  1 NA   2 0.3333333
2  2  3  3  2  3  3 NA  2 NA  NA 0.4285714
3  4  1 NA NA NA  1  1  1  4   4 0.4285714
4  3  3  3  4  4  4 NA  4  3   4 0.4444444
I feel like I am going in circles with this seemingly simple function. Any help would be appreciated.
Thank you.
The table function will return the frequency of each value that appears, ignoring NA values. Therefore, the min of the table result is the minimum frequency of a value that shows up in your row, and the sum is the number of non-NA values in your row.
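To see why this works, it may help to look at a single row on its own. The sketch below uses row 3 of the sample data as a plain vector (the names x and tab are just for illustration). Values that never occur in the row do not appear in the table at all, so the "greater than 0" condition takes care of itself.

```r
# Row 3 of the sample data, as a plain numeric vector
x <- c(4, 1, NA, NA, NA, 1, 1, 1, 4, 4)

tab <- table(x)      # counts per distinct value; NA is dropped by default
tab
# x
# 1 4 
# 4 3 

min(tab)             # 3: the rarest value that actually occurs (4 appears 3 times)
sum(tab)             # 7: the number of non-NA entries
min(tab) / sum(tab)  # 3/7, i.e. 0.4285714
```

Since 2 and 3 are absent from this row, they never get a count of 0 that could win the minimum.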
Low_Freq = function(x){
tab = table(x)
return(min(tab) / sum(tab))
}
df$Low_Freq = apply(df, 1, Low_Freq)
df
#   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10  Low_Freq
# 1  1  2  1  2  2  2  2  1 NA   2 0.3333333
# 2  2  3  3  2  3  3 NA  2 NA  NA 0.4285714
# 3  4  1 NA NA NA  1  1  1  4   4 0.4285714
# 4  3  3  3  4  4  4 NA  4  3   4 0.4444444
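For comparison, the rowSums() approach from the question can probably be completed as well. In that version, min(cbind(...)) collapses the whole matrix of counts into one number, which is why it cannot give a per-row answer. The following is only a sketch of one way around that: the function name Low_Freq_vec and its values argument are made up for illustration, and it assumes every row has at least one non-NA value (an all-NA row would give Inf with a warning).

```r
df <- as.data.frame(rbind(c(1,2,1,2,2,2,2,1,NA,2),
                          c(2,3,3,2,3,3,NA,2,NA,NA),
                          c(4,1,NA,NA,NA,1,1,1,4,4),
                          c(3,3,3,4,4,4,NA,4,3,4)))

# Hypothetical helper: d is the data frame, values are the candidate values
Low_Freq_vec <- function(d, values = 1:4) {
  # One column of per-row counts for each candidate value
  counts <- do.call(cbind, lapply(values, function(v) rowSums(d == v, na.rm = TRUE)))
  # A value that never occurs in a row must not win the minimum
  counts[counts == 0] <- NA
  # Take the minimum within each row, not over the whole matrix
  min.value.freq <- apply(counts, 1, min, na.rm = TRUE)
  # Divide by the number of non-NA entries in each row
  min.value.freq / rowSums(!is.na(d))
}

df$Low_Freq <- Low_Freq_vec(df)
```

On the sample data this should give the same Low_Freq column as the table() version. The table() version is shorter and does not need the set of possible values to be known in advance.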
If you wanted to exclude 5s from the numerator but still count them in the denominator, you could do:
df = as.data.frame(rbind(c(1,2,1,2,2,2,2,1,NA,2),c(2,3,3,2,3,3,NA,2,NA,NA),c(4,1,NA,NA,NA,1,1,1,4,4),c(3,3,3,4,4,4,5,4,3,4)))
Low_Freq = function(x){
tab = table(x[x != 5])
return(min(tab) / sum(!is.na(x)))
}
df$Low_Freq = apply(df, 1, Low_Freq)
df
#   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10  Low_Freq
# 1  1  2  1  2  2  2  2  1 NA   2 0.3333333
# 2  2  3  3  2  3  3 NA  2 NA  NA 0.4285714
# 3  4  1 NA NA NA  1  1  1  4   4 0.4285714
# 4  3  3  3  4  4  4  5  4  3   4 0.4000000