R: frequencies across rows

Consider this data frame:

library(dplyr)
library(tidyr)  # pivot_wider() comes from tidyr

one <- c("no", "no", "no", "no", "yes", "yes", "yes", "yes")
two <- c("apple", "banana", "orange", "carrot", "apple", "banana", "orange", "carrot")
three <- c(4, 5, 6, 7, 3, 4, 5, 6)

df <- data.frame(one, two, three)
df

  one    two three
1  no  apple     4
2  no banana     5
3  no orange     6
4  no carrot     7
5 yes  apple     3
6 yes banana     4
7 yes orange     5
8 yes carrot     6

Then I pivot wider:

df2 <- df %>%
  pivot_wider(names_from = one, values_from = three) 

  two    no    yes
  <chr>  <chr> <chr>
1 apple  4     3
2 banana 5     4
3 orange 6     5
4 carrot 7     6

Now I want the relative frequencies across rows, but I cannot figure out how to get there. These are the desired columns:

desired_column_no <- c(4/7,5/9,6/11,7/13)
desired_column_yes <- c(3/7,4/9,5/11,6/13)

df2 %>%
  cbind(desired_column_no,
        desired_column_yes)

    two no yes desired_column_no desired_column_yes
1  apple  4   3         0.5714286          0.4285714
2 banana  5   4         0.5555556          0.4444444
3 orange  6   5         0.5454545          0.4545455
4 carrot  7   6         0.5384615          0.4615385

I've been playing around with group_by(), summarize() and across(), but haven't gotten it to work correctly. Any help is greatly appreciated!

With proportions(), before pivot_wider():

library(dplyr)
library(tidyr)
df %>% 
  group_by(two) %>% 
  mutate(prop = proportions(three)) %>% 
  pivot_wider(names_from = one, values_from = c(three, prop)) 
  two    three_no three_yes prop_no prop_yes
  <chr>     <dbl>     <dbl>   <dbl>    <dbl>
1 apple         4         3   0.571    0.429
2 banana        5         4   0.556    0.444
3 orange        6         5   0.545    0.455
4 carrot        7         6   0.538    0.462
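
For readers unfamiliar with proportions(): inside each group_by(two) group it divides every value of three by the group's sum. Below is a minimal base-R sketch of that arithmetic for the apple group only. The names apple, by_function and by_hand are made up for illustration, and it assumes R 4.0.1 or later (prop.table() is the older name for the same function).

```r
# Counts for one fruit ("apple" in the question's data), named by the `one` column.
apple <- c(no = 4, yes = 3)

# proportions() divides each element by the sum of the whole vector.
by_function <- proportions(apple)

# The same calculation written out by hand.
by_hand <- apple / sum(apple)

print(by_function)

# Both approaches should agree.
stopifnot(isTRUE(all.equal(by_function, by_hand)))
```

Because the grouping is by two, each group holds exactly one "no" and one "yes" value, so the proportions are taken across what becomes a single row after pivot_wider().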

  1. Don't use data.frame(cbind(.)): cbind() builds a character matrix as soon as one of its inputs is a character vector, so you're corrupting your data by converting numbers to strings. In this case it's reversible (in general it is "mostly" reversible, but not always), and it's also perfectly avoidable. Just use data.frame(.).

  2. We can use across() on your wider format.

df <- data.frame(one,two,three) %>%
  pivot_wider(names_from = one, values_from = three) 
df %>%
  mutate(
    across(c(no, yes), ~ . / (no + yes),
           .names = "desired_column_{.col}")
  )
# # A tibble: 4 x 5
#   two       no   yes desired_column_no desired_column_yes
#   <chr>  <dbl> <dbl>             <dbl>              <dbl>
# 1 apple      4     3             0.571              0.429
# 2 banana     5     4             0.556              0.444
# 3 orange     6     5             0.545              0.455
# 4 carrot     7     6             0.538              0.462
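
The denominator (no + yes) above is written out by hand, which is fine for two columns. If there were more levels in one, a possible variation is to compute the row total with rowSums() first. This is only a sketch under that assumption: wide, result and the helper column row_total are hypothetical names, and every column except two is assumed to be numeric.

```r
library(dplyr)
library(tidyr)

one <- c("no", "no", "no", "no", "yes", "yes", "yes", "yes")
two <- c("apple", "banana", "orange", "carrot", "apple", "banana", "orange", "carrot")
three <- c(4, 5, 6, 7, 3, 4, 5, 6)

wide <- data.frame(one, two, three) %>%
  pivot_wider(names_from = one, values_from = three)

result <- wide %>%
  # row_total (hypothetical helper): sum of every column except the label column.
  mutate(row_total = rowSums(across(-two))) %>%
  # Divide each count column by its row total and store it under a new name.
  mutate(across(-c(two, row_total), ~ .x / row_total,
                .names = "desired_column_{.col}")) %>%
  # Drop the helper column again.
  select(-row_total)

result
```

For the two-column data in the question this should give the same desired_column_no and desired_column_yes values as the version with (no + yes).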
