
R: Create matrix of values from other table

I have the following data frame, table5, made up of x and its frequency, produced from other data using counts:

  x freq
1 1    3
2 3   21
3 4   21
4 5 1345
5 7    1

I would like to transfer this, in a general fashion (i.e. so that it also works with other values in the original data frame), into the following data frame, table5if:

      Frequency
3             21
4             21
5             1345
other         4

That is, the frequencies of the values 3, 4 and 5 are copied directly, and the frequencies of all other values are summed into other. My latest attempt is this:

k <- seq(1, nrow(table5), by = 1)
ifelse(table5$x[k] == 3, table5if[1] <- table5$freq[k],
       ifelse(table5$x[k] == 4, table5if[2] <- table5$freq[k],
              ifelse(table5$x[k] == 5, table5if[3] <- table5$freq[k],
                     table5if[4] <- (table5if[4] + table5$freq[k]))))

This attempt, and other attempts using if(...){...} else {...} etc., have all yielded some form of warning or error (e.g. "number of items to replace..." and "number of dimensions...") and haven't produced any convincing results. I've looked through countless other questions about both errors/warnings and can't quite find what I'm looking for; there's a lot about vectorisation, but I can't quite get my head around why that would be the issue. Can anyone please suggest a suitable option for this small task?
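On the vectorisation point: ifelse() operates on whole vectors at once and returns a vector, so it is meant to compute values rather than to run an assignment for each element. The following is a minimal sketch, not taken from the thread, that keeps the nested-ifelse idea but uses it only to build one label per row and then sums freq per label with tapply(). It assumes table5 holds exactly the values printed above; the name label is hypothetical.

```r
# Rebuild table5 from the values printed in the question (assumed)
table5 <- data.frame(x = c(1, 3, 4, 5, 7), freq = c(3, 21, 21, 1345, 1))

# Vectorised: one label for every row, computed in a single call
label <- ifelse(table5$x == 3, "3",
         ifelse(table5$x == 4, "4",
         ifelse(table5$x == 5, "5", "other")))

# Sum freq within each label; the factor levels fix the output order
table5if <- tapply(table5$freq,
                   factor(label, levels = c("3", "4", "5", "other")),
                   sum)
print(table5if)
```

The result is a named vector (a 1-d array) rather than a data frame; wrap it in data.frame(Frequency = table5if) if a data frame is needed.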

I would aggregate by factor(x, levels = 3:5); any x values that are not among those levels become NA. You can then change NA to "other" afterwards if you like. data.table is convenient in this case because it keeps the NAs as a separate group instead of omitting them:

library(data.table)
setDT(table5)[, .(Frequency = sum(freq)), by = factor(x, levels = 3:5)]
#    factor Frequency
# 1:     NA         4
# 2:      3        21
# 3:      4        21
# 4:      5      1345
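The relabelling step mentioned above ("change this afterwards to other") could look like the sketch below. It assumes table5 holds the values printed in the question, and it names the grouping column x explicitly via .(x = ...) instead of relying on the default column name; both are choices made for this illustration, not part of the original answer.

```r
library(data.table)

# Rebuild table5 from the values printed in the question (assumed)
table5 <- data.frame(x = c(1, 3, 4, 5, 7), freq = c(3, 21, 21, 1345, 1))

# Aggregate as in the answer: x values outside 3:5 fall into the NA group
res <- setDT(table5)[, .(Frequency = sum(freq)),
                     by = .(x = factor(x, levels = 3:5))]

# Turn the factor into character, then relabel the NA group as "other"
res[, x := as.character(x)]
res[is.na(x), x := "other"]

# Move "other" to the bottom (FALSE sorts before TRUE; ordering is stable)
res <- res[order(x == "other")]
print(res)
```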

A base R option would be to create a logical index, i1, from the values of the 'x' column with %in%. We then take the sum of 'freq' over the rows not selected by the index (!i1) and rbind it as an extra row onto the subset of 'table5' selected by i1.

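i1 <- table5$x %in% 3:5   # TRUE for the rows to keep as they are
# Append one "Other" row holding the summed freq of all remaining rows,
# then reset the row names to 1..n
`row.names<-`(rbind(table5[i1, ], list(x = "Other",
          freq = sum(table5[!i1, "freq"]))), NULL)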
#      x freq
#1     3   21
#2     4   21
#3     5 1345
#4 Other    4
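If you want exactly the layout from the question (row names 3, 4, 5 and other, with a single Frequency column), the same logical-index idea can build table5if directly. This is a sketch assuming table5 holds the values printed in the question and that x contains no duplicates (as in the counts output); keep is a hypothetical name for the values to report individually.

```r
# Rebuild table5 from the values printed in the question (assumed)
table5 <- data.frame(x = c(1, 3, 4, 5, 7), freq = c(3, 21, 21, 1345, 1))

keep <- c(3, 4, 5)            # values reported individually (hypothetical name)
i1 <- table5$x %in% keep      # logical index of the rows to keep

# Kept frequencies followed by the summed remainder; x values become row names
table5if <- data.frame(
  Frequency = c(table5$freq[i1], sum(table5$freq[!i1])),
  row.names = c(as.character(table5$x[i1]), "other")
)
print(table5if)
```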

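Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.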

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM