R: Create matrix of values from other table
I have the following data frame, table5, made up of x and its frequency, produced from other data using counts:
  x freq
1 1    3
2 3   21
3 4   21
4 5 1345
5 7    1
which I would like to transfer, in a general fashion (i.e. for use with other values in the original data frame), into the following data frame table5if:
      Frequency
3            21
4            21
5          1345
other         4
i.e. where the frequency of the numbers 3, 4 and 5 is transferred directly, and all other numbers are added together in other. My latest attempt is this:
k <- seq(1, nrow(table5), by=1)
ifelse(table5$x[k] == 3, table5if[1] <- table5$freq[k],
ifelse(table5$x[k] == 4, table5if[2] <- table5$freq[k],
ifelse(table5$x[k] == 5, table5if[3] <- table5$freq[k], table5if[4] <- (table5if[4] + table5$freq[k])
)
)
)
This attempt, and other attempts using if(...){...} else {...} etc., have all yielded some form of warning or error (e.g. "number of items to replace..." and "number of dimensions...") and haven't produced any convincing results. I've looked through countless other questions for both errors/warnings and can't quite find what I'm looking for. There's a lot about vectorisation, but I can't quite get my head around why that would be the issue. Can anyone please suggest a suitable option for this small task?
I would aggregate by factor(x, levels = 3:5), so that all values not among the levels become NA. You can then change this afterwards to "other" if you like. data.table is convenient in this case as it keeps the NAs as a separate group instead of omitting them.
library(data.table)
# setDT() converts table5 to a data.table by reference
setDT(table5)[, .(Frequency = sum(freq)), by = factor(x, levels = 3:5)]
#    factor Frequency
# 1:     NA         4
# 2:      3        21
# 3:      4        21
# 4:      5      1345
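The relabelling of NA to "other" is only described in words above, so here is a minimal sketch of the same idea in base R rather than data.table, so that it runs without extra packages. The function name collapse_other and its arguments are hypothetical, and the data frame is rebuilt from the values shown in the question.

```r
# Data from the question, rebuilt by hand
table5 <- data.frame(x = c(1, 3, 4, 5, 7), freq = c(3, 21, 21, 1345, 1))

# Hypothetical helper: sum freq per kept value, pooling everything else
collapse_other <- function(tbl, keep, label = "other") {
  # Values of x that are not in `keep` become NA when turned into a factor
  grp <- as.character(factor(tbl$x, levels = keep))
  # Relabel the NA group, then fix the display order of the groups
  grp[is.na(grp)] <- label
  grp <- factor(grp, levels = c(keep, label))
  # One sum per group; the group labels become the row names
  sums <- tapply(tbl$freq, grp, sum)
  data.frame(Frequency = as.vector(sums), row.names = names(sums))
}

table5if <- collapse_other(table5, keep = 3:5)
print(table5if)
```

With this sketch, a value listed in keep that never occurs in x would show up with an NA frequency rather than being dropped, which may or may not be what you want.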
A base R option would be to create a logical index based on the values of the 'x' column with %in%. We get the sum of 'freq' for the rows selected by the negated index 'i1' and rbind it with the subset rows of 'table5'.
i1 <- table5$x %in% 3:5
`row.names<-`(rbind(table5[i1,], list(x= "Other",
freq=sum(table5[!i1,"freq"]))), NULL)
#      x freq
#1     3   21
#2     4   21
#3     5 1345
#4 Other    4
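Since the question asks for something usable "in a general fashion", the %in% approach above could be wrapped in a small function. This is only a sketch: the name collapse_rest and its parameters are made up for illustration, and it assumes the input has columns called x and freq as in table5.

```r
# Data from the question, rebuilt by hand
table5 <- data.frame(x = c(1, 3, 4, 5, 7), freq = c(3, 21, 21, 1345, 1))

# Hypothetical wrapper around the %in% / rbind approach
collapse_rest <- function(tbl, keep, label = "Other") {
  # TRUE for rows whose x should be kept as-is
  i1 <- tbl$x %in% keep
  # Kept rows, followed by one row holding the sum of all remaining freq
  out <- rbind(tbl[i1, ], list(x = label, freq = sum(tbl[!i1, "freq"])))
  # Reset the row names so they run 1, 2, 3, ...
  row.names(out) <- NULL
  out
}

collapse_rest(table5, keep = 3:5)
collapse_rest(table5, keep = c(1, 7))  # same function, different kept values
```

Note that the x column ends up as character, because the label row is appended to the numeric values.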