
Need an R function to replicate X data by Y counts, where X contains some repeated values

I have a fairly large data set (18,000 rows) with 2 columns of interest. I would like to treat one (X) as the quantitative values and the other (Y) as counts, and repeat the X data based on the counts. Due to the nature of the data, there are repeated values in the X column, and I just want to create a new data set containing all X values and their repeated measurements. I have tried the following, but it returns an invalid 'times' argument error: rep(df$X, df$Y)

I am not sure why this error is occurring, and I don't know where to go from here. Any help is appreciated. Below is a small sample of my data.

8.76    3
24.69   0
6.24    2
1.17    0
6.54    3
10.29   0
11.04   1
16.71   1

I can reproduce that error when one or more Y is NA (or negative):

df
#      V1 V2
# 1  8.76  3
# 2 24.69 NA
# 3  6.24  2
# 4  1.17  0
# 5  6.54  3
# 6 10.29  0
# 7 11.04  1
# 8 16.71  1
rep(df$V1, df$V2)
# Error in rep(df$V1, df$V2) : invalid 'times' argument
df$V2[2] <-  -1
rep(df$V1, df$V2)
# Error in rep(df$V1, df$V2) : invalid 'times' argument

We can replace the NA and negative counts with 0:

rep(df$V1, pmax(0, df$V2, na.rm = TRUE))
#  [1]  8.76  8.76  8.76  6.24  6.24  6.54  6.54  6.54 11.04 16.71

Data

df <- structure(list(V1 = c(8.76, 24.69, 6.24, 1.17, 6.54, 10.29, 11.04, 16.71), V2 = c(3L, NA, 2L, 0L, 3L, 0L, 1L, 1L)), row.names = c(NA, -8L), class = "data.frame")
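Before patching the counts, it can help to see which rows are causing the error. The following is a minimal sketch using the same sample data as above; the names `bad`, `counts`, `expanded`, and `expanded_drop` are only illustrative.

```r
# Sample data from above: V1 holds the values, V2 the counts (one NA).
df <- data.frame(
  V1 = c(8.76, 24.69, 6.24, 1.17, 6.54, 10.29, 11.04, 16.71),
  V2 = c(3L, NA, 2L, 0L, 3L, 0L, 1L, 1L)
)

# Rows whose count would make rep() fail: missing or negative.
bad <- is.na(df$V2) | df$V2 < 0
df[bad, ]

# Option 1: treat invalid counts as zero (same result as the pmax() call).
counts <- ifelse(bad, 0L, df$V2)
expanded <- rep(df$V1, counts)

# Option 2: drop the invalid rows first, then repeat.
expanded_drop <- with(df[!bad, ], rep(V1, V2))
```

Both options give the same vector here. Whether an invalid count should really mean "zero copies" depends on the data, so inspecting `df[bad, ]` first is probably worthwhile on the full 18,000 rows.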

Maybe you are looking for uncount?

library(tidyr)
library(dplyr)

# assumes df has a value column named `value` and a count column named `count`
df %>% 
  uncount(count)

This returns

# A tibble: 10 x 1
   value
   <dbl>
 1  8.76
 2  8.76
 3  8.76
 4  6.24
 5  6.24
 6  6.54
 7  6.54
 8  6.54
 9 11.0 
10 16.7 
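The input for the `uncount` call is not shown, so here is a self-contained sketch. The column names `value` and `count` are assumed from the output above rather than taken from the question, and the `.remove` and `.id` arguments are included only to show how the counts could be kept.

```r
library(tidyr)

# Hypothetical input: column names `value` and `count` are assumed
# from the output shown above, not taken from the question.
df <- data.frame(
  value = c(8.76, 24.69, 6.24, 1.17, 6.54, 10.29, 11.04, 16.71),
  count = c(3L, 0L, 2L, 0L, 3L, 0L, 1L, 1L)
)

# Repeat each row `count` times; the weights column is dropped by default.
expanded <- uncount(df, count)

# Keep the counts and add an identifier column for the created rows.
expanded_id <- uncount(df, count, .remove = FALSE, .id = "rep")
```

With a plain data frame as input the result is a data frame rather than a tibble, but the rows should match the output shown above.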

A base R alternative:

transform(df[rep(seq_len(nrow(df)), df$y),], y = sequence(df$y))

Output:

        x y
1    8.76 1
1.1  8.76 2
1.2  8.76 3
3    6.24 1
3.1  6.24 2
5    6.54 1
5.1  6.54 2
5.2  6.54 3
7   11.04 1
8   16.71 1

Data:

df <- structure(list(x = c(8.76, 24.69, 6.24, 1.17, 6.54, 10.29, 11.04, 
16.71), y = c(3L, 0L, 2L, 0L, 3L, 0L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-8L))
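On a large data set it may be worth checking that the expansion did what was intended. A small sketch of such a check, using the same `x`/`y` sample data (the names `idx`, `expanded`, and `copies` are illustrative):

```r
df <- data.frame(
  x = c(8.76, 24.69, 6.24, 1.17, 6.54, 10.29, 11.04, 16.71),
  y = c(3L, 0L, 2L, 0L, 3L, 0L, 1L, 1L)
)

# Row indices, each repeated according to its count.
idx <- rep(seq_len(nrow(df)), df$y)
expanded <- df[idx, "x", drop = FALSE]

# The expanded data should have exactly sum(y) rows ...
stopifnot(nrow(expanded) == sum(df$y))

# ... and each original row should appear exactly y times.
copies <- tabulate(idx, nbins = nrow(df))
stopifnot(all(copies == df$y))
```

Checking by row index rather than by value matters here because X contains repeated values, so counting occurrences of each value would mix together different original rows.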
