Copy a subset of a column, based on conditions, to another dataframe in R
I have very limited R skills, and after hours of searching I could not find a solution that works. I have several large data tables. From each one, I would like to copy part of a column into a dataframe, to populate a column there. My data tables (tabn1, tabn2, tabn3) all have the same format but different lengths, so each subset will have a different number of rows. I would like the empty cells to be filled with NA. I can't even copy the first column, so the subsequent ones are the next problem!
Ro Co Red Green Yellow
1 3 123 999 265
1 3 223 875 5877
1 4 21488 555 478
1 4 558 23698 5558
2 3 558 559 148
2 3 4579 557 59
2 4 1489 545 2369
2 4 123 999 265
3 3 558 559 148
3 3 558 23698 5558
3 4 4579 557 59
3 4 1478 4579 557
4 3 1488 555 478
4 3 1478 2945 5889
4 4 448 259 4548
4 4 26576 158 15
My new data frame column names:
cls <- c("n1","n2","n3")
I created a dataframe with the column names:
df <- setNames(data.frame(matrix(ncol=3)),cls)
For each of my tables, I want to subset Ro >= 3, Co = 3, and keep only the column "Red". I have tried:
sub1 <- filter(tabn1, tabn1$Ro >= 3 | tabn1$Co == 3)
df$n1 <- sub1$Red
> Error in `$<-.data.frame`(`*tmp*`, n1, value = c(183.94, 180.884, :
replacement has 32292 rows, data has 1
Also:
df$n1 <- cut(sub1$Red)
> Error in cut.default(sub1$Red) :
argument "breaks" is missing, with no default
I tried using df as a data.table instead of a dataframe, but also got the following error:
df <- setNames(data.table(matrix(ncol=3)),cls)
df$n1 <- sub1$Red
> Error in set(x, j = name, value = value) :
Supplied 32292 items to be assigned to 1 items of column 'nn1'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
I would subsequently try to subset and copy from tabn2 to df$n2, and so forth. As indicated above, the original tables have different lengths. Thanks in advance!
The issue is that 'df' and 'sub1' have different numbers of rows: 'df' is created with 1 row. Instead, we can create 'df' directly from 'sub1' itself
df <- sub1['Red']
names(df) <- cls[1]
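To see the row mismatch in isolation, here is a minimal sketch with made-up values. The vector red is a hypothetical stand-in for sub1$Red, and df_ok and msg are names introduced only for this illustration.

```r
cls <- c("n1", "n2", "n3")

# Same construction as in the question: matrix(ncol = 3) defaults to one row
df <- setNames(data.frame(matrix(ncol = 3)), cls)
nrow(df)  # 1

# Hypothetical stand-in for sub1$Red (toy values, not the real data)
red <- c(558, 558, 1488, 1478)

# Assigning 4 values to a column of a 1-row data frame raises an error;
# tryCatch captures the message instead of stopping the script
msg <- tryCatch({
  df$n1 <- red
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Building the data frame from the vector itself avoids the mismatch
df_ok <- setNames(data.frame(red), cls[1])
nrow(df_ok)  # 4
```

The same reasoning applies to the data.table attempt: the target has one row, so a longer vector cannot be assigned to one of its columns.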
Another way to create the data.frame is to specify nrow as well
df <- as.data.frame(matrix(nrow = nrow(sub1), ncol = length(cls),
                           dimnames = list(NULL, cls)))
Regarding the second error with cut, it needs breaks. Either we specify the number of breaks
cut(sub1$Red, breaks = 3)
Or a vector of break points
cut(sub1$Red, breaks = c(-Inf, 100, 500, 1000, Inf))
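Note that cut bins values into intervals and returns a factor; it does not extract a subset, so it is probably not what the question is after. A small sketch with toy values (red and bins are illustrative names, not from the original data):

```r
# Toy values standing in for sub1$Red
red <- c(123, 223, 21488, 558)

# Each value is assigned to one of four intervals
bins <- cut(red, breaks = c(-Inf, 100, 500, 1000, Inf))

is.factor(bins)    # TRUE: the result is a factor, not a numeric subset
as.integer(bins)   # index of the interval each value falls into
table(bins)        # how many values fall into each interval
```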
If there are many 'tabn' objects, get them into a list and loop over the list with lapply
lst1 <- mget(ls(pattern = '^tabn\\d+$'))
out_lst <- lapply(lst1, function(x) subset(x, Ro >=3 | Co == 3)$Red)
After subsetting and selecting the 'Red' column, the number of elements may differ between tables. If the lengths are different, an option is to pad NA at the end of the shorter vectors before cbinding them
mx <- max(lengths(out_lst))
df <- do.call(cbind, lapply(out_lst, `length<-`, mx))
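Putting the steps together, here is a self-contained sketch on two small made-up tables. Two things are assumptions rather than part of the answer above: the condition uses & (both Ro >= 3 and Co == 3 must hold, which is one reading of the question's stated goal, whereas the code above uses |), and the result is wrapped in as.data.frame because cbind on vectors returns a matrix.

```r
# Toy tables standing in for tabn1 and tabn2 (hypothetical values)
tabn1 <- data.frame(Ro  = c(1, 2, 3, 3, 4, 4),
                    Co  = c(3, 3, 3, 4, 3, 3),
                    Red = c(123, 558, 558, 4579, 1488, 1478))
tabn2 <- data.frame(Ro  = c(2, 3, 4),
                    Co  = c(3, 3, 4),
                    Red = c(10, 20, 30))

# Name the list elements after the target columns
lst1 <- list(n1 = tabn1, n2 = tabn2)

# Assumption: both conditions must hold, hence `&` instead of `|`
out_lst <- lapply(lst1, function(x) subset(x, Ro >= 3 & Co == 3)$Red)

# Pad the shorter vectors with NA up to the longest length
mx <- max(lengths(out_lst))
padded <- lapply(out_lst, `length<-`, mx)

# cbind gives a matrix; convert it to get a data frame
df <- as.data.frame(do.call(cbind, padded))
df
```

With these toy tables, n1 holds three values and n2 holds one value followed by two NA entries. If rows matching either condition are wanted instead, swap & for | as in the code above.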