Copy a subset of a column, based on conditions, to another dataframe in R

I have very limited R skills, and after hours of searching I could not find a solution that works. I have several large data tables. From each one, I would like to copy part of a column into a data frame, to populate a column there. My data tables (tabn1, tabn2, tabn3) all have the same format, but different lengths. Each subset will have a different number of rows, and I want the empty spaces to be filled with NA. I can't even copy the first column, so the subsequent ones are the next problem!

Ro  Co  Red    Green  Yellow
1   3   123    999    265
1   3   223    875    5877
1   4   21488  555    478
1   4   558    23698  5558
2   3   558    559    148
2   3   4579   557    59
2   4   1489   545    2369
2   4   123    999    265
3   3   558    559    148
3   3   558    23698  5558
3   4   4579   557    59
3   4   1478   4579   557
4   3   1488   555    478
4   3   1478   2945   5889
4   4   448    259    4548
4   4   26576  158    15

The column names for my new data frame:

cls <- c("n1","n2","n3")

I created a data frame with those column names:

df <- setNames(data.frame(matrix(ncol=3)),cls)

For each of my tables, I want to subset Ro >= 3, Co = 3, keeping only the column "Red". I have tried:

sub1 <- filter(tabn1, tabn1$Ro >= 3 | tabn1$Co == 3)
df$n1 <- sub1$Red

> Error in `$<-.data.frame`(`*tmp*`, n1, value = c(183.94, 180.884,  : 
  replacement has 32292 rows, data has 1

Also:

df$n1 <- cut(sub1$Red)

> Error in cut.default(sub1$Red) : 
  argument "breaks" is missing, with no default

I tried using df as a data.table instead of a data frame, but also got the following error:

df <- setNames(data.table(matrix(ncol=3)),cls)
df$n1 <- sub1$Red
> Error in set(x, j = name, value = value) : 
  Supplied 32292 items to be assigned to 1 items of column 'nn1'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

I would subsequently try to subset and copy from tabn2 to df$n2, and so forth. As indicated above, the original tables have different lengths. Thanks in advance!

The issue is that the number of rows in 'df' and 'sub1' differ: 'df' is created with 1 row, while 'sub1' has many. Instead, we can create 'df' directly from 'sub1' itself

df <- sub1['Red']
names(df) <- cls[1]
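A minimal sketch of the mismatch and the fix, assuming a toy stand-in for tabn1 built from a few rows of the sample table (the real tables are much larger) and base subset() in place of filter():

```r
# Toy stand-in for tabn1: eight rows taken from the sample table in the question
tabn1 <- data.frame(
  Ro  = c(1, 1, 2, 2, 3, 3, 4, 4),
  Co  = c(3, 4, 3, 4, 3, 4, 3, 4),
  Red = c(123, 21488, 558, 1489, 558, 4579, 1488, 448)
)
cls <- c("n1", "n2", "n3")

# Same `|` condition as in the question, written with base subset()
sub1 <- subset(tabn1, Ro >= 3 | Co == 3)

# A one-row data frame cannot accept a longer column, so this assignment fails
df_one_row <- setNames(data.frame(matrix(ncol = 3)), cls)
res <- try(df_one_row$n1 <- sub1$Red, silent = TRUE)
inherits(res, "try-error")

# Building df from the subset itself avoids the row-count mismatch
df <- sub1["Red"]
names(df) <- cls[1]
str(df)
```

Note that sub1["Red"] (single bracket, one name) keeps a one-column data frame, whereas sub1$Red would return a plain vector.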

Another way to create the data.frame would be to specify nrow as well

df <- as.data.frame(matrix(nrow = nrow(sub1), ncol = length(cls),
       dimnames = list(NULL, cls)))
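As a rough illustration of the preallocation approach, assuming a hypothetical vector red_values standing in for sub1$Red: dimnames is an argument of matrix(), so it has to sit inside the matrix() call for the column names to be applied.

```r
cls <- c("n1", "n2", "n3")
red_values <- c(123, 558, 558, 4579, 1488, 448)  # hypothetical stand-in for sub1$Red

# Preallocate an all-NA data frame with as many rows as the subset
df <- as.data.frame(matrix(nrow = length(red_values), ncol = length(cls),
                           dimnames = list(NULL, cls)))

# The row counts now match, so the assignment succeeds
df$n1 <- red_values
df
```

This only works directly when every later column has the same number of rows; shorter columns still need NA padding first.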

Regarding the second error, cut needs a breaks argument. Either we specify the number of breaks

cut(sub1$Red, breaks = 3)

Or a vector of break points

cut(sub1$Red, breaks = c(-Inf, 100, 500, 1000, Inf))
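For context, cut() bins numeric values into intervals and returns a factor; it does not shorten or trim a vector, so it is probably not the tool for copying a subset. A small sketch with made-up values and hypothetical labels:

```r
red <- c(123, 558, 4579, 21488)  # made-up values for illustration

# Bin each value into one of four intervals; the labels are hypothetical
binned <- cut(red,
              breaks = c(-Inf, 100, 500, 1000, Inf),
              labels = c("low", "mid", "high", "very high"))

# The result is a factor with the same length as the input
length(binned) == length(red)
table(binned)
```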

If there are many 'tabn' objects, get them into a list and loop over the list with lapply

lst1 <- mget(ls(pattern = '^tabn\\d+$'))
out_lst <- lapply(lst1, function(x) subset(x, Ro >=3 | Co == 3)$Red)
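A sketch of the list approach on two small made-up tables. The question describes the subset as Ro >= 3 and Co = 3 while the code uses `|` (either condition); if both conditions are meant to hold at once, `&` is likely what is wanted, so both variants are shown. The toy tables and their values are assumptions for illustration.

```r
# Two made-up tables with the same columns but different lengths
tabn1 <- data.frame(Ro  = c(1, 2, 3, 3, 4),
                    Co  = c(3, 3, 3, 4, 3),
                    Red = c(123, 558, 558, 4579, 1488))
tabn2 <- data.frame(Ro  = c(3, 4, 4),
                    Co  = c(3, 3, 4),
                    Red = c(10, 20, 30))

# Collect every object named tabn<digits> into a named list
lst1 <- mget(ls(pattern = "^tabn\\d+$"))

# `|` keeps rows where either condition holds (as in the code above)
out_or <- lapply(lst1, function(x) subset(x, Ro >= 3 | Co == 3)$Red)

# `&` keeps only rows where both conditions hold
out_and <- lapply(lst1, function(x) subset(x, Ro >= 3 & Co == 3)$Red)

lengths(out_or)
lengths(out_and)
```

The list elements are named after the original objects (tabn1, tabn2), which is convenient for renaming the columns later.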

After subsetting and selecting the 'Red' column, the number of elements may differ between tables. If the lengths are different, an option is to pad the shorter vectors with NA at the end before cbinding them

mx <- max(lengths(out_lst))
df <- do.call(cbind, lapply(out_lst, `length<-`, mx))
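To see the padding step in isolation, here is a small sketch with a made-up list of vectors of unequal length. Assigning a longer length with `length<-` appends NA, and cbind then returns a matrix, so the sketch wraps the result in as.data.frame(), which is an addition to the code above.

```r
# Made-up vectors of different lengths, named after the target columns
out_lst <- list(n1 = c(558, 558, 1488, 1478),
                n2 = c(10, 20),
                n3 = c(7, 8, 9))

# Length of the longest vector
mx <- max(lengths(out_lst))

# Extend every vector to that length; the new trailing positions become NA
padded <- lapply(out_lst, `length<-`, mx)

# Bind the padded vectors as columns and convert the matrix to a data frame
df <- as.data.frame(do.call(cbind, padded))
df
```

If the list carries the original object names instead (tabn1, tabn2, ...), the columns can be renamed afterwards with names(df) <- cls.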
