How to create a list of data.frames with matched rows and columns in R
Suppose I have two data frames, df1 and df2:
set.seed(123)
df1 <- data.frame(id = sample(letters[1:10], 10, replace = FALSE),
                  x = rnorm(10), y = rnorm(10), z = rnorm(10), u = rnorm(10))
df1
id x y z u
1 f -1.02642090 0.91899661 -1.02412879 -0.07130809
2 i -0.71040656 -0.57534696 0.11764660 1.44455086
3 g 0.25688371 0.60796432 -0.94747461 0.45150405
4 b -0.24669188 -1.61788271 -0.49055744 0.04123292
5 e -0.34754260 -0.05556197 -0.25609219 -0.42249683
6 c -0.95161857 0.51940720 1.84386201 -2.05324722
7 d -0.04502772 0.30115336 -0.65194990 1.13133721
8 j -0.78490447 0.10567619 0.23538657 -1.46064007
9 a -1.66794194 -0.64070601 0.07796085 0.73994751
10 h -0.38022652 -0.84970435 -0.96185663 1.90910357
df2 <- data.frame(id = sample(letters[2:11], 10, replace = FALSE),
                  x = rnorm(10), y = rnorm(10), z = rnorm(10), v = rnorm(10))
df2
id x y z v
1 j -1.27745077 -0.08868545 -0.56426954 1.84483867
2 e 1.17719205 -1.59548490 0.97031123 -0.98191715
3 c 0.90250583 0.85170932 -0.01863398 2.19600376
4 h -1.26130418 -0.71356081 0.36237035 -0.20466767
5 b 0.83745515 1.06643034 2.01130559 0.97514294
6 i -2.34829031 -0.53624259 -1.17796750 -0.86756612
7 k 0.61097114 0.53591706 -0.75517048 -0.50118759
8 g -0.04786774 -1.82862663 -0.33128448 0.78559116
9 f -2.39919771 -1.81353336 -0.28370270 -2.10224732
10 d -0.01931896 1.37261371 0.31415290 -0.04220493
I would like to create a list (or, preferably, a single object) that keeps only the rows whose id is common to df1, df2, ... and only the columns whose names they share, such as:
df_lst
df1
id x y z
1 b -0.4456620 -0.4727914 1.2538149
2 c -1.2650612 -1.9666172 0.1533731
3 d 0.4978505 0.8377870 0.5539177
4 e 1.7869131 -1.6866933 0.6886403
5 f 0.3598138 -0.2179749 -0.2950715
6 g -0.5558411 -0.6250393 0.8215811
7 h 1.2240818 -1.0678237 0.4264642
8 i 0.4007715 -1.0260044 0.8951257
9 j -0.6868529 0.7013559 -1.1381369
df2
id x y z
1 b -1.0700682 0.4120223 -0.279333528
2 c -0.2416898 -0.1524106 -0.778997240
3 d 1.6232025 0.6343621 -0.685706846
4 e 1.2283928 2.1499193 -0.735026156
5 f 0.2760235 -1.3343536 -1.427685784
6 g -1.0489755 0.4958705 0.619283535
7 h -0.5208693 1.2339762 -0.006198262
8 i -0.7729782 -0.9007918 -0.319393809
9 j -0.4682005 -0.2288958 -0.374800093
We can use intersect to get the common column names (names) and the common 'id' values from the datasets. Then subset the rows with %in% and use the select argument to keep only the intersecting columns:
nm1 <- intersect(names(df1), names(df2))
nm2 <- intersect(df1$id, df2$id)
df1new <- subset(df1, id %in% nm2, select =nm1)
df1new <- df1new[order(df1new$id),]
df2new <- subset(df2, id %in% nm2, select = nm1)
df2new <- df2new[order(df2new$id),]
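The same two-step idea (intersect the column names, intersect the ids, then filter) can also be written with plain `[` indexing instead of subset. The sketch below is only an illustration on small made-up data frames; the names a, b, common_cols and common_ids are hypothetical and not part of the original answer.

```r
# Toy data (hypothetical; stands in for df1 and df2 above)
a <- data.frame(id = c("c", "a", "b"), x = 1:3, y = 4:6, u = 7:9,
                stringsAsFactors = FALSE)
b <- data.frame(id = c("b", "d", "c"), x = 11:13, y = 14:16, v = 17:19,
                stringsAsFactors = FALSE)

common_cols <- intersect(names(a), names(b))  # columns present in both
common_ids  <- intersect(a$id, b$id)          # ids present in both

# Keep matching rows with %in%, matching columns by name, then sort by id
a_new <- a[a$id %in% common_ids, common_cols, drop = FALSE]
a_new <- a_new[order(a_new$id), ]
b_new <- b[b$id %in% common_ids, common_cols, drop = FALSE]
b_new <- b_new[order(b_new$id), ]

# Sanity check: both results now share the same ids and the same columns
stopifnot(identical(a_new$id, b_new$id),
          identical(names(a_new), names(b_new)))
```

Using drop = FALSE keeps the result a data frame even if only one column happens to be shared.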
If there are many datasets, place them in a list and use Reduce to get the intersecting column names and 'id' values:
lst1 <- list(df1, df2)
nm1 <- Reduce(intersect, lapply(lst1, names))
nm2 <- Reduce(intersect, lapply(lst1, `[[`, "id"))
lst2 <- lapply(lst1, subset, subset = id %in% nm2, select = nm1)
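To see why Reduce works here: it folds intersect over the list pairwise, so the result contains only the values present in every element. A minimal sketch on made-up id vectors (id_sets is a hypothetical name used only for illustration):

```r
# Three hypothetical sets of ids
id_sets <- list(c("a", "b", "c", "d"),
                c("b", "c", "d", "e"),
                c("c", "d", "f"))

# Reduce applies intersect to the first two sets, then to that result and the third
common <- Reduce(intersect, id_sets)

# The same computation written out by hand
stepwise <- intersect(intersect(id_sets[[1]], id_sets[[2]]), id_sets[[3]])

stopifnot(identical(common, stepwise))
```

Because the fold is pairwise, the same call should work unchanged for a list of any length.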
If the result needs to be ordered by id:
lst2 <- lapply(lst1, function(x) {
x1 <- subset(x, id %in% nm2, select = nm1)
x1 <- x1[order(x1$id),]
row.names(x1) <- NULL
x1
})
-output
lst2
[[1]]
id x y z
1 b -0.4456620 -0.4727914 1.2538149
2 c -1.2650612 -1.9666172 0.1533731
3 d 0.4978505 0.8377870 0.5539177
4 e 1.7869131 -1.6866933 0.6886403
5 f 0.3598138 -0.2179749 -0.2950715
6 g -0.5558411 -0.6250393 0.8215811
7 h 1.2240818 -1.0678237 0.4264642
8 i 0.4007715 -1.0260044 0.8951257
9 j -0.6868529 0.7013559 -1.1381369
[[2]]
id x y z
1 b -1.0700682 0.4120223 -0.279333528
2 c -0.2416898 -0.1524106 -0.778997240
3 d 1.6232025 0.6343621 -0.685706846
4 e 1.2283928 2.1499193 -0.735026156
5 f 0.2760235 -1.3343536 -1.427685784
6 g -1.0489755 0.4958705 0.619283535
7 h -0.5208693 1.2339762 -0.006198262
8 i -0.7729782 -0.9007918 -0.319393809
9 j -0.4682005 -0.2288958 -0.374800093
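The question asks for a result whose elements are labelled df1 and df2. One possible way to get that, sketched below under stated assumptions, is to start from a named list, since lapply preserves names. The helper match_frames and its key argument are hypothetical names introduced here, not part of the answer above. It uses `[` indexing rather than subset, because the documentation for subset describes it as a convenience function intended for interactive use and recommends standard subsetting for programming.

```r
# Hypothetical helper: keep only the shared columns and the shared key values
match_frames <- function(frames, key = "id") {
  common_cols <- Reduce(intersect, lapply(frames, names))
  stopifnot(key %in% common_cols)  # the key column must exist in every frame
  common_ids <- Reduce(intersect,
                       lapply(frames, function(d) as.character(d[[key]])))
  lapply(frames, function(d) {
    out <- d[as.character(d[[key]]) %in% common_ids, common_cols, drop = FALSE]
    out <- out[order(out[[key]]), , drop = FALSE]  # sort rows by the key
    row.names(out) <- NULL                         # renumber rows from 1
    out
  })
}

# Toy input (made-up values); the list names carry over to the result
toy <- list(
  df1 = data.frame(id = c("c", "a", "b"), x = 1:3, u = 7:9,
                   stringsAsFactors = FALSE),
  df2 = data.frame(id = c("b", "d", "c"), x = 4:6, v = 1:3,
                   stringsAsFactors = FALSE)
)
res <- match_frames(toy)
names(res)
```

With the data frames from the question, the call would presumably be match_frames(list(df1 = df1, df2 = df2)), after which res$df1 and res$df2 hold the matched tables.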