For loop to merge two data frames on a common column in R
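I need to add some missing values to a dataset based on a list of codes. I want to do this by running a loop that merges the data with the list on a column they have in common.
This may be a duplicate of "Merge within a loop in R", or a special case of it.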
library(dplyr)  # provides %>%, arrange() and as_tibble()
#load data
data("mtcars")
#add car names
mtcars <- cbind(cars = rownames(mtcars), mtcars)
rownames(mtcars) <- 1:nrow(mtcars)
#add dates and arrange
date <- rep(seq(as.Date("2015-01-02"), by = "month", length.out = 4), times = 8)
mtcars <- cbind(date = date, mtcars)
mtcars <- mtcars %>%
arrange(., date)
#add additional cars
add_cars <- c("renault", "dacia", "benz", "ferrari",
"AC", "Acura", "Aixam", "Alfa",
"Bertone", "Bestune", "Chevrolet",
"Chrysler", "Haima", "Haval", "Hawtai", "Hennessey")
total_cars <- as_tibble(c(unique(mtcars$cars), add_cars))
colnames(total_cars) <- "cars"
#split data on dates, list total cars
car_dates <- split(mtcars, f= mtcars$date)
total_cars <- as.list(total_cars)
#execute loop
results <- vector(mode = "integer", length = length(car_dates))
mylist <- list()
for (i in 1:length(car_dates)){
g <- nrow(car_dates[[i]])
results[i] <- g
if (results[i] < 144){
res <- list(merge(x = car_dates[[i]], y= total_cars,
by = c("cars"), all = T))
mylist <- c(mylist, res)
mydata_full <- as.data.frame(mylist)
}
}
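This loop yields a data frame with 48 obs. of 52 variables, which is partly what I'm after: the loop adds the missing observations for each date, but it spreads the dataset out wide. For every date, the original 13 variables are repeated.
I'm stuck here. I only want the original 13 variables, with the data long rather than wide.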
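The `.1`, `.2`, `.3` suffixes come from `as.data.frame()`: called on an unnamed list of data frames, it binds them column-wise and makes the repeated column names unique, while `do.call(rbind, ...)` stacks them row-wise instead. A minimal base-R sketch with two made-up data frames (`a` and `b` are illustrative, not from the question):

```r
# Two toy data frames with the same columns, standing in for two per-date merges
a <- data.frame(cars = c("x", "y"), mpg = c(21, NA))
b <- data.frame(cars = c("x", "y"), mpg = c(NA, 18))
parts <- list(a, b)

# Column-wise, as the loop's as.data.frame(mylist) does: repeated names get .1, .2, ...
wide <- as.data.frame(parts)
names(wide)

# Row-wise: keeps the original columns and stacks the rows (long format)
long <- do.call(rbind, parts)
dim(long)
```

Replacing the column-wise conversion with a row-wise bind is what keeps the original 13 columns.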
mydata_full <- as_tibble(mydata_full)
head(mydata_full)
# A tibble: 6 x 52
cars date mpg cyl disp hp drat wt qsec vs am gear carb cars.1 date.1 mpg.1 cyl.1
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <date> <dbl> <dbl>
1 AC NA NA NA NA NA NA NA NA NA NA NA NA AC NA NA NA
2 Acura NA NA NA NA NA NA NA NA NA NA NA NA Acura NA NA NA
3 Aixam NA NA NA NA NA NA NA NA NA NA NA NA Aixam NA NA NA
4 Alfa NA NA NA NA NA NA NA NA NA NA NA NA Alfa NA NA NA
5 AMC Jav~ NA NA NA NA NA NA NA NA NA NA NA NA AMC Ja~ NA NA NA
6 benz NA NA NA NA NA NA NA NA NA NA NA NA benz NA NA NA
# ... with 35 more variables: disp.1 <dbl>, hp.1 <dbl>, drat.1 <dbl>, wt.1 <dbl>, qsec.1 <dbl>, vs.1 <dbl>,
# am.1 <dbl>, gear.1 <dbl>, carb.1 <dbl>, cars.2 <chr>, date.2 <date>, mpg.2 <dbl>, cyl.2 <dbl>, disp.2 <dbl>,
# hp.2 <dbl>, drat.2 <dbl>, wt.2 <dbl>, qsec.2 <dbl>, vs.2 <dbl>, am.2 <dbl>, gear.2 <dbl>, carb.2 <dbl>,
# cars.3 <chr>, date.3 <date>, mpg.3 <dbl>, cyl.3 <dbl>, disp.3 <dbl>, hp.3 <dbl>, drat.3 <dbl>, wt.3 <dbl>,
# qsec.3 <dbl>, vs.3 <dbl>, am.3 <dbl>, gear.3 <dbl>, carb.3 <dbl>
I'm sure this can be done more simply with full_join. I tried, but I only managed to do the full_join for one date at a time. What am I missing?
#after rearranging the classes to tibble
mtcars_short <- mtcars %>%
filter(date == "2015-02-02") %>%
full_join(total_cars, by= c("cars"))
> print(mtcars_short)
# A tibble: 48 x 13
date cars mpg cyl disp hp drat wt qsec vs am gear carb
<date> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2015-02-02 Mazda RX4 Wag 21 6 160 110 3.9 2.88 17.0 0 1 4 4
2 2015-02-02 Valiant 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
3 2015-02-02 Merc 280 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
4 2015-02-02 Merc 450SLC 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
5 2015-02-02 Fiat 128 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
6 2015-02-02 Dodge Challenger 15.5 8 318 150 2.76 3.52 16.9 0 0 3 2
7 2015-02-02 Fiat X1-9 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1
8 2015-02-02 Ferrari Dino 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
9 NA Mazda RX4 NA NA NA NA NA NA NA NA NA NA NA
10 NA Hornet Sportabout NA NA NA NA NA NA NA NA NA NA NA
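In the single-date output above, the cars added by `full_join()` come back with `date = NA`, because the join key is `cars` alone. One way around this is to complete each date group against the full car list, write the group's date back into the added rows, and only then stack the groups. A base-R sketch of that idea, using hypothetical toy data (`df`, `all_cars`):

```r
# Toy data: not every car appears on every date
df <- data.frame(
  date = as.Date(c("2015-01-02", "2015-01-02", "2015-02-02")),
  cars = c("x", "y", "x"),
  mpg  = c(21, 18, 22),
  stringsAsFactors = FALSE
)
all_cars <- c("x", "y", "z")

per_date <- lapply(split(df, df$date), function(d) {
  # Full join on the car name alone, as in the question
  m <- merge(d, data.frame(cars = all_cars, stringsAsFactors = FALSE),
             by = "cars", all = TRUE)
  # Cars added by the join have date = NA; the whole group shares one date
  m$date <- d$date[1]
  m
})

# Stack the completed groups row-wise (long), not column-wise (wide)
result <- do.call(rbind, per_date)
rownames(result) <- NULL
result
```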
I want a data frame with 192 obs. of 13 variables, meaning that for each unique date (4) I want all the observations (48).
# A tibble: 192 x 13
cars date mpg cyl disp hp drat wt qsec vs am gear carb
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AC 2015-01-02 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 Acura 2015-01-02 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
3 Aixam 2015-01-02 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
4 Alfa 2015-01-02 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
5 AMC Ja~ 2015-01-02 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
6 benz . . . . . . . . . . . . . .
7 Bertone . . . . . . . . . . . . . .
8 Bestune . . . . . . . . . . . . . .
9 Cadill~ . . . . . and so on . . . . .
10 Camaro~ . . . . . . . . . . . . . .
... (the same 48 cars repeated for date2, date3 and date4)
192 rows in total
Any input would be appreciated!
After a few hours of digging, I found a solution!
I came across it via the question "Convert a list to a data frame". Thanks to the comment by @mflo-ByeSE, I found the solution here: https://www.r-bloggers.com/2014/06/concatenating-a-list-of-data-frames/
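I modified the loop so that each list element is named after its date, by adding
names(res) <- names(car_dates[i])
inside the loop, and I removed the line that converted the output list to a data frame:
mydata_full <- as.data.frame(mylist)
The improved loop and the solution are below: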
#loop
results <- vector(mode = "integer", length = length(car_dates))
mylist <- list()
for (i in 1:length(car_dates)){
g <- nrow(car_dates[[i]])
results[i] <- g
if (results[i] < 144){
res <- list(merge(x = car_dates[[i]], y= total_cars,
by = c("cars"), all = T))
names(res) <- names(car_dates[i])
mylist <- c(mylist, res)
}
}
#then
mydata_full <- as_tibble(plyr::ldply(mylist, rbind))
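Here `plyr::ldply()` stacks the list elements row-wise and, because the list is named, records each element's name in an `.id` column. If you would rather not depend on plyr, a base-R sketch of the same idea follows; `mylist_demo` is a hypothetical stand-in for `mylist`, and the `.id` column name just mirrors ldply's default.

```r
# Hypothetical stand-in for mylist: one data frame per date, named by date
mylist_demo <- list(
  "2015-01-02" = data.frame(cars = c("x", "y"), mpg = c(21, NA)),
  "2015-02-02" = data.frame(cars = c("x", "y"), mpg = c(NA, 18))
)

# Prepend each list name as an .id column, then stack the pieces row-wise
stacked <- do.call(rbind, Map(function(d, nm) cbind(.id = nm, d),
                              mylist_demo, names(mylist_demo)))
rownames(stacked) <- NULL
stacked
```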
Cheers
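A simple join can solve this. Create a data frame with two columns: one holding all the distinct car names, repeated as many times as there are unique dates, and the other holding the distinct dates, each repeated as many times as there are distinct cars.
That data frame looks like this: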
date cars
1: 2015-01-02 Mazda RX4
2: 2015-01-02 Mazda RX4 Wag
3: 2015-01-02 Datsun 710
4: 2015-01-02 Hornet 4 Drive
5: 2015-01-02 Hornet Sportabout
---
188: 2015-04-02 Chrysler
189: 2015-04-02 Haima
190: 2015-04-02 Haval
191: 2015-04-02 Hawtai
192: 2015-04-02 Hennessey
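The same date-by-car grid can also be built with base R's `expand.grid()`, which returns every combination of its arguments with the first argument varying fastest. A sketch with toy inputs (`dates` and `car_names` are illustrative values, not the mtcars data):

```r
dates     <- as.Date(c("2015-01-02", "2015-02-02"))
car_names <- c("x", "y", "z")

# cars is the first argument, so it varies fastest: each date block lists every car once
grid <- expand.grid(cars = car_names, date = dates,
                    stringsAsFactors = FALSE, KEEP.OUT.ATTRS = FALSE)
grid <- grid[, c("date", "cars")]   # same column order as the table above
grid
```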
We can then left-join the mtcars data onto this table, using date and cars as the join keys.
Here is the code I tried:
data("mtcars")
#add car names
mtcars <- cbind(cars = rownames(mtcars), mtcars)
rownames(mtcars) <- 1:nrow(mtcars)
date <- rep(seq(as.Date("2015-01-02"), by = "month", length.out = 4),times = 8)
mtcars <- cbind(date = date, mtcars)
#add additional cars
add_cars <- c("renault", "dacia", "benz", "ferrari",
"AC", "Acura", "Aixam", "Alfa",
"Bertone", "Bestune", "Chevrolet",
"Chrysler", "Haima", "Haval", "Hawtai", "Hennessey")
total_cars <- c(unique(mtcars$cars), add_cars)
total_cars <- data.frame(date = rep(sort(unique(mtcars$date)), each = length(total_cars)), cars = rep(total_cars, length(unique(mtcars$date))))
total_cars <- merge(total_cars, mtcars, by = c('date', 'cars'), all.x = TRUE)
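To confirm that this gives the 192 x 13 result the question asks for, you can check the dimensions and count how many rows actually matched an mtcars record. The sketch below condenses the steps above into one base-R script (the names `mt`, `all_cars`, `grid` and `full` are mine) and checks those counts: with 32 mtcars rows spread over 4 dates and 48 cars in total, 32 of the 192 rows should carry values.

```r
# Base R only: mtcars ships with the datasets package
mt <- data.frame(cars = rownames(mtcars), mtcars,
                 stringsAsFactors = FALSE, row.names = NULL)
mt$date <- rep(seq(as.Date("2015-01-02"), by = "month", length.out = 4), times = 8)

add_cars <- c("renault", "dacia", "benz", "ferrari",
              "AC", "Acura", "Aixam", "Alfa",
              "Bertone", "Bestune", "Chevrolet",
              "Chrysler", "Haima", "Haval", "Hawtai", "Hennessey")
all_cars <- c(unique(mt$cars), add_cars)
dates <- sort(unique(mt$date))

# One row per (date, car) pair: 4 dates x 48 cars
grid <- data.frame(date = rep(dates, each = length(all_cars)),
                   cars = rep(all_cars, times = length(dates)),
                   stringsAsFactors = FALSE)

# Left join: keep every grid row, fill the mtcars columns where a match exists
full <- merge(grid, mt, by = c("date", "cars"), all.x = TRUE)

dim(full)               # 192 rows, 13 columns
sum(!is.na(full$mpg))   # rows that matched an mtcars record
table(full$date)        # rows per date
```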
Sample output rows:
date cars mpg cyl disp hp drat wt qsec vs am gear carb
183 2015-04-02 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.4 0 0 3 3
184 2015-04-02 Merc 450SL NA NA NA NA NA NA NA NA NA NA NA
185 2015-04-02 Merc 450SLC NA NA NA NA NA NA NA NA NA NA NA
186 2015-04-02 Pontiac Firebird NA NA NA NA NA NA NA NA NA NA NA
187 2015-04-02 Porsche 914-2 NA NA NA NA NA NA NA NA NA NA NA
188 2015-04-02 renault NA NA NA NA NA NA NA NA NA NA NA
189 2015-04-02 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.9 1 1 4 1
190 2015-04-02 Toyota Corona NA NA NA NA NA NA NA NA NA NA NA
191 2015-04-02 Valiant NA NA NA NA NA NA NA NA NA NA NA
192 2015-04-02 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2