for loop to merge two data frames with common column R

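I need to add some missing values to a dataset based on a list of codes. I want to do this by running a loop that merges each piece with the list on their common column.

This may be a duplicate of "merge in a loop R", or a special case of it.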


# load packages (dplyr provides %>%, arrange and as_tibble)
library(dplyr)
# load data
data("mtcars")
# add car names
mtcars <- cbind(cars = rownames(mtcars), mtcars)
rownames(mtcars) <- 1:nrow(mtcars)
# add dates and arrange
date <- rep(seq(as.Date("2015-01-02"), by = "month", length.out = 4), times = 8)
mtcars <- cbind(date = date, mtcars)
mtcars <- mtcars %>%
  arrange(date)
# add additional cars
add_cars <- c("renault", "dacia", "benz", "ferrari",
                "AC", "Acura", "Aixam", "Alfa",
                "Bertone", "Bestune", "Chevrolet",
                "Chrysler", "Haima", "Haval", "Hawtai", "Hennessey")
total_cars <- as_tibble(c(unique(mtcars$cars), add_cars))
colnames(total_cars) <- "cars"
# split data on dates, list total cars
car_dates <- split(mtcars, f = mtcars$date)
total_cars <- as.list(total_cars)

#execute loop
results <- vector(mode = "integer", length = length(car_dates))
mylist <- list()

for (i in 1:length(car_dates)){
  g <- nrow(car_dates[[i]])
  results[i] <- g
  if (results[i] < 144){
    res <- list(merge(x = car_dates[[i]], y= total_cars,
                      by = c("cars"), all = T))
    mylist <- c(mylist, res)
    mydata_full <- as.data.frame(mylist)
  } 
}


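This loop yields a data frame with 48 obs. of 52 variables. Part of this is what I am after: the loop adds the missing observations to each date, but it spreads the dataset wide. The initial 13 variables are now repeated once for each date.

I am stuck here. I only want the initial 13 variables, with the data in long format.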


mydata_full <- as_tibble(mydata_full)
head(mydata_full)
# A tibble: 6 x 52
  cars     date         mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb cars.1  date.1     mpg.1 cyl.1
  <chr>    <date>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>   <date>     <dbl> <dbl>
1 AC       NA            NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA AC      NA            NA    NA
2 Acura    NA            NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA Acura   NA            NA    NA
3 Aixam    NA            NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA Aixam   NA            NA    NA
4 Alfa     NA            NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA Alfa    NA            NA    NA
5 AMC Jav~ NA            NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA AMC Ja~ NA            NA    NA
6 benz     NA            NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA benz    NA            NA    NA
# ... with 35 more variables: disp.1 <dbl>, hp.1 <dbl>, drat.1 <dbl>, wt.1 <dbl>, qsec.1 <dbl>, vs.1 <dbl>,
#   am.1 <dbl>, gear.1 <dbl>, carb.1 <dbl>, cars.2 <chr>, date.2 <date>, mpg.2 <dbl>, cyl.2 <dbl>, disp.2 <dbl>,
#   hp.2 <dbl>, drat.2 <dbl>, wt.2 <dbl>, qsec.2 <dbl>, vs.2 <dbl>, am.2 <dbl>, gear.2 <dbl>, carb.2 <dbl>,
#   cars.3 <chr>, date.3 <date>, mpg.3 <dbl>, cyl.3 <dbl>, disp.3 <dbl>, hp.3 <dbl>, drat.3 <dbl>, wt.3 <dbl>,
#   qsec.3 <dbl>, vs.3 <dbl>, am.3 <dbl>, gear.3 <dbl>, carb.3 <dbl>


I am sure this can be done with a simpler full_join. I tried, but I only managed to do the full_join for each date separately. What am I missing?

#after rearranging the classes to tibble

mtcars_short <- mtcars %>%
  filter(date == "2015-02-02") %>%
  full_join(total_cars, by= c("cars"))

> print(mtcars_short)
# A tibble: 48 x 13
   date       cars                mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <date>     <chr>             <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 2015-02-02 Mazda RX4 Wag      21       6 160     110  3.9   2.88  17.0     0     1     4     4
 2 2015-02-02 Valiant            18.1     6 225     105  2.76  3.46  20.2     1     0     3     1
 3 2015-02-02 Merc 280           19.2     6 168.    123  3.92  3.44  18.3     1     0     4     4
 4 2015-02-02 Merc 450SLC        15.2     8 276.    180  3.07  3.78  18       0     0     3     3
 5 2015-02-02 Fiat 128           32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
 6 2015-02-02 Dodge Challenger   15.5     8 318     150  2.76  3.52  16.9     0     0     3     2
 7 2015-02-02 Fiat X1-9          27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
 8 2015-02-02 Ferrari Dino       19.7     6 145     175  3.62  2.77  15.5     0     1     5     6
 9 NA         Mazda RX4          NA      NA  NA      NA NA    NA     NA      NA    NA    NA    NA
10 NA         Hornet Sportabout  NA      NA  NA      NA NA    NA     NA      NA    NA    NA    NA

I want a df with 192 obs. of 13 variables, meaning that for each of the 4 unique dates I want all 48 observations.


# A tibble: 192 x 13
   cars    date         mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb 
   <chr>   <date>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 
 1 AC      2015-01-02  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2 Acura   2015-01-02  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2 
 3 Aixam   2015-01-02  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2 
 4 Alfa    2015-01-02  17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3 
 5 AMC Ja~ 2015-01-02  14.7     8  440    230  3.23  5.34  17.4     0     0     3     4 
 6 benz    .            .    .    .    .    .    .    .    .    .    .    .     .     .
 7 Bertone .            .    .    .    .    .    .    .    .    .    .    .     .     .
 8 Bestune .            .    .    .    .    .    .    .    .    .    .    .     .     . 
 9 Cadill~ .            .    .    .    .    and so on    .    .    .     .     .  
10 Camaro~ .            .    .    .    .    .    .    .    .    .    .    .     .     .
.          date2
.          .
.          date3
.          etc.
.
192

Any input would be greatly appreciated!

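After a few hours of digging I found a solution, which is great!

I found it in Q: Convert a list to a data frame. Thanks to a comment by @mflo-ByeSE, I found the solution here: https://www.r-bloggers.com/2014/06/concatenating-a-list-of-data-frames/

I modified the loop so that each list element is named after its date, by adding

names(res) <- names(car_dates[i])

inside the loop,

and I removed the line that converted the list output to a data frame inside the loop:

mydata_full <- as.data.frame(mylist)

The improved loop and the solution are below.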


#loop
results <- vector(mode = "integer", length = length(car_dates))
mylist <- list()

for (i in 1:length(car_dates)){
  g <- nrow(car_dates[[i]])
  results[i] <- g
  if (results[i] < 144){
    res <- list(merge(x = car_dates[[i]], y= total_cars,
                      by = c("cars"), all = T))
    names(res) <- names(car_dates[i])
    mylist <- c(mylist, res)
  } 
}

#then
mydata_full <- as_tibble(plyr::ldply(mylist, rbind))
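To illustrate why the first attempt came out wide, here is a minimal base R sketch on toy data. The `pieces` list and its values are made up for illustration and are not part of the mtcars example. It contrasts as.data.frame() on a list of data frames, which places them side by side, with row-binding, which stacks them. do.call(rbind, ...) is used here as a base R stand-in for plyr::ldply, not as the method from the linked post.

```r
# Two toy data frames standing in for the per-date pieces in mylist
# (hypothetical data, not the mtcars example).
pieces <- list(
  "2015-01-02" = data.frame(cars = c("a", "b"), mpg = c(21, 18)),
  "2015-02-02" = data.frame(cars = c("a", "b"), mpg = c(NA, 30))
)

# as.data.frame() on a list of data frames binds them side by side (wide):
# the columns of every piece are repeated.
wide <- as.data.frame(pieces)
dim(wide)

# Stacking the pieces with rbind keeps the original columns (long).
long <- do.call(rbind, pieces)

# Carry the list names along as a column, similar to the .id column
# that plyr::ldply() adds for a named list.
long$.id <- rep(names(pieces), times = vapply(pieces, nrow, integer(1)))
rownames(long) <- NULL
dim(long)
```

With the real data, the same stacking step should turn the four 48-row pieces into a single 192-row data frame.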


Cheers

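A simple join solves this. Create a data frame with two columns: one holding all the distinct car names, repeated once per unique date, and one holding the distinct dates, each repeated once per distinct car.

That data frame will look like this: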

           date              cars
  1: 2015-01-02         Mazda RX4
  2: 2015-01-02     Mazda RX4 Wag
  3: 2015-01-02        Datsun 710
  4: 2015-01-02    Hornet 4 Drive
  5: 2015-01-02 Hornet Sportabout
  ---                             
188: 2015-04-02          Chrysler
189: 2015-04-02             Haima
190: 2015-04-02             Haval
191: 2015-04-02            Hawtai
192: 2015-04-02         Hennessey

We can then perform a left join from this table to the mtcars data, using date and cars as the join keys.

Here is the code I used:

data("mtcars")
#add car names
mtcars <- cbind(cars = rownames(mtcars), mtcars)
rownames(mtcars) <- 1:nrow(mtcars)

date <- rep(seq(as.Date("2015-01-02"), by = "month", length.out = 4),times = 8)
mtcars <- cbind(date = date, mtcars)

#add additional cars
add_cars <- c("renault", "dacia", "benz", "ferrari",
          "AC", "Acura", "Aixam", "Alfa",
          "Bertone", "Bestune", "Chevrolet",
          "Chrysler", "Haima", "Haval", "Hawtai", "Hennessey")
total_cars <- c(unique(mtcars$cars), add_cars)

total_cars <- data.frame(date = rep(sort(unique(mtcars$date)), each = length(total_cars)), cars = rep(total_cars, length(unique(mtcars$date))))

total_cars <- merge(total_cars, mtcars, by = c('date', 'cars'), all.x = TRUE)

Sample output rows

          date             cars  mpg cyl  disp  hp drat    wt qsec vs am gear carb
183 2015-04-02       Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.4  0  0    3    3
184 2015-04-02       Merc 450SL   NA  NA    NA  NA   NA    NA   NA NA NA   NA   NA
185 2015-04-02      Merc 450SLC   NA  NA    NA  NA   NA    NA   NA NA NA   NA   NA
186 2015-04-02 Pontiac Firebird   NA  NA    NA  NA   NA    NA   NA NA NA   NA   NA
187 2015-04-02    Porsche 914-2   NA  NA    NA  NA   NA    NA   NA NA NA   NA   NA
188 2015-04-02          renault   NA  NA    NA  NA   NA    NA   NA NA NA   NA   NA
189 2015-04-02   Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.9  1  1    4    1
190 2015-04-02    Toyota Corona   NA  NA    NA  NA   NA    NA   NA NA NA   NA   NA
191 2015-04-02          Valiant   NA  NA    NA  NA   NA    NA   NA NA NA   NA   NA
192 2015-04-02       Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
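The same grid-then-left-join idea can also be sketched with expand.grid(), which builds every date/car combination without the two manual rep() calls. This is a minimal sketch on toy data: the names `obs`, `all_cars`, `grid` and `full` and all values are assumptions for illustration, not part of the code above.

```r
# Toy data standing in for mtcars with a date column (hypothetical values).
obs <- data.frame(
  date = as.Date(c("2015-01-02", "2015-01-02", "2015-02-02")),
  cars = c("a", "b", "a"),
  mpg  = c(21, 18, 30),
  stringsAsFactors = FALSE
)
all_cars <- c("a", "b", "c")  # "c" never appears in obs

# Every date/car combination, one row each.
grid <- expand.grid(
  date = sort(unique(obs$date)),
  cars = all_cars,
  stringsAsFactors = FALSE
)

# Left join: keep all grid rows; combinations without a match get NA.
full <- merge(grid, obs, by = c("date", "cars"), all.x = TRUE)
nrow(full)            # number of dates times number of cars
sum(is.na(full$mpg))  # combinations that had no observation
```

Applied to the real data, `obs` would be mtcars with its date column and `all_cars` the 48 car names, which should give the 192 rows requested in the question.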
