Creating new columns in multiple dataframes in R
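Following up on my previous question: I'm working with a large number of data frames in R, each with a different number of columns. I want to harmonize these datasets so that they all have the same columns, with NA values in the newly added columns. I wrote a loop, but I'm not sure how to update the actual data frames.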
first_df = data.frame(matrix(rnorm(20), nrow=10))
second_df = data.frame(matrix(rnorm(20), nrow=4))
third_df = data.frame(matrix(rnorm(20), nrow=5))
library(tidyverse)
min_max <- mget(ls(pattern = "_df")) %>%
  map_dbl(ncol) %>%
  enframe() %>%
  arrange(value) %>%
  slice(1, n())
min_max
# A tibble: 2 x 2
# name value
# <chr> <dbl>
#1 first_df 2
#2 second_df 5
diff <- setdiff(names(get(min_max$name[2])), names(get(min_max$name[1])))
for (col_name in diff) {
  # all dataframes whose names contain "_df"
  for (df_index in 1:length(ls(pattern = "_df"))) {
    # capturing the dataframe
    data <- get(ls(pattern = "_df")[df_index])
    if (!(col_name %in% names(data))) {
      data[, col_name] <- NA
    }
    # I don't know how to update the real datasets
    # get(ls(pattern = "_df")[df_index]) <- data
  }
}
I looked into it quickly, and the solution is the assign() function.
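For readers new to assign(): get() looks an object up by name and returns a copy of it, and base R has no `get<-` replacement function, so the commented-out `get(...) <- data` line in the question cannot update anything. assign() is the write-side counterpart. A minimal sketch with a hypothetical object called `demo` (named so that it does not match the `"_df"` pattern used in the question):

```r
# Hypothetical object; its name deliberately does not contain "_df",
# so ls(pattern = "_df") in the question's code will not pick it up
demo <- data.frame(X1 = 1:3)

obj_name <- "demo"
tmp <- get(obj_name)   # get() returns a copy of the object called "demo"
tmp[, "X2"] <- NA      # modifying the copy leaves `demo` itself untouched
names(demo)            # [1] "X1"

# There is no `get<-`, so write the copy back by name with assign();
# called at top level, assign() targets the global environment
assign(obj_name, tmp)
names(demo)            # [1] "X1" "X2"
```

Inside a function, assign() would by default write into that function's local environment instead, so the `envir` argument would need to be set explicitly there.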
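So here is your reprex using assign(). I also learned, though, that it is useful to collect your data frames into a list; then, I think, you can just work with (and rename) the list elements instead.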
first_df = data.frame(matrix(rnorm(20), nrow=10))
second_df = data.frame(matrix(rnorm(20), nrow=4))
third_df = data.frame(matrix(rnorm(20), nrow=5))
library(tidyverse)
min_max <- mget(ls(pattern = "_df")) %>%
  map_dbl(ncol) %>%
  enframe() %>%
  arrange(value) %>%
  slice(1, n())
min_max
diff <- setdiff(names(get(min_max$name[2])), names(get(min_max$name[1])))
for (col_name in diff) {
  # all dataframes whose names contain "_df"
  for (df_index in 1:length(ls(pattern = "_df"))) {
    # capturing the dataframe
    data <- get(ls(pattern = "_df")[df_index])
    if (!(col_name %in% names(data))) {
      data[, col_name] <- NA
      # write the modified copy back under its original name;
      # at top level assign() targets the global environment
      assign(ls(pattern = "_df")[df_index], data)
    }
  }
}
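The list-based idea mentioned above can be sketched in base R as follows. One difference from the loop: the loop only compares the narrowest and the widest data frame, so a column that exists in some other data frame but not in the widest one would be missed; this sketch takes the union of the column names of all of them. `pad_columns` and `all_cols` are hypothetical names, and the pattern `"_df$"` assumes the relevant objects' names end in `_df`.

```r
# Recreate the toy data from the question so this runs on its own
first_df  <- data.frame(matrix(rnorm(20), nrow = 10))
second_df <- data.frame(matrix(rnorm(20), nrow = 4))
third_df  <- data.frame(matrix(rnorm(20), nrow = 5))

# Collect the data frames into one named list (names = object names)
df_list <- mget(ls(pattern = "_df$"))

# Union of the column names of *all* data frames, in order of first appearance
all_cols <- unique(unlist(lapply(df_list, names)))

# Hypothetical helper: add each missing column as NA, then use a common column order
pad_columns <- function(df, cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- NA
  }
  df[cols]
}

df_list <- lapply(df_list, pad_columns, cols = all_cols)

# For the toy data, every element should now have 5 columns
sapply(df_list, ncol)
```

If separate objects are still needed afterwards, `list2env(df_list, globalenv())` writes each element back under its original name.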
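Here is an alternative that avoids the loop; it uses dplyr::bind_rows() to stack the data frames into one with the full set of columns, filling in NA where needed.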
first_df = data.frame(matrix(rnorm(20), nrow=10))
second_df = data.frame(matrix(rnorm(20), nrow=4))
third_df = data.frame(matrix(rnorm(20), nrow=5))
library(tidyverse)
df_names <- ls(pattern = "_df")
df_list <- mget(df_names)
new_df_list <-
  df_list %>%
  bind_rows(.id = "id") %>%       # stack all data frames; missing columns are filled with NA
  group_split(id) %>%             # split back into a list of tibbles, one per id (sorted by id)
  set_names(df_names) %>%         # relies on ls() and group_split() using the same (alphabetical) order
  map(~ dplyr::select(.x, -id))   # remove the helper id column
# save each df back to the global environment
list2env(new_df_list, globalenv())
# check
first_df
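One fragile spot in the pipeline above: `set_names(df_names)` assumes group_split() returns the groups in the same order in which ls() returned the names. Both are alphabetical for these three names, so it works here, but the two orders are computed separately. A minimal variant, assuming dplyr is installed, lets base `split()` name the pieces from the `id` column itself; it also returns plain data frames rather than tibbles. The name `padded_list` is hypothetical, and the pattern `"_df$"` is a tightening added here so that a list such as `new_df_list` left over from a previous run is not picked up by ls().

```r
library(dplyr)

# Toy data from the question
first_df  <- data.frame(matrix(rnorm(20), nrow = 10))
second_df <- data.frame(matrix(rnorm(20), nrow = 4))
third_df  <- data.frame(matrix(rnorm(20), nrow = 5))

df_list <- mget(ls(pattern = "_df$"))

# Stack everything; columns missing from a data frame are filled with NA,
# and each row's source list name is stored in the helper column `id`
combined <- bind_rows(df_list, .id = "id")

# split() names each piece after its `id` value, so names and contents
# cannot get out of step
padded_list <- lapply(split(combined, combined$id), function(d) {
  d$id <- NULL         # drop the helper column
  rownames(d) <- NULL  # reset row names inherited from `combined`
  d
})

# Write each data frame back to the global environment under its own name
list2env(padded_list, globalenv())
ncol(first_df)
```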