
Fill a column of data frames within a list with a substring of the data frames' names in R

I have a list of data frames that look like this:


crops_1990.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(1,2,3),
                                year=NA)

crops_1991.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(4,5,6),
                                year=NA)

crops_1992.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(7,8,9),
                                year=NA)


df_list <- list(crops_1990.tempor, crops_1991.tempor, crops_1992.tempor)

I would like to fill the column 'year' with the year information that is in the name of each df within the list (1990, 1991 and 1992, respectively, in this example).

I thought it would be very easy but I'm struggling a lot!

I've tried stuff like:

df_list <- lapply(df_list, function(x) {x$year <- as.character(x$year); x}) 
 
df_list <- lapply(df_list, function(x) {x$year <- substring(names(df_list), 7,10); x}) # add years from object name in list

but nothing seems to work. My expected result would be the data frames within the list looking like this:


crops_1990.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(1,2,3),
                                year=c("1990", "1990", "1990"))

crops_1991.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(4,5,6),
                                year=c("1991", "1991", "1991"))

crops_1992.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(7,8,9),
                                year=c("1992", "1992", "1992"))

Using tidyverse (lst names the list automatically*) you could do:

library(tidyverse)

lst(crops_1990.tempor, crops_1991.tempor, crops_1992.tempor) |>
  imap(~ .x |> mutate(year = .y |> str_extract("\\d+")))

Alternatively, you could put all of the objects in your environment whose names contain crops_ into a list using mget and ls (faster if you have many data frames!):

mget(ls(pattern = "crops_")) |>
  imap(~ .x |> mutate(year = .y |> str_extract("\\d+")))

Output:

$crops_1990.tempor
  study_unit cropp area year
1      unit1 crop1    1 1990
2      unit2 crop2    2 1990
3      unit3 crop3    3 1990

$crops_1991.tempor
  study_unit cropp area year
1      unit1 crop1    4 1991
2      unit2 crop2    5 1991
3      unit3 crop3    6 1991

$crops_1992.tempor
  study_unit cropp area year
1      unit1 crop1    7 1992
2      unit2 crop2    8 1992
3      unit3 crop3    9 1992

NB! You should consider putting your data into a list in the first place when you load it. For the reasons why, see e.g.: How do I make a list of data frames?
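As a rough illustration of that advice, here is a minimal base R sketch that reads files straight into a named list and fills year from the file names. The folder, the file names (crops_1990.csv and so on) and the demo values are all hypothetical; the first part only creates them so the example runs on its own.

```r
# Hypothetical setup: write three small CSV files to a temporary folder
data_dir <- file.path(tempdir(), "crops_demo")
dir.create(data_dir, showWarnings = FALSE)
for (yr in 1990:1992) {
  demo <- data.frame(study_unit = paste0("unit", 1:3),
                     cropp = paste0("crop", 1:3),
                     area = 1:3)
  write.csv(demo, file.path(data_dir, paste0("crops_", yr, ".csv")),
            row.names = FALSE)
}

# Read every matching file into one list and name it after the files
files <- list.files(data_dir, pattern = "^crops_[0-9]{4}\\.csv$",
                    full.names = TRUE)
crops <- lapply(files, read.csv)
names(crops) <- sub("\\.csv$", "", basename(files))

# Fill year from each list name by dropping the leading non-digits
crops <- Map(function(df, nm) {
  df$year <- sub("[^0-9]+", "", nm)
  df
}, crops, names(crops))
```

Because the data never exist as separate objects, there is no need for mget or assign later on.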

(*) One of the reasons why your approach isn't working is that the list is not named.
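To make the footnote concrete, here is a minimal base R sketch. It assumes, as in the question, that the year always sits in characters 7 to 10 of the name. list() does not keep the object names, so names(df_list) is NULL; and even with a named list, names(df_list) inside lapply would return all the names at once rather than the current one, which is why Map is used here to walk over data frames and names in parallel.

```r
# Example data as in the question
crops_1990.tempor <- data.frame(study_unit = c("unit1", "unit2", "unit3"),
                                cropp = c("crop1", "crop2", "crop3"),
                                area = c(1, 2, 3), year = NA)
crops_1991.tempor <- data.frame(study_unit = c("unit1", "unit2", "unit3"),
                                cropp = c("crop1", "crop2", "crop3"),
                                area = c(4, 5, 6), year = NA)
crops_1992.tempor <- data.frame(study_unit = c("unit1", "unit2", "unit3"),
                                cropp = c("crop1", "crop2", "crop3"),
                                area = c(7, 8, 9), year = NA)

# list() drops the object names
unnamed <- list(crops_1990.tempor, crops_1991.tempor, crops_1992.tempor)
is.null(names(unnamed))

# Name the elements explicitly
df_list <- list(crops_1990.tempor = crops_1990.tempor,
                crops_1991.tempor = crops_1991.tempor,
                crops_1992.tempor = crops_1992.tempor)

# Pair each data frame with its own name and take characters 7 to 10
df_list <- Map(function(df, nm) {
  df$year <- substring(nm, 7, 10)
  df
}, df_list, names(df_list))
```

The result keeps the list names, and year is a character column as in the expected output of the question.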

Another potential way is:

library(stringr)

## Creating list of dataframes
df_list <- list(crops_1990.tempor, crops_1991.tempor, crops_1992.tempor)

## Getting the names of all dataframes stored in R's global environment.
## eapply() returns them in arbitrary order, so sort them to match df_list.
## This assumes the three crops_* objects are the only dataframes there.
names_of_dataframes <- sort(names(which(unlist(eapply(.GlobalEnv, is.data.frame)))))

## Inserting the values in the year column
for (i in seq_along(df_list)) {
    df_list[[i]]$year <- as.numeric(str_extract(names_of_dataframes[i], "[0-9]+"))
}

## Writing the dataframes in df_list back to the global environment
for (i in seq_along(df_list))
      assign(names_of_dataframes[i], df_list[[i]])

Output

> crops_1990.tempor
  study_unit cropp area year
1      unit1 crop1    1 1990
2      unit2 crop2    2 1990
3      unit3 crop3    3 1990
> crops_1991.tempor
  study_unit cropp area year
1      unit1 crop1    4 1991
2      unit2 crop2    5 1991
3      unit3 crop3    6 1991
> crops_1992.tempor
  study_unit cropp area year
1      unit1 crop1    7 1992
2      unit2 crop2    8 1992
3      unit3 crop3    9 1992
