Revaluing many observations with a for loop in R

I have a data set where I am looking at longitudinal data for countries.

master.set <- data.frame(
  Country = c(rep("Afghanistan", 3), rep("Albania", 3)),
  Country.ID = c(rep("Afghanistan", 3), rep("Albania", 3)),
  Year = c(2015, 2016, 2017, 2015, 2016, 2017),
  Happiness.Score = c(3.575, 3.360, 3.794, 4.959, 4.655, 4.644),
  GDP.PPP = c(1766.593, 1757.023, 1758.466, 10971.044, 11356.717, 11803.282), 
  GINI = NA,
  Status = 2,
  stringsAsFactors = FALSE
)

> head(master.set)
      Country  Country.ID Year Happiness.Score   GDP.PPP GINI Status
1 Afghanistan Afghanistan 2015           3.575  1766.593   NA      2
2 Afghanistan Afghanistan 2016           3.360  1757.023   NA      2
3 Afghanistan Afghanistan 2017           3.794  1758.466   NA      2
4     Albania     Albania 2015           4.959 10971.044   NA      2
5     Albania     Albania 2016           4.655 11356.717   NA      2
6     Albania     Albania 2017           4.644 11803.282   NA      2

I created that Country.ID variable with the intent of turning its values into the numbers 1:159. I am hoping to avoid doing something like this for each individual country: master.set$Country.ID[master.set$Country.ID == "Afghanistan"] <- 1

As I implied, there are 159 countries listed in the data set. Because it's longitudinal, there are 460 observations.

Is there any way to use a for loop to save me a lot of time? Here is what I attempted. I made a couple of lists and attempted to use an ifelse command to tell R to label each country with the next number. Here is what I have:

#List of country names
N.Countries <- length(unique(master.set$Country))
Country <- unique(master.set$Country) 
Country.ID <- unique(master.set$Country.ID)
CountryList <- unique(master.set$Country)

#For Loop to make Country ID numerically match Country
for (i in 1:460){
  for (j in N.Countries){
    master.set[[Country.ID[i]]] <- ifelse(master.set[[Country[i]]] == CountryList[j], j, master.set$Country)
  }
}

I received this error:

Error in `[[<-.data.frame`(`*tmp*`, Country.ID[i], value = logical(0)) : 
  replacement has 0 rows, data has 460

Does anyone know how I can accomplish this task? Or will I be stuck using the ifelse command 159 times?

Thanks!
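For what it's worth, the error message can be reproduced outside the loop, which may help show where it comes from. The sketch below assumes a toy two-row data frame (`toy` is an illustrative name, not part of the original data). The expression `master.set[[Country[i]]]` looks up a column named after a country. No such column exists, so the lookup returns NULL, the comparison collapses to `logical(0)`, and assigning a zero-length vector as a column fails.

```r
toy <- data.frame(Country = c("Afghanistan", "Albania"), stringsAsFactors = FALSE)

col <- toy[["Afghanistan"]]          # no column has this name, so this is NULL
cmp <- col == "Afghanistan"          # comparing NULL gives logical(0)
res <- ifelse(cmp, 1, toy$Country)   # ifelse() keeps the zero length of its test

# Assigning a zero-length vector as a new column raises the error
msg <- tryCatch({
  toy[["Afghanistan"]] <- res
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

Separately, `for (j in N.Countries)` visits only the single value `N.Countries` rather than every value in `1:N.Countries`, so even with valid column names the inner loop would run just once per outer iteration.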

Maybe something like:

master.set$Country.ID <- as.numeric(as.factor(master.set$Country.ID))
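One thing to be aware of with this approach: `as.factor()` sorts its levels, so the numbers follow the alphabetical order of the country names rather than the order in which they appear in the data. If order of first appearance is wanted, `match()` against `unique()` is a possible base R alternative. A small sketch on made-up input (the `countries` vector is illustrative only):

```r
# Illustrative input where the first country seen is not first alphabetically
countries <- c("Albania", "Albania", "Afghanistan", "Afghanistan")

# Factor codes follow the sorted levels: Afghanistan = 1, Albania = 2
by_level <- as.integer(as.factor(countries))

# match() against unique() numbers countries in order of first appearance
by_first <- match(countries, unique(countries))

print(by_level)
print(by_first)
```

For data already sorted by country, as in the question, both give the same IDs.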

Or alternatively, using dplyr:

library(tidyverse)
master.set <- master.set %>% mutate(Country.ID = as.numeric(as.factor(Country.ID)))

Or this, which creates a new variable Country.ID2 by joining against a lookup table that pairs each unique Country with a number in 1:length(unique(Country)):

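library(tidyverse)
# Lookup table: one row per country, numbered in order of first appearance
master.set <- left_join(master.set,
          data.frame(Country = unique(master.set$Country),
                     Country.ID2 = 1:length(unique(master.set$Country))),
          by = "Country")  # name the join key explicitly
master.set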
#>       Country  Country.ID Year Happiness.Score   GDP.PPP GINI Status
#> 1 Afghanistan Afghanistan 2015           3.575  1766.593   NA      2
#> 2 Afghanistan Afghanistan 2016           3.360  1757.023   NA      2
#> 3 Afghanistan Afghanistan 2017           3.794  1758.466   NA      2
#> 4     Albania     Albania 2015           4.959 10971.044   NA      2
#> 5     Albania     Albania 2016           4.655 11356.717   NA      2
#> 6     Albania     Albania 2017           4.644 11803.282   NA      2
#>   Country.ID2
#> 1           1
#> 2           1
#> 3           1
#> 4           2
#> 5           2
#> 6           2

Alternatively, number the groups with dplyr:

library(dplyr)
df <- data.frame(Country = c("Afghanistan", "Afghanistan", "Afghanistan", "Albania", "Albania", "Albania"),
                 Year = c(2015, 2016, 2017, 2015, 2016, 2017),
                 Happiness.Score = c(3.575, 3.360, 3.794, 4.959, 4.655, 4.644),
                 GDP.PPP = c(1766.593, 1757.023, 1758.466, 10971.044, 11356.717, 11803.282),
                 GINI = NA,
                 Status = rep(2, 6))
# group_indices_() is deprecated; cur_group_id() (dplyr >= 1.0.0) gives each group its index
df1 <- df %>%
  arrange(Country) %>%
  group_by(Country) %>%
  mutate(Country_id = cur_group_id()) %>%
  ungroup()
View(df1)
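The key-value idea can also be written in base R without any packages, using a named vector as the lookup. This is only a sketch under assumptions: `id_lookup` and `countries` are illustrative names, and the data frame is a cut-down version of the example from the question.

```r
# Cut-down version of the example data from the question
master.set <- data.frame(
  Country = c(rep("Afghanistan", 3), rep("Albania", 3)),
  Year = c(2015, 2016, 2017, 2015, 2016, 2017),
  stringsAsFactors = FALSE
)

# Named integer vector mapping each country name to an ID (illustrative name)
countries <- unique(master.set$Country)
id_lookup <- setNames(seq_along(countries), countries)

# Index the lookup by country name to get one ID per row
master.set$Country.ID <- unname(id_lookup[master.set$Country])

# Sanity check: every country maps to exactly one ID
ids_per_country <- tapply(master.set$Country.ID, master.set$Country,
                          function(x) length(unique(x)))
stopifnot(all(ids_per_country == 1))

print(master.set)
```

Keeping `id_lookup` around makes it straightforward to translate an ID back to its country name later with `names(id_lookup)[id]`.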

