R ifelse() evaluates a condition and returns match

I have a data frame

countryname <- c("Viet Nam", "Viet Nam", "Viet Nam", "Viet Nam", "Viet Nam")
year <- c(1974, 1975, 1976, 1977,1978)

df <- data.frame(countryname, year)

that is in a long country-by-year format.

I would like to create a function that standardizes country names conditional on the year of the observation. I already have a function that looks names up in a data frame, cnames, and standardizes them, but it is only useful for cross-sections, or when country names do not vary over time.

country <- c("Vietnam, North", "Vietnam, N.", "Vietnam North", "Viet Nam", "Democratic Republic Of Vietnam")
standardize <- c("Vietnam, Democratic Republic of", "Vietnam, Democratic Republic of", "Vietnam, Democratic Republic of", "Vietnam, Democratic Republic of", "Vietnam, Democratic Republic of")
panel <- c("Vietnam", "Vietnam","Vietnam","Vietnam","Vietnam")
time <- c(1976,1976,1976,1976,1976)

cnames <- data.frame(country, standardize, panel, time)

The function to standardize is

country_name <- function(x) {
   return(cnames[match(x,cnames$country),]$standardize)
}

However, as you can see, this doesn't account for any variation in country names over time. I've tried a number of different things, and the closest I've come is this function.

country_panel <- function(x, y) {

  ifelse(cnames$time < y, 
    return(cnames[match(x, cnames$country),]$panel),
    return(cnames[match(x, cnames$country),]$standardize)
  )
}

I use a dplyr chain to pull in the data frame and then use mutate to create a new variable that, ideally, captures the difference in names for countries.

d1 <- df %>%
    mutate(new_name = country_panel(countryname, year))

The problem I'm finding is that the function evaluates y in country_panel only as a single object, not as a vector. If I input an integer that is greater or less than cnames$time, it evaluates correctly but passes the same value to every row.

How can I have this function evaluate each cnames$country and cnames$time relationship to df$year and return the correct cnames$panel or cnames$standardize?

Thank you for any help.

d1
#   countryname year                        new_name
# 1    Viet Nam 1974 Vietnam, Democratic Republic of
# 2    Viet Nam 1975 Vietnam, Democratic Republic of
# 3    Viet Nam 1976 Vietnam, Democratic Republic of
# 4    Viet Nam 1977                         Vietnam
# 5    Viet Nam 1978                         Vietnam

All you need to do is make sure your data frames are created with stringsAsFactors=F when you define them, i.e. df <- data.frame(countryname, year, stringsAsFactors=F) (from R 4.0.0 onward this is already the default). And take out the return command:

country_panel <- function(x, y) {
  ifelse(cnames$time < y, 
    cnames[match(x, cnames$country),]$panel,
    cnames[match(x, cnames$country),]$standardize
  )
}

The reasoning behind it is that return stops the function in its tracks once it's called. So your data frame is being populated by the output of a single branch. That's why the values were all the same.
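
To see the effect of return in isolation, here is a minimal sketch with two toy functions. The names with_return and without_return are made up for illustration and are not part of the question. Because R evaluates function arguments lazily, the return() call runs when ifelse first needs that argument, and it exits the enclosing function right there.

```r
# Hypothetical toy functions, not from the original post.

# return() sits inside an ifelse() argument. When ifelse() forces that
# argument, return() exits with_return() immediately, so ifelse() never
# gets to combine the two branches element by element.
with_return <- function(y) {
  ifelse(y > 2, return("big"), return("small"))
}

# Without return(), ifelse() picks a value per element, and its result
# becomes the function's value because it is the last expression.
without_return <- function(y) {
  ifelse(y > 2, "big", "small")
}

early <- with_return(1:4)      # a single value, not one per element
full  <- without_return(1:4)   # one value per element of the input

print(early)
print(full)
```

The last expression of an R function is its return value, so dropping return changes nothing else about the function.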

You could join the tables based on year and country name:

left_join(df, cnames, by = c("countryname" = "country", "year" = "time"))

  countryname year                     standardize   panel
1    Viet Nam 1974                            <NA>    <NA>
2    Viet Nam 1975                            <NA>    <NA>
3    Viet Nam 1976 Vietnam, Democratic Republic of Vietnam
4    Viet Nam 1977                            <NA>    <NA>
5    Viet Nam 1978                            <NA>    <NA>
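
The join above matches on the exact year, which is why every row except 1976 comes back as NA. If the goal is a name for every row, one possible approach is to look up each row's cutoff year by country name and then compare it with that row's year. The sketch below uses base R only. The helper name country_panel_by_row is hypothetical, and the sketch assumes that cnames$time is the year after which the panel name applies, as in the question's own ifelse condition.

```r
# Lookup table and data from the question, with strings kept as character.
cnames <- data.frame(
  country = c("Vietnam, North", "Vietnam, N.", "Vietnam North",
              "Viet Nam", "Democratic Republic Of Vietnam"),
  standardize = "Vietnam, Democratic Republic of",
  panel = "Vietnam",
  time = 1976,
  stringsAsFactors = FALSE
)

df <- data.frame(
  countryname = "Viet Nam",
  year = c(1974, 1975, 1976, 1977, 1978),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not from the original post.
country_panel_by_row <- function(x, y) {
  # Row of cnames that matches each input name (NA when the name is unknown).
  idx <- match(x, cnames$country)
  # Compare each row's own cutoff year with that row's year, so the result
  # does not depend on cnames and the input having the same number of rows.
  ifelse(cnames$time[idx] < y, cnames$panel[idx], cnames$standardize[idx])
}

df$new_name <- country_panel_by_row(df$countryname, df$year)
print(df)
```

Indexing every cnames column by idx keeps the comparison aligned with the input rows, and a name that is missing from cnames yields NA rather than an error.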
