Update R dataframe column based on conditions

I am trying to update a dataframe based on a certain condition. Here is the sample dataframe.

   fname mname lname
1 RONALD     D  VALE
2 RONALD        VALE
3   JACK     A SMITH
4   JACK     B SMITH
5   JACK       SMITH

I would like to fill in a missing middle name when the first and last names match another row. In this example, I would expect the following output.

   fname mname lname
1 RONALD     D  VALE
2 RONALD     D  VALE
3   JACK     A SMITH
4   JACK     B SMITH
5   JACK       SMITH

I do not want to update the table if the same first and last name appear with two different middle initials. There are some missing values in the data, so the main aim is to identify and merge multiple entries that probably refer to the same person. At the same time, we do not want to introduce erroneous data into the table.

A tidyverse solution:

library(dplyr)

df %>% 
  mutate(mname = na_if(mname, "")) %>%   # treat empty strings as missing
  group_by(fname, lname) %>% 
  mutate(mname_count = n_distinct(mname, na.rm = TRUE)) %>%
  mutate(mname = ifelse(mname_count == 1, unique(na.omit(mname)), mname)) %>%
  ungroup() %>%
  select(-mname_count)

An ugly base R solution (assuming you changed your "" to NA):

unic <- unique(lolz[, c("fname", "lname")])

for (i in seq_len(nrow(unic))) {
  # rows belonging to this (fname, lname) pair
  idx <- lolz$fname == unic[i, 1] & lolz$lname == unic[i, 2]
  lelz <- lolz$mname[idx]
  known <- unique(lelz[!is.na(lelz)])
  # fill gaps only when exactly one distinct middle name is known
  if (length(known) == 1) {
    lelz[is.na(lelz)] <- known
    lolz$mname[idx] <- lelz
  }
}
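
The loop above relies on missing middle names being stored as NA, whereas the sample data in the question uses empty strings. A minimal sketch of that conversion, assuming the data frame is named lolz as in this answer and holds the question's sample values:

```r
# Sample data from the question, with "" marking a missing middle name
lolz <- data.frame(
  fname = c("RONALD", "RONALD", "JACK", "JACK", "JACK"),
  mname = c("D", "", "A", "B", ""),
  lname = c("VALE", "VALE", "SMITH", "SMITH", "SMITH"),
  stringsAsFactors = FALSE
)

# Recode empty strings as NA so that is.na() can detect them
lolz$mname[lolz$mname == ""] <- NA_character_

is.na(lolz$mname)
# [1] FALSE  TRUE FALSE FALSE  TRUE
```

If the column was read in as a factor, it would probably need as.character() first; that case is not covered here.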

We can use data.table:

library(data.table)
# [1L] keeps a single value so it recycles cleanly when several rows share the same initial
setDT(df1)[, mname := if(uniqueN(mname[nzchar(mname)])==1) 
                           mname[nzchar(mname)][1L] else mname, .(fname,  lname)]
df1
#    fname mname lname
#1: RONALD     D  VALE
#2: RONALD     D  VALE
#3:   JACK     A SMITH
#4:   JACK     B SMITH
#5:   JACK       SMITH
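
The same empty-string logic can also be sketched in base R with ave(), which may help to see the rule in isolation. This is an illustration rather than part of the original answers: the helper name fill_if_unambiguous is hypothetical, and it assumes missing middle names are stored as "" (not NA), as in the sample data.

```r
# Sample data from the question
df1 <- data.frame(
  fname = c("RONALD", "RONALD", "JACK", "JACK", "JACK"),
  mname = c("D", "", "A", "B", ""),
  lname = c("VALE", "VALE", "SMITH", "SMITH", "SMITH"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill blanks only when the group contains exactly
# one distinct non-empty value; otherwise return the group unchanged
fill_if_unambiguous <- function(x) {
  known <- unique(x[nzchar(x)])
  if (length(known) == 1) x[!nzchar(x)] <- known
  x
}

# ave() applies the helper within each (fname, lname) group
df1$mname <- ave(df1$mname, df1$fname, df1$lname, FUN = fill_if_unambiguous)
df1$mname
# [1] "D" "D" "A" "B" ""
```

RONALD VALE has a single known initial and gets filled, while JACK SMITH has two conflicting initials and is left alone.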

data

df1 <- structure(list(fname = c("RONALD", "RONALD", "JACK", "JACK", 
 "JACK"), mname = c("D", "", "A", "B", ""), lname = c("VALE", 
 "VALE", "SMITH", "SMITH", "SMITH")), .Names = c("fname", "mname", 
 "lname"), class = "data.frame", row.names = c("1", "2", "3", 
 "4", "5"))
