简体   繁体   中英

New Column Based on Conditions

To set the scene, I have a set of data where two columns of the data have been mixed up. To give a simple example:

df1 <- data.frame(Name = c("Bob", "John", "Mark", "Will"), City=c("Apple", "Paris", "Orange", "Berlin"), Fruit=c("London", "Pear", "Madrid", "Orange"))
df2 <- data.frame(Cities = c("Paris", "London", "Berlin", "Madrid", "Moscow", "Warsaw"))

As a result, we have two small data sets:

> df1
  Name   City  Fruit
1  Bob  Apple London
2 John  Paris   Pear
3 Mark Orange Madrid
4 Will Berlin Orange

> df2
  Cities
1  Paris
2 London
3 Berlin
4 Madrid
5 Moscow
6 Warsaw

My aim is to create a new column where the cities are in the correct place using df2. I am a bit new to R so I don't know how this would work.

I don't really know where to even start with this sort of a problem. My full dataset is much larger and it would be good to have an efficient method of unpicking this issue!

If the 'City' values are only different. We may loop over the rows, create a logical vector based on the matching values with 'Cities' from 'df2', and concatenate with the rest of the values by getting the matched values second in the order

df1[] <- t(apply(df1, 1, function(x) 
          {
         i1 <- x %in% df2$Cities
          i2 <- !i1
          x1 <- x[i2]
        c(x1[1], x[i1], x1[2])}))

-output

> df1
  Name   City  Fruit
1  Bob London  Apple
2 John  Paris   Pear
3 Mark Madrid Orange
4 Will Berlin Orange

using dplyr package this is a solution, where it looks up the two City and Fruit values in df1, and takes the one that exists in the df2 cities list. if none of the two are a city name, an empty string is returned, you can replace that with anything you prefer.

library(dplyr)
df1$corrected_City <- case_when(df1$City %in% df2$Cities ~ df1$City,
                                df1$Fruit%in% df2$Cities ~ df1$Fruit,
                                TRUE ~ "")

output, a new column created as you wanted with the city name on that row.

> df1
  Name   City  Fruit corrected_City
1  Bob  Apple London         London
2 John  Paris   Pear          Paris
3 Mark Orange Madrid         Madrid
4 Will Berlin Orange         Berlin

Another way is:

library(dplyr)
library(tidyr)

df1 %>% 
  mutate(across(1:3, ~case_when(. %in%  df2$Cities  ~ .), .names = 'new_{col}')) %>%
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ')
 Name   City  Fruit New_Col
1  Bob  Apple London  London
2 John  Paris   Pear   Paris
3 Mark Orange Madrid  Madrid
4 Will Berlin Orange  Berlin

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM