New Column Based on Conditions

To set the scene, I have a data set in which the values of two columns have been mixed up. To give a simple example:

df1 <- data.frame(Name = c("Bob", "John", "Mark", "Will"), City=c("Apple", "Paris", "Orange", "Berlin"), Fruit=c("London", "Pear", "Madrid", "Orange"))
df2 <- data.frame(Cities = c("Paris", "London", "Berlin", "Madrid", "Moscow", "Warsaw"))

As a result, we have two small data sets:

> df1
  Name   City  Fruit
1  Bob  Apple London
2 John  Paris   Pear
3 Mark Orange Madrid
4 Will Berlin Orange

> df2
  Cities
1  Paris
2 London
3 Berlin
4 Madrid
5 Moscow
6 Warsaw

My aim is to create a new column where the cities are in the correct place, using df2. I am fairly new to R, so I don't know how this would work.

I don't really know where to start with this sort of problem. My full data set is much larger, so an efficient method of unpicking this issue would be good!

If the only problem is that the 'City' values are misplaced, we can loop over the rows, create a logical vector marking the value that matches 'Cities' from 'df2', and reassemble each row with the matched value in second position and the remaining values around it:

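# For each row, move the value that matches a known city into the City position.
df1[] <- t(apply(df1, 1, function(x) {
  i1 <- x %in% df2$Cities  # TRUE for the value that is a city name
  x1 <- x[!i1]             # remaining values (Name and Fruit), original order
  c(x1[1], x[i1], x1[2])   # reassemble as Name, City, Fruit
}))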

Output:

> df1
  Name   City  Fruit
1  Bob London  Apple
2 John  Paris   Pear
3 Mark Madrid Orange
4 Will Berlin Orange
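
Since the question mentions a much larger data set, a vectorized variant may be worth trying, because `apply` calls the function once per row. The following is only a minimal sketch and not part of the answer above. It assumes that City and Fruit are the only two columns that can be swapped, and the names `swapped` and `fixed` are made up for illustration.

```r
# Rebuild the example data from the question (strings kept as character).
df1 <- data.frame(
  Name  = c("Bob", "John", "Mark", "Will"),
  City  = c("Apple", "Paris", "Orange", "Berlin"),
  Fruit = c("London", "Pear", "Madrid", "Orange"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Cities = c("Paris", "London", "Berlin", "Madrid", "Moscow", "Warsaw"),
  stringsAsFactors = FALSE
)

# TRUE where the city sits in the Fruit column and City holds something else.
swapped <- df1$Fruit %in% df2$Cities & !(df1$City %in% df2$Cities)

# Swap the two values on the affected rows only; other rows stay untouched.
fixed <- df1
fixed$City[swapped]  <- df1$Fruit[swapped]
fixed$Fruit[swapped] <- df1$City[swapped]

fixed
```

Rows in which neither value is a known city are left as they are, so they would still need a separate check.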

Here is a solution using the dplyr package. It checks the City and Fruit values in each row of df1 and takes whichever one exists in the df2 list of cities. If neither of the two is a city name, an empty string is returned; you can replace that with anything you prefer.

library(dplyr)
df1$corrected_City <- case_when(df1$City %in% df2$Cities ~ df1$City,
                                df1$Fruit %in% df2$Cities ~ df1$Fruit,
                                TRUE ~ "")

Output: a new column, created as you wanted, holding the city name for each row.

> df1
  Name   City  Fruit corrected_City
1  Bob  Apple London         London
2 John  Paris   Pear          Paris
3 Mark Orange Madrid         Madrid
4 Will Berlin Orange         Berlin
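
If you would rather have a missing value than an empty string for rows without a city, one option is `NA_character_`, which also makes those rows easy to find afterwards. This is a sketch under assumptions: the fifth row ("Anna") is hypothetical and was added only to show the no-match case, and the names `checked` and `unresolved` are illustrative.

```r
library(dplyr)

# Example data from the question plus one hypothetical row with no city in it.
df1 <- data.frame(
  Name  = c("Bob", "John", "Mark", "Will", "Anna"),
  City  = c("Apple", "Paris", "Orange", "Berlin", "Kiwi"),
  Fruit = c("London", "Pear", "Madrid", "Orange", "Plum"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Cities = c("Paris", "London", "Berlin", "Madrid", "Moscow", "Warsaw"),
  stringsAsFactors = FALSE
)

checked <- df1 %>%
  mutate(
    corrected_City = case_when(
      City  %in% df2$Cities ~ City,           # City column already holds a city
      Fruit %in% df2$Cities ~ Fruit,          # city was misplaced into Fruit
      TRUE                  ~ NA_character_   # neither value is a known city
    )
  )

# Rows that could not be resolved and may need manual review.
unresolved <- filter(checked, is.na(corrected_City))
```

Here `unresolved` contains only the hypothetical "Anna" row.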

Another way is:

library(dplyr)
library(tidyr)

df1 %>% 
  mutate(across(1:3, ~case_when(. %in%  df2$Cities  ~ .), .names = 'new_{col}')) %>%
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ')

Output:

  Name   City  Fruit New_Col
1  Bob  Apple London  London
2 John  Paris   Pear   Paris
3 Mark Orange Madrid  Madrid
4 Will Berlin Orange  Berlin
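
To see why this pipeline works, it may help to look at the intermediate result before `unite()` is applied. The sketch below runs only the `mutate(across(...))` step on the example data; the name `step1` is illustrative and does not appear in the answer.

```r
library(dplyr)

df1 <- data.frame(
  Name  = c("Bob", "John", "Mark", "Will"),
  City  = c("Apple", "Paris", "Orange", "Berlin"),
  Fruit = c("London", "Pear", "Madrid", "Orange"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Cities = c("Paris", "London", "Berlin", "Madrid", "Moscow", "Warsaw"),
  stringsAsFactors = FALSE
)

# For each of the three columns, keep a value only if it is a known city.
# case_when() returns NA wherever no condition matches.
step1 <- df1 %>%
  mutate(across(1:3, ~ case_when(. %in% df2$Cities ~ .), .names = "new_{col}"))

step1[, c("new_Name", "new_City", "new_Fruit")]
```

Each row of the three `new_` columns holds a single non-missing value, the city, and `unite(..., na.rm = TRUE)` then drops the `NA`s and keeps that value. Note that if a row contained city names in more than one column, `unite()` would paste them together separated by a space.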
