
How to replace column values based on values in another column in R?

I'm trying to develop my R skills after a few years of working with Pandas, and I've hit a problem that has me stumped.

I've split a column of data in a data frame called df. The column broadly takes the following form:

"MN - place1 - time"
...
"ST - place2 - time"

I used tidyr's separate() function to split the data into three columns, aiming to isolate the middle column as the updated location column:

library(tidyr)
cleaning_df <- separate(data = data, col = location, into = c("type", "location", "time_data"), sep = " - ")  # split on " - " so values carry no stray spaces

The result looks like this:

type    location    time_data
MN      place1      time
ST      place2      time

Unfortunately, some rows contain typos in which the hyphen between the first two fields is missing.

For instance:

"STPlace2 - time"

separate() can't handle this, or at least I couldn't work out how to make it do so.
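To see why, a quick base R check on the three example strings from the question shows that splitting on " - " yields only two pieces for the malformed row, so there is nothing to put in the third column. This is a small illustration, not part of the original post:

```r
# Example strings from the question, including the malformed third row
x <- c("MN - place1 - time", "ST - place2 - time", "STPlace2 - time")

# Split each string on the literal " - " delimiter
pieces <- strsplit(x, " - ", fixed = TRUE)

# Number of pieces per row: the malformed row yields only two
n_pieces <- lengths(pieces)
print(n_pieces)
```

Because the type and place are fused into "STPlace2", any delimiter-based split shifts "time" into the middle column and leaves the last one empty.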

Luckily, there aren't too many mistakes, so I've created a simple lookup table, location_lookup, which I was hoping to use as a data frame to correct the data.

It has this form:

Broken_type     Correct_middle
STPlace2        Place2
...             ...

With Pandas, I could write a straightforward, if un-Pythonic and un-Pandas-like, apply function to go line by line through the newly generated 'type' and 'place' columns.

It would then update the value in 'place' wherever the value in 'type' matched an entry in the lookup table.

Is there a neater way of doing this? I haven't been able to work out a solution using joins, which would clearly be more efficient.
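One join-based option can be sketched in base R with merge() as a left join on the type column. The data frames below are hypothetical stand-ins built from the examples in this question (columns type and place, and a one-row location_lookup). Because merge() may reorder rows, the sketch adds a temporary row_id column (also an assumption of this example) to restore the original order:

```r
# Hypothetical data shaped like the output of separate() in the question
df <- data.frame(
  type  = c("MN", "ST", "STPlace2"),
  place = c("place1", "place2", "time"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table of known typos
location_lookup <- data.frame(
  Broken_type    = "STPlace2",
  Correct_middle = "Place2",
  stringsAsFactors = FALSE
)

# Remember the original row order, since merge() may sort the result
df$row_id <- seq_len(nrow(df))

# Left join: keep every row of df, attach Correct_middle where type matches
joined <- merge(df, location_lookup,
                by.x = "type", by.y = "Broken_type", all.x = TRUE)
joined <- joined[order(joined$row_id), ]

# Use the correction where the join found one, otherwise keep the original place
joined$place <- ifelse(is.na(joined$Correct_middle), joined$place, joined$Correct_middle)

# Drop the helper columns
joined$row_id <- NULL
joined$Correct_middle <- NULL
print(joined)
```

If you already use dplyr, left_join() followed by coalesce() expresses the same idea.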

UPDATE:

For my example, the output of separate(), including the malformed row, would be:

type     place     time
MN       place1    time
ST       place2    time
STPlace2 time      NA

I want to be able to write a function, or use a join, with the lookup table

Broken_type     Correct_middle
STPlace2        Place2
...             ...

to identify that the third row in the left-hand column above is wrong, and replace the value 'time' with 'Place2'.

The eventual output column would then be:

place
place1
place2
Place2

We can pass a regular expression to tidyr's extract() instead. The pattern assumes the type code is always exactly two characters:

library(tidyr)
extract(data, location, into = c("type", "location", "time_data"),
        regex = "(.{2})[^[:alnum:]]*([[:alnum:]]+)\\s+-\\s+(.*)")
#   type location time_data
#1   MN   place1      time
#2   ST   place2      time
#3   ST   Place2      time

data

data <- data.frame(location = c("MN - place1 - time", "ST - place2 - time",
                                "STPlace2 - time"), stringsAsFactors = FALSE)
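If tidyr isn't available, the same regular expression can be applied with base R's regexec() and regmatches(). This is a sketch, not the answerer's code; rows that don't match the pattern come back as NA:

```r
# Same example data as above
data <- data.frame(
  location = c("MN - place1 - time", "ST - place2 - time", "STPlace2 - time"),
  stringsAsFactors = FALSE
)

# Pattern from the extract() answer: a two-character type, an optional
# non-alphanumeric separator, the place, then " - " and the rest
pattern <- "(.{2})[^[:alnum:]]*([[:alnum:]]+)\\s+-\\s+(.*)"

# regexec() records the capture groups and regmatches() pulls them out.
# Element 1 of each match is the whole match; elements 2-4 are the groups
m <- regmatches(data$location, regexec(pattern, data$location))
parts <- do.call(rbind, lapply(m, function(x) x[2:4]))
colnames(parts) <- c("type", "location", "time_data")

result <- as.data.frame(parts, stringsAsFactors = FALSE)
print(result)
```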

It's not exactly elegant, but...

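# For each row: if its type is a known broken value, take the corrected middle
# value from the lookup table; otherwise keep the existing place value
df$location <- sapply(seq_along(df$type), function(x) {
  if (df$type[x] %in% location_lookup$Broken_type) {
    # as.character() guards against factor columns returning integer codes
    as.character(location_lookup$Correct_middle[match(df$type[x], location_lookup$Broken_type)])
  } else {
    as.character(df$place[x])
  }
})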
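The row-by-row sapply() can also be replaced by a vectorised lookup with match(), which finds every row's position in location_lookup in a single call. A minimal sketch, using hypothetical example data shaped like the question's (columns type and place) and writing the corrections back into place, as the question asks:

```r
# Hypothetical data shaped like the output of separate() in the question
df <- data.frame(
  type  = c("MN", "ST", "STPlace2"),
  place = c("place1", "place2", "time"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table of known typos
location_lookup <- data.frame(
  Broken_type    = "STPlace2",
  Correct_middle = "Place2",
  stringsAsFactors = FALSE
)

# Position of each row's type in the lookup table (NA where there is no match)
idx <- match(df$type, location_lookup$Broken_type)
hit <- !is.na(idx)

# Overwrite place only for rows whose type matched a broken entry
df$place[hit] <- location_lookup$Correct_middle[idx[hit]]
print(df$place)
```

match() behaves like a lookup on a unique key, so it gives the same result as a left join here while leaving the row order untouched.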

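Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.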
