
R: Processing data in columns using if statements with grepl conditioning

longitude <- c('517W', '595W', '433W', '450E', '659E', '682W', '678W', '546E', '462W', '500W')
latitude <- c('291N','202N', '276N', '269S', '279N', '294N', '252N', '254S', '248N', '258N')
df <- data.frame(latitude, longitude)

This is an example of the data I'm working with. It holds latitude and longitude coordinates that appear in the format:

240N, 707W, 267S, 130E

I need to process these coordinates so they can be used in a model that takes coordinates in the form:

24.0, -70.7, -26.7, 13.0

(In the model, North and East are considered the positive directions.)

The goal is to run down the entire column and identify whether there is an "N" or an "S" in each cell. From there I want to remove the letter and then divide the remaining number by either 10 or -10 to give it the correct sign. If neither N nor S appears in the cell, I want the code to leave the cell alone, which is the reason for the else statement at the end of the sample code I've posted below. To process all the data in the columns, I tried using an elseif statement, but I wasn't sure how to get it to work. I ended up with a for loop with if conditions that looks like this:

for (i in 1:nrow(df)) {
  if (grepl("N",df$latitude, fixed = TRUE)) {

    df$latitude <- gsub("N", "",df$latitude) & df$latitude <- df$latitude/(10)

   } else if (grepl("S",df$latitude, fixed = TRUE)) {

    df$latitude <- gsub("S", "",df$latitude) & df$latitude <- as.numeric(df$latitude) & df$latitude <- df$latitude/(-10)

   } else (df$latitude)
}

But this gives me either an error at df$latitude/(10) saying "non-numeric argument to binary operator", from the conversion of the data from character to numeric(?), and/or a warning that "the condition has length > 1 and only the first element will be used". I'm also very new to R, and to Stack Overflow for that matter, so if my code could be formatted better, let me know.

Thanks in advance!

Here is one base R option. First we compute the absolute value of the latitude/longitude value by stripping the trailing direction character, converting to numeric, and dividing by 10. Then we conditionally flip the sign for west and south directions.

lng <- as.numeric(sub(".$", "", longitude)) / 10
lng <- ifelse(grepl("[WS]$", longitude), -1.0, 1.0) * lng
lng

[1] -51.7 -59.5 -43.3  45.0  65.9 -68.2 -67.8  54.6 -46.2 -50.0

Data:

longitude <- c('517W', '595W', '433W', '450E', '659E', '682W', '678W', '546E', '462W', '500W')
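
To apply the same idea to both columns of the question's data frame, the two steps could be wrapped in a small helper. This is only a sketch: parse_coord is a name made up here for illustration, and it assumes every value is a run of digits followed by exactly one of N, S, E, or W, giving tenths of a degree. A value with no trailing letter would lose its last digit, so such cases would need separate handling.

```r
# Hypothetical helper (not part of the original answer): convert strings
# such as "517W" or "269S" to signed decimal degrees.
parse_coord <- function(x) {
  x <- as.character(x)                              # also copes with factor columns
  magnitude <- as.numeric(sub(".$", "", x)) / 10    # drop the last character, then scale
  direction <- ifelse(grepl("[WS]$", x), -1, 1)     # west and south are negative
  direction * magnitude
}

longitude <- c('517W', '595W', '433W', '450E', '659E', '682W', '678W', '546E', '462W', '500W')
latitude <- c('291N', '202N', '276N', '269S', '279N', '294N', '252N', '254S', '248N', '258N')
df <- data.frame(latitude, longitude)

# Each column is converted in a single vectorized call; no loop is needed.
df$latitude <- parse_coord(df$latitude)
df$longitude <- parse_coord(df$longitude)
df
```

Because each assignment replaces the whole column at once, the character-to-numeric conversion happens in one step per column.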

You've got a few problems:

  • As I mentioned in comments, you loop over i from 1 to nrow(df). But you don't mention i inside the loop, so you're running the same code on the same inputs again and again. To use a for loop successfully, you'd need to have a bunch of [i]s in there, to handle each input and output one at a time.

  • The loop approach above is complicated by the fact that a column can only have one type, so you can't convert the character or factor column to numeric one row at a time; that has to be all or nothing.

  • You seem to have a major misunderstanding of &. This line of code makes no sense: df$latitude <- gsub("N", "",df$latitude) & df$latitude <- df$latitude/(10). It's two separate statements joined together with an &. A & B doesn't mean "do A and do B"; it means "check if A and B are both true. Return TRUE if they are, and FALSE otherwise." If you want to do A and then do B, just put A on one line and B on the next line.
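
For illustration, the three points above can be put together in a repaired version of the loop. This is a sketch under stated assumptions, not a recommendation over a vectorized approach. The names lat_num, value, and digits are introduced here. Writing into a separate numeric vector sidesteps the one-type-per-column problem, [i] selects a single element per iteration, and each step sits on its own line instead of being joined with &. Leaving values with neither N nor S as NA is an assumption about the desired behaviour.

```r
latitude <- c('291N', '202N', '276N', '269S', '279N', '294N', '252N', '254S', '248N', '258N')
df <- data.frame(latitude, stringsAsFactors = FALSE)

# Separate numeric result vector: a column cannot be part character, part numeric.
lat_num <- rep(NA_real_, nrow(df))

for (i in seq_len(nrow(df))) {
  value <- df$latitude[i]                          # one element, so if() gets a length-1 condition
  if (grepl("N", value, fixed = TRUE)) {
    digits <- gsub("N", "", value, fixed = TRUE)   # step 1: remove the letter
    lat_num[i] <- as.numeric(digits) / 10          # step 2: convert, then scale
  } else if (grepl("S", value, fixed = TRUE)) {
    digits <- gsub("S", "", value, fixed = TRUE)
    lat_num[i] <- as.numeric(digits) / -10         # south is negative
  }
  # otherwise lat_num[i] stays NA (assumed handling for values without N or S)
}

df$latitude <- lat_num    # replace the whole column in one assignment

# `&` is an element-wise logical AND, not a way to sequence statements:
c(TRUE, FALSE) & c(TRUE, TRUE)   # element-wise result: TRUE FALSE
```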

As far as a good solution goes, while I've been writing this you've already received a nice, short, vectorized (no loop needed) solution from Tim. Just do that.
