
R: Remove parts of column name after special characters

Problem

I have a dataframe where I am trying to rename column entries that have multiple special characters and varying numbers of digits, and that include both positive and negative numbers, as shown in the example below.

Name  Number
A     -500--550
B     -600--650
C     -700--750
D     -8000--8500
E     -9000--9500
F     -100-200
G     200-400

These entries are date ranges, and the middle hyphen is supposed to indicate "to", so "A" would be read as "negative 500 to negative 550", "F" would be read as "negative 100 to (positive) 200", and "G" would be read as "200 to 400".

Having a "-" at the beginning of many entries, a "--" in the middle of some, and varying numbers of digits makes things a bit complicated. For my end result, I would like to remove the "to" dash and everything after it. The end result should look like this:

Name  Number
A     -500
B     -600
C     -700
D     -8000
E     -9000
F     -100
G      200

A dplyr approach would be great, but I'm not terribly picky as long as it works.

Similar Questions

I found some similar questions that came close to providing an answer, but the differences in the data sets have caused problems.

In this example, the entries have differing numbers of digits after the dot ".", and gsub is used to tackle the issue: Removing characters in column titles after "."

colnames(df) <- gsub("\\..*$", "", colnames(df))
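
As a quick illustration of what that gsub call does, here is a minimal sketch using a made-up data frame. The name df and its column names are hypothetical and are not taken from the linked question.

```r
# Hypothetical data frame whose column names carry a suffix after a dot
df <- data.frame(a.1 = 1:2, b.22 = 3:4, c = 5:6)

# Remove the first dot and everything after it from each column name;
# names without a dot are left unchanged
colnames(df) <- gsub("\\..*$", "", colnames(df))
colnames(df)
```

This works there because the dot only ever appears as the separator, which is not the case for the dashes in my data.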

In this other example, the strings had multiple dots "." and they wanted to delete the last "." and everything after it: Remove (or replace) everything after a specified character in R strings

One of the methods used stringr, as shown below.

library(stringr)
str_remove(x, "\\.[^.]*$")

The problem here is that for many entries I want to remove everything from the second "-" onward, but that doesn't work for rows "F" or "G":

str_remove(testing$Number, "\\--[^-]*$")
[1] "-500"     "-600"     "-700"     "-8000"    "-9000"    "-100-200" "200-400" 
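
The reason rows "F" and "G" are left untouched can be checked directly: the attempted pattern needs two consecutive dashes, and those two entries contain only single dashes. The sketch below uses base R only and assumes the values are stored in a character vector called x, a name chosen here purely for illustration.

```r
# Values from the Number column of the sample data
x <- c("-500--550", "-600--650", "-700--750", "-8000--8500",
       "-9000--9500", "-100-200", "200-400")

# "\\--" is an escaped dash followed by a literal dash, i.e. two dashes in a row
has_double_dash <- grepl("--", x, fixed = TRUE)
has_double_dash

# The attempted pattern therefore only matches entries containing "--"
matched <- grepl("\\--[^-]*$", x)
matched
```

Both vectors are TRUE for the first five entries and FALSE for the last two, so any fix has to handle a single separating dash as well as a double one.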

Sample Data

I've provided a sample test set below.

testing <- structure(list(Name = c("A", "B", "C", "D", "E", "F", "G"), Number = c("-500--550", 
"-600--650", "-700--750", "-8000--8500", "-9000--9500", "-100-200", 
"200-400")), class = "data.frame", row.names = c(NA, -7L))

I would replace on the pattern -+\d+$:

testing$Number <- sub("-+\\d+$", "", testing$Number)

Here is a working regex demo.

The regex used here says to match:

  • -+ one or more dashes
  • \d+ followed by one or more digits
  • $ the end of the value
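
Since the question asked for a dplyr approach, the same substitution can also be written inside mutate(). This is a minimal sketch, assuming dplyr is installed and that the sample data is stored in a data frame named testing, as in the code above. The result object is a name introduced here for illustration.

```r
library(dplyr)

# Sample data from the question, stored as `testing`
testing <- data.frame(
  Name = c("A", "B", "C", "D", "E", "F", "G"),
  Number = c("-500--550", "-600--650", "-700--750", "-8000--8500",
             "-9000--9500", "-100-200", "200-400"),
  stringsAsFactors = FALSE
)

# Strip the trailing "one or more dashes + digits" from each entry,
# keeping only the start of the range
result <- testing %>%
  mutate(Number = sub("-+\\d+$", "", Number))

result$Number
# "-500" "-600" "-700" "-8000" "-9000" "-100" "200"
```

Because the pattern is anchored at the end of the value, a leading "-" is never removed: it is always followed by digits and then another dash, not by the end of the string. The values remain character strings; wrapping the call in as.numeric() would convert them if numbers are needed.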
