Separate strings by a specific pattern into multiple columns with R (CRAN)
I use R and am having some trouble with strings, which I am still learning to handle.
I have a dataset with a column containing the following strings:
"PA=135 65 (07:52) TC=36,7 (07:52) Diur.=750 (07:52) SO2=96 1 l/m (07:52) FC=75 r (07:52)"
"PA=120 60 (08:27) TC=36,5 (08:27) Diur.=2500 (28/09/20 11:30) SO2=97 (08:27) FC=74 (08:27)"
"PA=140 80 (08:44) TC=36,2 (08:44) SO2=96 (08:44) FC=71 (08:44) Stick=108 (03/10/20 18:36)"
"PA=119 83 (08:44) TC=37 (08:44) SO2=95 (08:44) FC=70 (08:44) Stick=158 (08:45)"
"PA=140 70 (10:12) TC=36 (10:12) SO2=100 Vm 10 l/m (10:12) FC=50 r (10:12)"
"PA=140 70 (10:12) TC=36 (10:12) SO2=100 Vm 10 l/m (10:12) FC=50 r (10:12)"
I would like to separate these strings as follows (I think this is easier than the second layout below):
   [1]        [2]      [3]                [4]           [5]
#1 PA=135 65  TC=36,7  Diur.=750          SO2=96 1 l/m  FC=75
#2 PA=120 60  TC=36,5  Diur.=2500         SO2=97        FC=74
#3 PA=140 80  TC=36,2  SO2=96             FC=71         Stick=108
#4 PA=119 83  TC=37    SO2=95             FC=70         Stick=158
#5 PA=140 70  TC=36    SO2=100 Vm 10 l/m  FC=50         NA
Also, I would like to know if there is a way to set the columns in the following way:
   [PA]    [TC]  [Diur]  [SO2]          [FC]  [Stick]
#1 135 65  36,7  750     96 1 l/m       75    NA
#2 120 60  36,5  2500    97             74    NA
#3 140 80  36,2  NA      96             71    108
#4 119 83  37    NA      95             70    158
#5 140 70  36    NA      100 Vm 10 l/m  50    NA
I tried to separate by "(" or ":" or ")", but the outcome is not what I need and I cannot process the data correctly.
I hope this is what you want:我希望这是你想要的:
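library(dplyr)
library(stringr)

# df is assumed to hold the raw text in a column named `string`
df %>%
  # each lookbehind (?<=LABEL=) anchors the match right after its label
  mutate("[PA]"    = str_extract(string, "(?<=PA=)\\d+\\s\\d+"),
         "[TC]"    = str_extract(string, "(?<=TC=)\\d+(,\\d+)?"),
         "[Diur]"  = str_extract(string, "(?<=Diur\\.=)\\d+"),
         "[SO2]"   = str_extract(string, "(?<=SO2=)\\d+[^(]*"),
         "[FC]"    = str_extract(string, "(?<=FC=)\\d+"),
         "[Stick]" = str_extract(string, "(?<=Stick=)\\d+")) %>%
  # trim the trailing space that the [SO2] pattern leaves before "("
  mutate(across(matches("[A-Z]"), ~trimws(.))) %>%
  select(-string)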
  [PA]    [TC]  [Diur]  [SO2]          [FC]  [Stick]
1 135 65  36,7  750     96 1 l/m       75    <NA>
2 120 60  36,5  2500    97             74    <NA>
3 140 80  36,2  <NA>    96             71    108
4 119 83  37    <NA>    95             70    158
5 140 70  36    <NA>    100 Vm 10 l/m  50    <NA>
6 140 70  36    <NA>    100 Vm 10 l/m  50    <NA>
This mainly relies on lookbehind expressions, written (?<=...), which assert that a given pattern must appear immediately before the target match without becoming part of it. For example, (?<=PA=)\\d+\\s\\d+ extracts the digits-whitespace-digits string that is immediately preceded by the string PA=.
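To see the effect of the lookbehind in isolation, here is a minimal sketch on a single record from the question. It assumes the stringr package is installed; the variable names (x, with_label, pa, stick) are only illustrative and do not come from the original answer.

```r
library(stringr)

# One sample record taken from the question
x <- "PA=135 65 (07:52) TC=36,7 (07:52) Diur.=750 (07:52) SO2=96 1 l/m (07:52) FC=75 r (07:52)"

# Without a lookbehind, the label "PA=" is part of the returned match
with_label <- str_extract(x, "PA=\\d+\\s\\d+")

# With the lookbehind, "PA=" must precede the match but is not returned
pa <- str_extract(x, "(?<=PA=)\\d+\\s\\d+")

# A label that does not occur in the string yields NA
stick <- str_extract(x, "(?<=Stick=)\\d+")

print(with_label)  # "PA=135 65"
print(pa)          # "135 65"
print(stick)       # NA
```

The NA returned for a missing label is what fills the empty cells of the [Diur] and [Stick] columns in the output above.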