Separating a column into multiple columns by a specific pattern in R (CRAN)

Question

I use R and I have some problems with strings; I'm studying them.

I have a dataset with a column containing the following strings:

"PA=135 65 (07:52) TC=36,7 (07:52) Diur.=750 (07:52) SO2=96  1 l/m (07:52) FC=75  r (07:52)"
"PA=120 60 (08:27) TC=36,5 (08:27) Diur.=2500 (28/09/20 11:30) SO2=97 (08:27) FC=74 (08:27)"
"PA=140 80 (08:44) TC=36,2 (08:44) SO2=96 (08:44) FC=71 (08:44) Stick=108 (03/10/20 18:36)" 
"PA=119 83 (08:44) TC=37 (08:44) SO2=95 (08:44) FC=70 (08:44) Stick=158 (08:45)"            
"PA=140 70 (10:12) TC=36 (10:12) SO2=100  Vm 10 l/m (10:12) FC=50  r (10:12)"               
"PA=140 70 (10:12) TC=36 (10:12) SO2=100  Vm 10 l/m (10:12) FC=50  r (10:12)"

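For a reproducible example, these strings can be put into a data frame. The column name `string` is an assumption. It was chosen to match the `string` column that the answer's code refers to.

```r
# Hypothetical setup: the column name `string` is assumed,
# matching the column used in the answer's dplyr/stringr code
df <- data.frame(
  string = c(
    "PA=135 65 (07:52) TC=36,7 (07:52) Diur.=750 (07:52) SO2=96  1 l/m (07:52) FC=75  r (07:52)",
    "PA=120 60 (08:27) TC=36,5 (08:27) Diur.=2500 (28/09/20 11:30) SO2=97 (08:27) FC=74 (08:27)",
    "PA=140 80 (08:44) TC=36,2 (08:44) SO2=96 (08:44) FC=71 (08:44) Stick=108 (03/10/20 18:36)",
    "PA=119 83 (08:44) TC=37 (08:44) SO2=95 (08:44) FC=70 (08:44) Stick=158 (08:45)",
    "PA=140 70 (10:12) TC=36 (10:12) SO2=100  Vm 10 l/m (10:12) FC=50  r (10:12)",
    "PA=140 70 (10:12) TC=36 (10:12) SO2=100  Vm 10 l/m (10:12) FC=50  r (10:12)"
  ),
  stringsAsFactors = FALSE  # keep the text as character (already the default since R 4.0.0)
)
```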
I would like to separate these strings as follows (I think this is easier than the second layout):

   [1]        [2]      [3]                [4]            [5]
#1 PA=135 65  TC=36,7  Diur.=750          SO2=96  1 l/m  FC=75
#2 PA=120 60  TC=36,5  Diur.=2500         SO2=97         FC=74
#3 PA=140 80  TC=36,2  SO2=96             FC=71          Stick=108
#4 PA=119 83  TC=37    SO2=95             FC=70          Stick=158
#5 PA=140 70  TC=36    SO2=100 Vm 10 l/m  FC=50          NA

Also, I would like to know if there is a way to arrange the columns as follows:

     [PA]  [TC]   [Diur]          [SO2]   [FC]  [Stick]
#1 135 65  36,7      750      96  1 l/m     75       NA
#2 120 60  36,5     2500             97     74       NA
#3 140 80  36,2       NA             96     71      108 
#4 119 83    37       NA             95     70      158            
#5 140 70    36       NA  100 Vm 10 l/m     50       NA

I tried to separate by "(", ":" or ")", but the result doesn't satisfy me and I cannot process the data correctly.
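Splitting on "(" or ")" cuts at the timestamps rather than between measurements, and ":" also occurs inside every time. Those separators therefore don't line up with the fields.

A minimal sketch of the first (positional) layout works in two steps. It first drops the parenthesised timestamps, then splits only where a new `KEY=` token begins. It assumes that every key starts with a letter followed by letters, digits or dots, and then "=". It uses stringr's `str_remove_all()`, `str_split()` and `str_split_fixed()`, and includes only two of the rows for brevity.

```r
library(stringr)

# Two of the question's rows, for brevity
x <- c(
  "PA=135 65 (07:52) TC=36,7 (07:52) Diur.=750 (07:52) SO2=96  1 l/m (07:52) FC=75  r (07:52)",
  "PA=140 70 (10:12) TC=36 (10:12) SO2=100  Vm 10 l/m (10:12) FC=50  r (10:12)"
)

# 1. Drop each parenthesised timestamp, e.g. "(07:52)" or "(28/09/20 11:30)",
#    together with the whitespace in front of it
cleaned <- str_remove_all(x, "\\s*\\([^)]*\\)")

# 2. Split on whitespace only when the next token looks like KEY=
#    (assumed: a letter, then letters/digits/dots, then "="),
#    so notes such as "1 l/m" or "Vm 10 l/m" stay with their value
key_start <- "\\s+(?=[A-Za-z][A-Za-z0-9.]*=)"
n_fields  <- max(lengths(str_split(cleaned, key_start)))

# 3. One column per position; rows with fewer fields are padded with ""
fields <- str_split_fixed(cleaned, key_start, n = n_fields)
fields
```

Rows with fewer fields are padded with empty strings rather than `NA`. Suffixes such as the trailing "r" in "FC=75  r" are kept, so some extra cleanup would be needed to match the desired table exactly.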

Answer 1

I hope this is what you want:

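library(dplyr)    # mutate(), across(), select() and the %>% pipe
library(stringr)  # str_extract()

# assumes df has a character column named `string` holding the raw text
df %>% 
  mutate("[PA]" = str_extract(string, "(?<=PA=)\\d+\\s\\d+"),     # digits, space, digits
         "[TC]" = str_extract(string, "(?<=TC=)\\d+(,\\d+)?"),    # optional decimal comma
         "[Diur]" = str_extract(string, "(?<=Diur\\.=)\\d+"),     # "." escaped in "Diur."
         "[SO2]" = str_extract(string, "(?<=SO2=)\\d+[^(]*"),     # value plus any note up to "("
         "[FC]" = str_extract(string, "(?<=FC=)\\d+"),
         "[Stick]" = str_extract(string, "(?<=Stick=)\\d+")) %>%
  mutate(across(matches("[A-Z]"), ~trimws(.))) %>%                # strip spaces left before "("
  select(-string)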
    [PA] [TC] [Diur]           [SO2] [FC] [Stick]
1 135 65 36,7    750      96  1 l/m    75    <NA>
2 120 60 36,5   2500             97    74    <NA>
3 140 80 36,2   <NA>             96    71     108
4 119 83   37   <NA>             95    70     158
5 140 70   36   <NA> 100  Vm 10 l/m    50    <NA>
6 140 70   36   <NA> 100  Vm 10 l/m    50    <NA>

This mainly relies on lookbehind assertions, (?<=...). A lookbehind requires a pattern to match immediately before the target: the prefix must be present for the target to match, but it is not included in the result. For example, (?<=PA=)\\d+\\s\\d+ extracts the digits-whitespace-digits strings that are preceded by the string PA=.
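Here is a small illustration of the difference, using the start of the first row and stringr's `str_extract()`. The plain pattern returns the `PA=` prefix as part of the match, while the lookbehind version returns only the value. When the key is absent, the result is `NA`, which is where the `<NA>` entries in the output come from.

```r
library(stringr)

# Start of the first row from the question
s <- "PA=135 65 (07:52) TC=36,7 (07:52) Diur.=750 (07:52)"

# Plain pattern: the "PA=" prefix is part of the match
with_prefix <- str_extract(s, "PA=\\d+\\s\\d+")     # "PA=135 65"

# Lookbehind: "PA=" must precede the match but is not returned
pa <- str_extract(s, "(?<=PA=)\\d+\\s\\d+")         # "135 65"

# The optional group (,\\d+)? also accepts a decimal comma
tc <- str_extract(s, "(?<=TC=)\\d+(,\\d+)?")        # "36,7"

# No "Stick=" in this text, so nothing matches
stick <- str_extract(s, "(?<=Stick=)\\d+")          # NA
```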

Solution 1
1 · Accepted · 2021-05-13 14:39:44
