How to remove 0 or more tokens (words) that may appear in sequence or with data in between?

Question

How can I extract only the country names from a variable such as the following?

tibble::tribble(
    ~country, 
    '{"United States"}', 
    '{NULL}', 
    '{NULL,NULL}', 
    '{"United States",NULL,Netherlands}', 
    '{Germany}', 
    '{Canada}', 
    '{NULL,NULL}', 
    '{Chile,"United States"}', 
    '{NULL,NULL,NULL}', 
    '{NULL,China, NULL}', 
)

NULL can appear consecutively or not, and up to 15 times in a single observation.
Countries with multi-word names, such as "United States", come quoted; all others are unquoted.

This is fairly easy to do in multiple passes, such as removing all NULLs, then removing the duplicated commas, and then the braces, but I was aiming for a more efficient way of getting to something like the following:

tibble::tribble(
    ~country, 
    'United States', 
    NA, 
    NA, 
    'United States,Netherlands', 
    'Germany', 
    'Canada', 
    NA, 
    'Chile,United States', 
    NA, 
    'China', 
)

Answer 1

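A bit brute-force with two gsub calls, but it works (assuming the tibble from the question is stored in dat). The inner call deletes every NULL (with its trailing comma, if any) along with the quotes and braces; the outer call removes a comma left at the start or end.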

dat$out <- gsub("^,|,$", "",
                trimws(gsub('NULL,?|["{}]', '', dat$country)))
dat
# # A tibble: 10 x 2
#    country                                out                        
#    <chr>                                  <chr>                      
#  1 "{\"United States\"}"                  "United States"            
#  2 "{NULL}"                               ""                         
#  3 "{NULL,NULL}"                          ""                         
#  4 "{\"United States\",NULL,Netherlands}" "United States,Netherlands"
#  5 "{Germany}"                            "Germany"                  
#  6 "{Canada}"                             "Canada"                   
#  7 "{NULL,NULL}"                          ""                         
#  8 "{Chile,\"United States\"}"            "Chile,United States"      
#  9 "{NULL,NULL,NULL}"                     ""                         
# 10 "{NULL,China, NULL}"                   "China"

From here, you can replace the empty strings with NA:

dat$out[!nzchar(dat$out)] <- NA
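
As an alternative to the gsub approach above, here is a minimal sketch that splits each string into tokens and drops the NULL ones, so the NA conversion happens in the same step. The helper name extract_countries is hypothetical (it is not part of the original answer), and the sketch assumes that no country name contains a comma inside its quotes.

```r
# Hypothetical helper, not from the original answer: tokenize, then filter.
# Assumes quoted names never contain a comma.
extract_countries <- function(x) {
  # Drop braces and double quotes, then split each string on commas.
  tokens <- strsplit(gsub('["{}]', "", x), ",", fixed = TRUE)
  vapply(tokens, function(tok) {
    tok <- trimws(tok)  # handles the stray space in ", NULL"
    # Keep only real values: not missing, not empty, not the literal NULL.
    tok <- tok[!is.na(tok) & nzchar(tok) & tok != "NULL"]
    if (length(tok) == 0) NA_character_ else paste(tok, collapse = ",")
  }, character(1))
}

country <- c('{"United States"}', '{NULL}',
             '{"United States",NULL,Netherlands}',
             '{Chile,"United States"}', '{NULL,China, NULL}')
extract_countries(country)
# [1] "United States"  NA  "United States,Netherlands"
# [4] "Chile,United States"  "China"
```

Because only tokens that are exactly NULL are dropped, a name that merely contains the letters NULL would be left intact, which the pattern-based version does not guarantee. With the question's data this could be used as dat$out <- extract_countries(dat$country).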

Solution 1
0 Accepted 2021-05-25 13:19:07
