Remove whitespace from a data frame in R

Question

I have scraped some data and stored it in a data frame. Some rows contain unwanted information within square brackets, for example "[N] Team Name". I want to keep just the team name, so first I use the code below to remove the brackets and any text contained within them:

gsub( " *\\(.*?\\) *", "", x)

This leaves me with " Team Name" (notice the space before the T). Now I am trying to remove the whitespace before the T using trimws or the method shown here, but it is not working.

Could someone please help me remove the extra whitespace?

Note: if I type the string containing the space manually and apply trimws to it, it works. However, when I obtain the string directly from the data frame, it doesn't. Also, when I run the code snippet below (where df[1,1] is the same string retrieved from the data frame), I get FALSE. This gives me reason to believe that the string in the data frame is not the same as the manually typed string.

" team name" == df[1,1]
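
One way to test that suspicion is to look at the code points of the string rather than its printed form. The sketch below assumes, purely as a guess, that the scraped value starts with a non-breaking space (U+00A0), which is common in scraped HTML; scraped is a hypothetical stand-in for df[1,1].

```r
# Hypothetical stand-in for df[1, 1]: starts with U+00A0, not an ordinary space
scraped <- "\u00a0Team Name"
typed   <- " Team Name"

scraped == typed        # FALSE, although both print alike
utf8ToInt(scraped)[1]   # 160 (non-breaking space)
utf8ToInt(typed)[1]     # 32  (ordinary space)

# trimws() strips only space, tab, CR and LF by default,
# so convert U+00A0 to a plain space first
cleaned <- trimws(gsub("\u00a0", " ", scraped, fixed = TRUE))
cleaned                 # "Team Name"
```

If utf8ToInt on the real value shows something other than 160, the same replace-then-trim approach can be adapted to whatever character turns up.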

Answer 1

You can try

gsub( "\\[[^]]*\\]\\W*", "", "[N] Team Name")
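
For anyone unpicking that pattern, here is a rough breakdown run against the same example string. The comments are one reading of the regex, not something the answer itself states.

```r
x <- "[N] Team Name"

# \\[     a literal opening bracket
# [^]]*   any run of characters that are not a closing bracket
# \\]     a literal closing bracket
# \\W*    any non-word characters that follow (here, the single space)
result <- gsub("\\[[^]]*\\]\\W*", "", x)
result
# [1] "Team Name"
```

Because \\W matches any non-word character and not only whitespace, a name that begins with punctuation would lose that punctuation too; \\s* is the narrower choice if that matters.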

Answer 2

You should be able to remove the bracketed piece as well as any following whitespace with a single regex substitution. Your regex is correct as-is, and should successfully accomplish this. (Note: I've ignored the unexplained discrepancy between your use of parentheses vs. square brackets in your question. I've assumed square brackets for my answer.)

Strangely, this seems to be a case where the default regex engine is failing, but adding perl=T gets it working:

x <- '[N] Team Name';
gsub(' *\\[.*?\\] *','',x);
## [1] " Team Name"
gsub(perl=T,' *\\[.*?\\] *','',x);
## [1] "Team Name"

In the past I have run across cases where the default regex engine flakes out, but I have never encountered this with perl=T, so I suggest you use that. I really think there is something broken in the default regex implementation.
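
To apply the same substitution to a whole column rather than a single string, something like the following should work. It is a minimal sketch: the data frame df and its column team are made up for illustration, and perl = TRUE is spelled out in full because T is an ordinary variable that can be reassigned.

```r
# Hypothetical data frame standing in for the scraped data
df <- data.frame(team = c("[N] Team Name", "Other Team"),
                 stringsAsFactors = FALSE)

# perl = TRUE selects the PCRE engine instead of R's default engine (TRE)
df$team <- gsub(" *\\[.*?\\] *", "", df$team, perl = TRUE)

# The tag and its trailing space are gone; untagged names are unchanged
df$team
```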

Answer 3

We can use

sub(".*\\]\\s+", "", x)
#[1] "Team Name"

Or just

sub("\\S+\\s+", "", x)
#[1] "Team Name"

data

x <- '[N] Team Name';
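
One caveat worth checking before choosing between the two patterns: the second drops the first whitespace-delimited token whether or not it is a bracketed tag. A small sketch, where the vector teams is an invented example mixing tagged and untagged names:

```r
teams <- c("[N] Team Name", "Team Name")

# Anchored on the closing bracket: untagged names are left alone
by_bracket <- sub(".*\\]\\s+", "", teams)

# Drops the first run of non-whitespace: also removes "Team" from untagged names
by_token <- sub("\\S+\\s+", "", teams)

by_bracket  # "Team Name" "Team Name"
by_token    # "Team Name" "Name"
```

So the bracket-anchored version looks like the safer default when only some rows carry a tag.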

3 solutions

Solution 1
3, accepted, 2016-06-06 08:54:06

Solution 2
1, 2016-06-06 08:04:16

Solution 3
0, 2016-06-06 08:53:22
