How to remove subclauses containing specific words

Aim: I am trying to remove subclauses that contain the word "normal" from various sentences in R. A subclause is defined as starting with a comma and ending in either a full stop or a comma. I want to remove the whole subclause.

Input sentences

I walked down the hill, which was normal, but I also walked up another hill which was dull.

I looked at him and although he looked normal, he was not normal.

I am fine, but he is not normal, and she is fine and she is normal, but I think her brother is not normal.

Desired output

I walked down the hill but I also walked up another hill which was dull

I looked at him and although he looked normal.

I am fine, and she is fine and she is normal.

Attempt

gsub(", .*normal.*?(\\.|,|$)\\R*", "", input_string, perl = T, ignore.case = T)

Current output:

I walked down the hill.
I looked at him and although he looked normal.
I am fine.

However, if there are many subclauses this doesn't give the intended output, mainly because it removes everything from the first comma onwards. How do I make it match from the comma nearest to "normal"?

Your examples and rules are not consistent (see @janos' comment). For example, you remove the last subclause in your last example sentence, "but I think her brother is not normal", even though it doesn't end with a period.

That aside, the following should get you started:

ss <- c(
    "I walked down the hill, which was normal, but I also walked up another hill which was dull",
    "I looked at him and although he looked normal, he was not normal.",
    "I am fine, but he is not normal, and she is fine and she is normal, but I think her brother is not normal")

# Remove every subclause that starts with a comma and ends with a comma or full stop.
# Note that this pattern does not check for the word "normal".
lapply(ss, function(x) gsub(",[a-zA-Z0-9_ ]+[,.]", "", x))
#[[1]]
#[1] "I walked down the hill but I also walked up another hill which was dull"

#[[2]]
#[1] "I looked at him and although he looked normal"

#[[3]]
#[1] "I am fine and she is fine and she is normal, but I think her brother is not normal"
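To restrict the removal to subclauses that actually contain the word, one option is to replace the greedy `.*` from the question with `[^,.]*`, which cannot cross a comma or full stop, so the match starts at the comma nearest to the word. The following is only a sketch under a few assumptions: `remove_subclauses` is a hypothetical helper name, the word is matched as a whole word (so "abnormal" would not count), the word contains no regex metacharacters, and the end of the string is treated as a valid terminator. Like the pattern above, it also removes the closing comma or full stop.

```r
# Hypothetical helper (not part of the original answer): drop comma-led
# subclauses that contain a given word.
remove_subclauses <- function(x, word = "normal") {
  # ","          opening comma of the subclause
  # "[^,.]*"     characters that stay inside the subclause (no comma, no full stop)
  # "\\bword\\b" the target as a whole word (assumption: "abnormal" is not a hit)
  # "([,.]|$)"   closing comma, full stop, or end of string (removed as well)
  # Assumption: `word` contains no regex metacharacters.
  pattern <- paste0(",[^,.]*\\b", word, "\\b[^,.]*([,.]|$)")
  gsub(pattern, "", x, perl = TRUE, ignore.case = TRUE)
}

ss <- c(
    "I walked down the hill, which was normal, but I also walked up another hill which was dull",
    "I looked at him and although he looked normal, he was not normal.",
    "I am fine, but he is not normal, and she is fine and she is normal, but I think her brother is not normal")

result <- remove_subclauses(ss)
print(result)
#[1] "I walked down the hill but I also walked up another hill which was dull"
#[2] "I looked at him and although he looked normal"
#[3] "I am fine and she is fine and she is normal"
```

Because the closing comma is consumed together with the subclause, the text that follows ("and she is fine and she is normal") no longer starts with a comma and is therefore kept. If the terminator should be preserved instead, it would have to be captured and re-inserted, and the rules in the question would need to be made consistent first.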
