Split the string C#

I want to split a string on certain characters and words ( , . ; and or though but etc.).
Original string: "This movie is great. I like the story, acting is nice and direction is perfect but music is not good."
Result:
This movie is great
I like the story
acting is nice
direction is perfect
music is not good

I have tried this:

string test = "This movie is great. I like the story, acting is nice and direction is perfect but music is not good.";
var splittC = Regex.Split(test, ",");
foreach(var a in splittC){
    var splittD = Regex.Split(test, "."); 
    foreach(var b in splittD){
       var splittA = Regex.Split(test, "and"); 
    }
}// and so on....

This takes too many loops.
Also, if the string contains no comma, the other separators are never checked. How can I solve these problems? Please help.

String.Split accepts a string[] parameter.

Try this:

string test = "This movie is great. I like the story, acting is nice and direction is perfect but music is not good.";
var splitVals = test.Split(new string[] { ",", ".", ";", " and ", " or ", " though ", " but ", " etc. "}, StringSplitOptions.RemoveEmptyEntries);
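Note that the pieces returned by this call keep the whitespace that surrounded each separator (for example " I like the story"). A minimal sketch of trimming them, assuming a hypothetical helper named SplitClauses in a hypothetical ClauseSplitter class (neither name comes from the answer):

```csharp
using System;
using System.Linq;

public static class ClauseSplitter
{
    // Separator list copied from the answer above. The word separators carry
    // surrounding spaces so that "and" inside "band" is not matched.
    private static readonly string[] Separators =
    {
        ",", ".", ";", " and ", " or ", " though ", " but ", " etc. "
    };

    // Hypothetical helper: split, trim each piece, and drop blank pieces.
    public static string[] SplitClauses(string text)
    {
        return text
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(piece => piece.Trim())
            .Where(piece => piece.Length > 0)
            .ToArray();
    }
}
```

Calling ClauseSplitter.SplitClauses(test) on the string from the question should give the five clauses listed there, without leading spaces.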

Parsing natural language is hard because the computer doesn't understand context. If computers could, we could talk to them as if they were people.

Sometimes the ands and periods in a sentence are not separators, and sometimes sentences don't start with a capital letter.
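As a rough illustration of how a plain separator list goes wrong, the sketch below runs one over a made-up sentence (the sentence, the NaiveSplitDemo class and the NaiveSplit method are all invented for this example):

```csharp
using System;

public static class NaiveSplitDemo
{
    // The same kind of separator list as in the String.Split approach.
    private static readonly string[] Separators =
    {
        ",", ".", ";", " and ", " or ", " though ", " but "
    };

    // Splits blindly, with no knowledge of abbreviations or names.
    public static string[] NaiveSplit(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the input "Dr. Brown and Sons sell salt and pepper." this returns four pieces: "Dr", " Brown", "Sons sell salt" and "pepper". The period in the abbreviation and the "and" in the company name are both treated as clause boundaries, although the sentence is a single clause.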

iPhones are great, said Mr. Smith.

"A one and a two and a three and a four." sang the musicians.

To do the job well, I recommend you either

(a) very strictly control the input allowed, or

(b) use a natural language parsing library, such as SharpNLP, which is native, or call NLTK from C#. NLTK is probably the best, but even it sometimes fails. It's also 5 GB in size due to the training data its machine learning requires.

To make this work you need to parse the sentence with a lexical analyser and then process the tokens it produces. Example keyword tokens are "and", "," etc. The remaining text between the keyword tokens can then be concatenated and sent to the output.
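A minimal sketch of that idea, assuming a very small hand-written lexer: the ClauseLexer class, its Token type and the keyword set are illustrative choices, not part of any library. It lexes the text into word and separator tokens, then joins the words found between separators.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ClauseLexer
{
    public enum TokenKind { Word, Separator }

    public sealed class Token
    {
        public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
        public TokenKind Kind { get; }
        public string Text { get; }
    }

    // Assumed keyword set, taken from the words listed in the question.
    private static readonly HashSet<string> Keywords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "and", "or", "though", "but" };

    // Step 1: lex the text. Each match is either one punctuation mark
    // or a run of characters containing no whitespace or punctuation.
    public static List<Token> Lex(string text)
    {
        var tokens = new List<Token>();
        foreach (Match m in Regex.Matches(text, @"[,.;]|[^\s,.;]+"))
        {
            bool isPunctuation = m.Value == "," || m.Value == "." || m.Value == ";";
            bool isSeparator = isPunctuation || Keywords.Contains(m.Value);
            tokens.Add(new Token(isSeparator ? TokenKind.Separator : TokenKind.Word, m.Value));
        }
        return tokens;
    }

    // Step 2: concatenate the words between separator tokens.
    public static List<string> JoinClauses(IEnumerable<Token> tokens)
    {
        var clauses = new List<string>();
        var current = new List<string>();
        foreach (var token in tokens)
        {
            if (token.Kind == TokenKind.Word)
            {
                current.Add(token.Text);
            }
            else if (current.Count > 0)
            {
                clauses.Add(string.Join(" ", current));
                current.Clear();
            }
        }
        if (current.Count > 0) clauses.Add(string.Join(" ", current));
        return clauses;
    }
}
```

ClauseLexer.JoinClauses(ClauseLexer.Lex(test)) should produce the five clauses from the question. Because the separator decision is made per token, this is also the place where extra rules (for instance, not treating the period after "Mr" as a separator) could be added later; the sketch itself has no such rules.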

Try this simple regex I wrote; it may be helpful for you:

var splitRegex=@"\.|\,|\;|(?:\sand\s)|(?:\sor\s)|(?:\sthough\s)|(?:\sbut\s)";
var splittC = Regex.Split(test, splitRegex);
...

The result is: (image: split by regex)

It may need some modifications to work in all situations.
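One possible modification, offered as a sketch rather than a fix for every case: let the pattern swallow the whitespace around each separator, use word boundaries (\b) instead of \s around the keywords, and match case-insensitively. The RegexClauseSplitter class and its SplitClauses method are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexClauseSplitter
{
    // Punctuation or a whole keyword, plus any whitespace on either side.
    // The groups are non-capturing, so Regex.Split does not return the separators.
    private static readonly Regex Separator = new Regex(
        @"\s*(?:[.,;]|\b(?:and|or|though|but)\b)\s*",
        RegexOptions.IgnoreCase);

    public static string[] SplitClauses(string text)
    {
        return Separator.Split(text)
            .Select(piece => piece.Trim())
            .Where(piece => piece.Length > 0) // Regex.Split keeps empty pieces
            .ToArray();
    }
}
```

With the string from the question this should return the five expected clauses already trimmed. It still cannot tell a real clause boundary from an abbreviation such as "Mr." or from an "and" inside a name.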

string test = "This movie is great. I like the story, acting is nice and direction is perfect but music is not good.";
var splitVals = test.Split(new string[] 
{   ",", ".", ";", " and ", " or ",
    " though ", " but ", " etc. "
},StringSplitOptions.RemoveEmptyEntries);
