Search for a string in a text file and also return its previous and next sentence
Suppose my search phrase is: She likes to watch tv
The input file, text.txt, contains some sentences, for example:
I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.
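I want to search the text file for the string and return the sentence that contains it, together with the sentence before it and the sentence after it.
The output should look like this: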
She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault.
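So the output is the sentence before the match, the sentence containing the search phrase, and the sentence after it.
How about something like this: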
string @in = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
string phrase = @"She likes to watch tv";
int startIndex = @in.IndexOf(phrase);
// Only continue when the phrase was found; Substring would throw on -1.
if (startIndex > -1)
{
    int endIndex = startIndex + phrase.Length;
    int tmpIndex;

    // Find the ". " that ends the sentence before the matching one.
    tmpIndex = @in.Substring(0, startIndex).LastIndexOf(". ");
    if (tmpIndex > -1)
    {
        // Step back one more separator so the previous sentence is included.
        // If there is none, the previous sentence is the first one in the text.
        tmpIndex = @in.Substring(0, tmpIndex).LastIndexOf(". ");
        startIndex = tmpIndex > -1 ? tmpIndex + 1 : 0;
    }
    else
    {
        // The match is in the first sentence, so start at the beginning.
        startIndex = 0;
    }

    // Extend endIndex to the end of the matching sentence, then of the next one.
    tmpIndex = @in.IndexOf(".", endIndex);
    if (tmpIndex > -1)
    {
        endIndex = tmpIndex + 1;
        tmpIndex = @in.IndexOf(".", endIndex);
        if (tmpIndex > -1)
        {
            endIndex = tmpIndex + 1;
        }
    }
    Console.WriteLine(@in.Substring(startIndex, endIndex - startIndex).Trim());
}
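I'm assuming the sentences you are looking for are separated by '.'. This code works by finding the index of the phrase, then looking backwards for the start of the previous sentence and forwards for the end of the following sentence.
Here is one way to do it: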
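The same index-walking idea can be packaged as a reusable method. This is only a sketch under the answer's own assumption that sentences are separated by ". "; the class name `SentenceContext` and the method name `Extract` are made up for illustration, and returning an empty string for "not found" is a choice of this sketch, not something the original answer specifies.

```csharp
using System;

public static class SentenceContext
{
    // Returns the sentence containing `phrase` plus one sentence on each side.
    // Returns an empty string when the phrase does not occur.
    // Assumption: sentences end with '.' and are separated by ". ".
    public static string Extract(string text, string phrase)
    {
        int match = text.IndexOf(phrase, StringComparison.Ordinal);
        if (match < 0)
            return string.Empty;

        // Walk back over at most two separators: the one before the matching
        // sentence and the one before the previous sentence.
        int cut = match;   // exclusive upper bound for the backward search
        int start = 0;
        for (int i = 0; i < 2; i++)
        {
            int sep = text.Substring(0, cut).LastIndexOf(". ", StringComparison.Ordinal);
            if (sep < 0)
            {
                start = 0; // ran out of separators: begin at the start of the text
                break;
            }
            start = sep + 2; // first character after ". "
            cut = sep;
        }

        // Walk forward over at most two periods: the end of the matching
        // sentence and the end of the next sentence.
        int end = match + phrase.Length;
        for (int i = 0; i < 2; i++)
        {
            int dot = text.IndexOf('.', end);
            if (dot < 0)
            {
                end = text.Length; // no more periods: keep whatever text remains
                break;
            }
            end = dot + 1;
        }

        return text.Substring(start, end - start).Trim();
    }
}
```

Calling `SentenceContext.Extract(text, "She likes to watch tv")` on the sample text from the question should give the three sentences shown in the expected output. A phrase in the first or last sentence simply yields two sentences instead of three.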
string content = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
string input = @"She likes to watch tv";
string curPhrase = string.Empty, prevPhrase = string.Empty, nextPhrase = string.Empty;
char[] delim = new char[] { '.' };
string[] phrases = content.Split(delim, StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < phrases.Length; i++)
{
    if (phrases[i].IndexOf(input) != -1)
    {
        curPhrase = phrases[i];
        // Guard both ends of the array: the match may be the first or the last sentence.
        if (i > 0)
            prevPhrase = phrases[i - 1];
        if (i + 1 < phrases.Length)
            nextPhrase = phrases[i + 1];
        break;
    }
}
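It first splits the whole text on periods (.) and stores the pieces in an array, then searches the array for the input string and picks out the current, previous and next phrases. Note that Split drops the '.' delimiters, so the phrases come back without their trailing periods and with the leading space still attached.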
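Since the question starts from a file, the split approach could be combined with `File.ReadAllText` and the pieces glued back together. The following is a rough sketch, not part of the original answer: the names `SentenceSearch`, `Find` and `FindInFile` are hypothetical, and it still assumes every '.' ends a sentence, so abbreviations, decimals, or a search phrase that itself contains a period will not work.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SentenceSearch
{
    // Returns the previous, matching and next sentence joined by spaces,
    // or an empty string when no sentence contains the phrase.
    public static string Find(string content, string phrase)
    {
        string[] sentences = content.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < sentences.Length; i++)
        {
            if (sentences[i].IndexOf(phrase, StringComparison.Ordinal) < 0)
                continue;

            // Clamp the neighbour indexes so the first and last sentence are safe.
            int first = Math.Max(i - 1, 0);
            int last = Math.Min(i + 1, sentences.Length - 1);

            var parts = new List<string>();
            for (int j = first; j <= last; j++)
            {
                // Split removed the periods and left the leading spaces,
                // so trim each piece and put the period back.
                string s = sentences[j].Trim();
                if (s.Length > 0)
                    parts.Add(s + ".");
            }
            return string.Join(" ", parts);
        }
        return string.Empty;
    }

    // Reads the whole file into memory and searches it.
    public static string FindInFile(string path, string phrase)
    {
        return Find(File.ReadAllText(path), phrase);
    }
}
```

With the question's sample saved as text.txt, `SentenceSearch.FindInFile("text.txt", "She likes to watch tv")` would be the call to try. Reading the whole file at once keeps the sketch short; for very large files a streaming reader would be the better choice.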
You can use Regex to get the sentence that contains the text:
using System.Text.RegularExpressions;

string text = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
string target = "She likes to watch tv";
// Regex.Escape keeps any regex metacharacters in the search phrase literal.
string result = Regex.Replace(text, "(?:.*?\\.\\s)?((?:[^.]*?)" + Regex.Escape(target) + "[^.]*?\\.)(?:.*)", "$1");
// result = "She likes to watch tv but really don't know what to say."
// Only the matching sentence is kept; the previous and next sentences are not included.
Reference: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=vs.90).aspx
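The regex in this answer keeps only the sentence that contains the phrase. To also pick up the neighbouring sentences, as the question asks, the pattern could be widened with an optional sentence on each side. This is a sketch rather than the answerer's method: the pattern and the names `SentenceRegex` and `Extract` are assumptions for illustration, and it treats every '.' as the end of a sentence.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceRegex
{
    // Returns the sentence containing `phrase` with one sentence before and
    // after it, or an empty string when there is no match.
    public static string Extract(string text, string phrase)
    {
        string pattern =
            @"(?:[^.]*\.\s*)?" +                            // optional previous sentence
            @"[^.]*" + Regex.Escape(phrase) + @"[^.]*\.?" + // sentence containing the phrase
            @"(?:\s*[^.]*\.)?";                             // optional next sentence

        Match m = Regex.Match(text, pattern);

        // The match may begin on the space after the preceding period, so trim it.
        return m.Success ? m.Value.Trim() : string.Empty;
    }
}
```

`Regex.Match` is used here instead of `Regex.Replace` because the wanted text is the match itself, so there is nothing to substitute. `Regex.Escape` keeps characters such as `?` or `(` in the search phrase from being read as regex syntax.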