Search a text file for a string and return the previous and next sentences

Question

Suppose my search term is: She likes to watch tv

The input file text.txt contains some sentences, for example:

I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.

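I want to search the text file for the string and return the sentence that contains it, together with the sentences immediately before and after it.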

The output should look like this:

She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault.

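That is, it outputs the sentence before the one that matches the search term, the sentence containing the search term, and the sentence after it.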

Answer 1

How about something like this:

    using System;

    string @in = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
    string phrase = @"She likes to watch tv";

    int matchIndex = @in.IndexOf(phrase, StringComparison.Ordinal);
    if (matchIndex < 0)
    {
        Console.WriteLine("Phrase not found.");
        return;
    }

    // Walk back over two ". " boundaries: the first marks the start of the
    // matching sentence, the second the start of the sentence before it.
    // If there is no earlier boundary, start at the beginning of the text.
    int startIndex = matchIndex;
    for (int i = 0; i < 2 && startIndex > 0; i++)
    {
        int tmpIndex = @in.Substring(0, startIndex).LastIndexOf(". ", StringComparison.Ordinal);
        startIndex = tmpIndex > -1 ? tmpIndex + 1 : 0;
    }

    // Walk forward over two periods: the end of the matching sentence and the
    // end of the sentence after it. If a period is missing, stop at the end.
    int endIndex = matchIndex + phrase.Length;
    for (int i = 0; i < 2 && endIndex < @in.Length; i++)
    {
        int tmpIndex = @in.IndexOf('.', endIndex);
        endIndex = tmpIndex > -1 ? tmpIndex + 1 : @in.Length;
    }

    // Trim the space left over from the preceding ". " boundary.
    Console.WriteLine(@in.Substring(startIndex, endIndex - startIndex).Trim());

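I'm assuming the sentences you're looking for are delimited by '. '. This code finds the index of the phrase, then searches backwards for the start of the preceding sentence and forwards for the end of the following sentence.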
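
If you want more than one sentence of context, or want to read text.txt directly as in the question, the same index-walking idea can be wrapped in a method. This is only a sketch: the class and method names (`SentenceWindow`, `Find`, `FindInFile`) and the `context` parameter are hypothetical, and it keeps the assumption above that every sentence ends with '.' and sentences are separated by ". ".

```csharp
using System;
using System.IO;

// Hypothetical helper: the index-walking approach above as a reusable method.
// Assumes sentences end with '.' and are separated by ". ".
public static class SentenceWindow
{
    // Returns the sentence containing `phrase` plus `context` sentences on each
    // side, or null if the phrase does not occur in `text`.
    public static string Find(string text, string phrase, int context = 1)
    {
        int match = text.IndexOf(phrase, StringComparison.Ordinal);
        if (match < 0) return null;

        // Walk back over context + 1 boundaries: the first is the start of the
        // matching sentence, each further one adds an earlier sentence.
        int start = match;
        for (int i = 0; i <= context && start > 0; i++)
        {
            int boundary = text.Substring(0, start).LastIndexOf(". ", StringComparison.Ordinal);
            start = boundary > -1 ? boundary + 1 : 0;
        }

        // Walk forward over context + 1 periods: the end of the matching
        // sentence, then the end of each following sentence.
        int end = match + phrase.Length;
        for (int i = 0; i <= context && end < text.Length; i++)
        {
            int period = text.IndexOf('.', end);
            end = period > -1 ? period + 1 : text.Length;
        }

        return text.Substring(start, end - start).Trim();
    }

    // Convenience overload that reads the whole input file (e.g. "text.txt").
    public static string FindInFile(string path, string phrase, int context = 1)
        => Find(File.ReadAllText(path), phrase, context);
}
```

For example, `SentenceWindow.FindInFile("text.txt", "She likes to watch tv")` should produce the output shown in the question, and passing `context: 0` returns only the matching sentence.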

Answer 2

Here is one approach:

    using System;

    string content = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

    string input = @"She likes to watch tv";
    string curPhrase = string.Empty, prevPhrase = string.Empty, nextPhrase = string.Empty;

    // Split on '.'; the periods are dropped and each piece keeps its leading space.
    char[] delim = new char[] { '.' };
    string[] phrases = content.Split(delim, StringSplitOptions.RemoveEmptyEntries);

    for (int i = 0; i < phrases.Length; i++)
    {
        if (phrases[i].IndexOf(input, StringComparison.Ordinal) != -1)
        {
            curPhrase = phrases[i].Trim();
            // Check the bounds: indexing before the first or past the last element
            // throws IndexOutOfRangeException (it never yields null).
            if (i > 0)
                prevPhrase = phrases[i - 1].Trim();
            if (i < phrases.Length - 1)
                nextPhrase = phrases[i + 1].Trim();

            break;
        }
    }

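It first splits the whole text on periods (.) and stores the pieces in an array; then, once it finds the input string in the array, it takes the current, previous and next phrases.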
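
Because `Split('.')` discards the periods, joining `prevPhrase`, `curPhrase` and `nextPhrase` back together does not exactly reproduce the requested output. A sketch of one alternative, assuming every sentence ends with '.' followed by whitespace: split with a lookbehind so each element keeps its own period, then join the neighbouring elements. The names `SentenceArray`, `SplitSentences` and `Surrounding` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceArray
{
    // Splits at whitespace that follows a '.', so every element keeps its
    // terminating period (unlike Split('.'), which drops it).
    public static string[] SplitSentences(string text) =>
        Regex.Split(text.Trim(), @"(?<=\.)\s+");

    // Returns the first sentence containing `phrase` together with its
    // neighbours, joined by single spaces, or null if no sentence matches.
    public static string Surrounding(string text, string phrase)
    {
        string[] sentences = SplitSentences(text);
        int i = Array.FindIndex(sentences, s => s.Contains(phrase));
        if (i < 0) return null;

        // Clamp to the array bounds so the first and last sentences work too.
        int from = Math.Max(0, i - 1);
        int to = Math.Min(sentences.Length - 1, i + 1);
        return string.Join(" ", sentences, from, to - from + 1);
    }
}
```

Keeping the period inside each array element means the result can be printed directly; the trade-off is that a phrase spanning two sentences will not be found.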

Answer 3

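String.IndexOf() (docs) returns the position of the first occurrence of the string in the file's text. Using this value, you can extract the phrase or sentence that contains it: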

    // paragraph holds the text read from the file
    int index = paragraph.IndexOf("She likes to watch tv");

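You would then use index to set the boundaries and split the text (perhaps with a regular expression that looks for capital letters and periods) to pull out the sentences on either side.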
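
A sketch of that idea, using `IndexOf` to locate the phrase and a regular expression to find the sentence boundaries around it. Here a boundary is assumed to be a period, then whitespace, then a capital letter; this is only a heuristic (an abbreviation such as "Mr. Smith" would be treated as a sentence end), and the names `BoundarySearch` and `Around` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BoundarySearch
{
    // Assumed boundary: '.', one or more whitespace characters, then a capital
    // letter (checked with a lookahead, so the letter is not consumed).
    private static readonly Regex Boundary = new Regex(@"\.\s+(?=[A-Z])");

    // Returns the sentence containing `phrase` plus one sentence on each side,
    // or null if the phrase does not occur.
    public static string Around(string paragraph, string phrase)
    {
        int index = paragraph.IndexOf(phrase, StringComparison.Ordinal);
        if (index < 0) return null;

        // Start offset of every sentence: 0, plus the first character after
        // each boundary match.
        var starts = new List<int> { 0 };
        foreach (Match m in Boundary.Matches(paragraph))
            starts.Add(m.Index + m.Length);

        // The matching sentence is the last one that starts at or before index.
        int k = starts.FindLastIndex(s => s <= index);

        // Take from the start of the previous sentence up to the start of the
        // sentence after the next one (or the end of the text).
        int from = starts[Math.Max(0, k - 1)];
        int to = k + 2 < starts.Count ? starts[k + 2] : paragraph.Length;
        return paragraph.Substring(from, to - from).Trim();
    }
}
```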

Answer 4

You can use a Regex to get the text:

    using System.Text.RegularExpressions;

    string text = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

    string target = "She likes to watch tv";

    // Regex.Escape makes any metacharacters in target (e.g. '.' or '?') match literally.
    // If target is not found, Regex.Replace returns text unchanged.
    string result = Regex.Replace(text, "(?:.*?\\.\\s)?((?:[^.]*?)" + Regex.Escape(target) + "[^.]*?\\.)(?:.*)", "$1");

    //result = "She likes to watch tv but really don't know what to say."

Reference: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=vs.90).aspx
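
The regex above returns only the sentence that contains the target. A sketch of a variant that also captures the neighbouring sentences, again assuming every sentence ends with '.': it uses `Regex.Match` instead of `Replace` so that a miss can be detected, and `Regex.Escape` so the search term is matched literally. `RegexContext.Find` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexContext
{
    // Returns the sentence containing `target` plus the sentence before and the
    // sentence after it (when they exist), or null when there is no match.
    // Assumes every sentence ends with '.'; [^.]* never crosses a period.
    public static string Find(string text, string target)
    {
        string pattern =
            @"(?:[^.]*\.\s*)?" +                            // optional previous sentence
            @"[^.]*" + Regex.Escape(target) + @"[^.]*\." +  // the matching sentence
            @"(?:\s*[^.]*\.)?";                             // optional next sentence

        Match m = Regex.Match(text, pattern);
        // The match can begin with the space that followed an earlier period.
        return m.Success ? m.Value.Trim() : null;
    }
}
```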

4 solutions

Solution 1: score 3, 2012-06-10 17:28:37

Solution 2: score 3, 2012-06-10 17:54:04

Solution 3: score 2, 2012-06-10 17:11:28

Solution 4: score 2, 2012-06-10 17:40:36
