繁体   English   中英

搜索文本文件中的字符串以及上一句和下一句

[英]Search a string in text file and also its previous and next sentence

如果我有搜索条件: She likes to watch tv

输入文件text.txt包含一些句子,例如:

I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.

我想在文本文件中搜索字符串,并返回包含字符串的句子,以及它之前和之后的句子。

输出应如下所示:

She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault.

因此,它输出匹配搜索词之前的句子,包含搜索词的句子和搜索词之后的句子。

这样的事情怎么样:

    string @in = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
    string phrase = @"She likes to watch tv";


    int startIndex = @in.IndexOf(phrase);
    int endIndex = startIndex + phrase.Length;
    int tmpIndex;

    tmpIndex = @in.Substring(0, startIndex).LastIndexOf(". ");
    if (tmpIndex > -1)
    {
        startIndex = tmpIndex + 1;
        tmpIndex = @in.Substring(0, startIndex).LastIndexOf(". ");
        if (tmpIndex > -1)
        {
            startIndex = tmpIndex + 1;
            tmpIndex = @in.Substring(0, startIndex).LastIndexOf(". ");
            if (tmpIndex > -1)
            {
                startIndex = tmpIndex;
            }
        }
    }

    tmpIndex = @in.IndexOf(".", endIndex);
    if (tmpIndex > -1)
    {
        endIndex = tmpIndex + 1;
        tmpIndex = @in.IndexOf(".", endIndex);
        if (tmpIndex > -1)
        {
            endIndex = tmpIndex + 1;
        }
    }

    Console.WriteLine(@in.Substring(startIndex, endIndex - startIndex).Trim());

我假设您要查找的短语由'。'分隔。 此代码的工作原理是查找短语的索引并查看前一个短语的匹配,并查看后面句子的短语。

这里介绍一种方法:

string content = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

string input = @"She likes to watch tv";
string curPhrase = string.Empty, prevPhrase = string.Empty, nextPhrase = string.Empty;

char[] delim = new char[] { '.' };
string[] phrases = content.Split(delim, StringSplitOptions.RemoveEmptyEntries);

for(int i=0; i<phrases.Length; i++){
    if(phrases[i].IndexOf(input) != -1){
        curPhrase = phrases[i];
        prevPhrase = phrases[i - 1];
        if (phrases[i + 1] != null)
            nextPhrase = phrases[i + 1];

        break;
    }
}

它首先在句号中分割整个文本. ,将它们存储在一个数组中,然后在数组中搜索输入字符串后,取出当前,上一个和下一个短语。

使用String.IndexOf()docs )将返回文件中字符串的第一个出现位置。 使用此值,您可以删除包含的短语或句子:

int index = paragraph.IndexOf("She likes to watch tv")

那么你将使用index来设置边界和分割(可能在正则表达式中使用大写字母和句号),以拉出任何一边的句子。

您可以使用Regex获取文本:

string text = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

string target = "She likes to watch tv";

string result = Regex.Replace(text, "(?:.*?\\.\\s)?((?:[^.]*?)" + target + "[^.]*?\\.)(?:.*)", "$1");

//result = "She likes to watch tv but really don't know what to say."

参考: http//msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v = vs。90).aspx

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM