Reading/writing to a txt file in C#

Beginner here.

I'm having difficulty understanding how to edit the contents of a txt file in C#. I'm trying to do the following (pseudocode):

foreach word in file.txt
    if ((word.length < 4) || (word.length > 11))
        delete word from file.txt

What do I need to be doing? I know it involves the StreamReader/StreamWriter classes, but I don't get how they work.

At first glance this seems simple: use a StreamReader to read the file, split on spaces, remove the words that don't meet the length criteria, and then use a StreamWriter to write the result back. However, with string parsing (word parsing) you run into a bunch of "special" cases where extra processing may be required.

Words are hard to describe in a programming language. For example, a word may contain punctuation that is part of the word, or it may start or end with punctuation that denotes the end of a sentence, a new line, etc.
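To make the problem concrete, here is a minimal sketch of the naive "split on spaces" approach. The class and method names (NaiveWordFilter, Filter) are hypothetical and only serve this illustration; they are not part of the WordProcessor class built below.

```csharp
using System;
using System.Linq;

public static class NaiveWordFilter
{
    // Hypothetical helper: keeps only the space-separated tokens whose
    // length lies between minLength and maxLength (inclusive).
    public static string Filter(string text, int minLength, int maxLength)
    {
        var kept = text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => w.Length >= minLength && w.Length <= maxLength);

        // Punctuation stays glued to the token, so it counts toward the length.
        return string.Join(" ", kept);
    }
}
```

Calling NaiveWordFilter.Filter("The cat. sat", 4, 11) keeps "cat." because the trailing period counts toward the length, even though "cat" itself has only three letters. That is the kind of special case the rules below try to handle.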

That being said, let's say we have the following rules.

  • A word contains one or more alphanumeric characters.
  • A word may contain the following punctuation: [-,_']
  • Words may be separated by punctuation or a space.

Following these rules we can easily read all the text and perform the manipulations you asked for. I would start with the word processing. What you can do is create a static class for this. Let's call this class WordProcessor.

Here is commented code that parses words out of a string based on our rules.

// The members below live inside: public static class WordProcessor { ... }
// Required namespaces: System.Collections.Generic, System.IO

/// <summary>
/// characters that denote a new word (punctuation plus whitespace, including line breaks)
/// </summary>
const string wordSplitPuncuation = ",.!&()[] \"\t\r\n";

/// <summary>
/// Parse a string into words and, optionally, the separators between them
/// </summary>
/// <param name="inputString">the string to parse</param>
/// <param name="preservePuncuation">preserve punctuation in the string</param>
/// <returns>the words (and separators, if preserved) in their original order</returns>
public static IList<string> ParseString(string inputString, bool preservePuncuation)
{
    //create a list to hold our words
    List<string> rebuildWords = new List<string>();

    //the current word
    string currentWord = "";

    //iterate through all characters in the input string
    foreach(var character in inputString)
    {
        //is the character one of the split characters?
        if(wordSplitPuncuation.IndexOf(character) > -1)
        {
            if (currentWord != "")
                rebuildWords.Add(currentWord);
            if (preservePuncuation)
                rebuildWords.Add("" + character);
            currentWord = "";
        }
        //else add the character to the current word
        else
            currentWord += character;
    }

    //add the last word if the input did not end with a split character
    if (currentWord != "")
        rebuildWords.Add(currentWord);

    return rebuildWords;
}

The above is pretty basic: if you set preservePuncuation to true and concatenate the returned pieces, you get the original string back.

The next part of the class is used to remove words that are shorter than a minimum length or longer than a maximum length. It uses the method above to split the string into pieces and evaluates each piece individually against those limits.

/// <summary>
/// Removes words from a string that are shorter than the minimum or longer than the maximum length
/// </summary>
/// <param name="inputString">the input string to parse</param>
/// <param name="preservePuncuation">flag to preserve the punctuation for rebuilding the string</param>
/// <param name="minWordLength">the minimum word length</param>
/// <param name="maxWordLength">the maximum word length</param>
/// <returns>the input string without the rejected words</returns>
public static string RemoveWords(string inputString, bool preservePuncuation, int minWordLength, int maxWordLength)
{
    //parse our string into pieces for iteration
    var words = WordProcessor.ParseString(inputString, preservePuncuation);

    //initialize our complete string container
    List<string> completeString = new List<string>();

    //enumerate each word
    foreach (var word in words)
    {
        //is this piece a split character? (separators are always a single character)
        if (wordSplitPuncuation.IndexOf(word[0]) > -1)
        {
            //are we preserving punctuation?
            if (preservePuncuation)
                //add the punctuation
                completeString.Add(word);
        }
        //check that the word length is greater than or equal to the min length and less than or equal to the max length
        else if (word.Length >= minWordLength && word.Length <= maxWordLength)
            //add to the complete string list
            completeString.Add(word);
    }

    //join the pieces; without preserved separators the words need a space between them
    string result = string.Join(preservePuncuation ? "" : " ", completeString);

    //collapse the runs of spaces left behind by removed words, then trim both ends
    while (result.Contains("  "))
        result = result.Replace("  ", " ");
    return result.Trim();
}

OK, so the above method simply runs through the input, keeps the words that match the length criteria, and preserves the punctuation. The final piece of the puzzle is reading and writing the file on disk. For this we can use StreamReader and StreamWriter. (Note: if you have file access problems you may want to look at the FileStream class.)
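As a rough illustration of the FileStream note, the sketch below opens the file through a FileStream with an explicit sharing mode and wraps it in a StreamReader. The class and method names (SharedFileReader, ReadAllTextShared) are hypothetical, and whether FileShare.ReadWrite is the right choice depends on what else has the file open.

```csharp
using System.IO;

public static class SharedFileReader
{
    // Hypothetical helper: reads the whole file while still allowing other
    // handles to read or write it (FileShare.ReadWrite).
    public static string ReadAllTextShared(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The returned string can then be passed to RemoveWords in the same way as the text read with a plain StreamReader.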

The sample code below simply reads a file, invokes RemoveWords, and then writes the result back to the original location.

/// <summary>
/// Removes words from a file
/// </summary>
/// <param name="filePath">the file path to parse</param>
/// <param name="preservePuncuation">flag to preserve the punctuation for rebuilding the string</param>
/// <param name="minWordLength">the minimum word length</param>
/// <param name="maxWordLength">the maximum word length</param>
public static void RemoveWordsFromAFile(string filePath, bool preservePuncuation, int minWordLength, int maxWordLength)
{
    //our parsed string
    string parseString = "";

    //read the file
    using (var reader = new StreamReader(filePath))
    {
        parseString = reader.ReadToEnd();
    }

    //parse our string to remove words before opening the writer,
    //so the file is not truncated if parsing throws
    parseString = WordProcessor.RemoveWords(parseString, preservePuncuation, minWordLength, maxWordLength);

    //open a new writer; this overwrites the original file
    using (var writer = new StreamWriter(filePath))
    {
        //write our string (disposing the writer flushes it)
        writer.Write(parseString);
    }
}

The code above simply opens the file, filters its contents against your parameters, and then rewrites the file.

It can then be reused by simply calling the method directly, such as:

WordProcessor.RemoveWordsFromAFile(@"D:\test.txt", true, 4, 11);

On a final note: this is by no means the most efficient way to handle your request, and it is not built for performance. It is simply a demonstration of how you could parse words out of a file.

Cheers

The concept is going to be more along the lines of:

while (there is input to read from the input file)
{
    read the input
    if (input fits your criteria of shorter than 4 or longer than 11)
        ignore it
    else
        write it to output file (which is a new file, NOT the file you read it from)
}

You can use StreamReader.ReadLine().
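The loop above could be sketched in C# roughly as follows. This is one possible reading of the pseudocode, not a definitive implementation: the names LineFilter and FilterFile are hypothetical, words are assumed to be separated by spaces or tabs, and outputPath must be a different file from inputPath.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineFilter
{
    // Hypothetical helper: copies inputPath to outputPath line by line,
    // keeping only words whose length is within [minLength, maxLength].
    public static void FilterFile(string inputPath, string outputPath, int minLength, int maxLength)
    {
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            // ReadLine returns null once the end of the file is reached
            while ((line = reader.ReadLine()) != null)
            {
                var kept = new List<string>();
                foreach (var word in line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
                {
                    if (word.Length >= minLength && word.Length <= maxLength)
                        kept.Add(word);
                }
                // one output line per input line, so the line structure is preserved
                writer.WriteLine(string.Join(" ", kept));
            }
        }
    }
}
```

Because only one line is held in memory at a time, this also works for files that are too large to read in one go.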

I would look into regex to do pattern matching based on the requirements you describe in your question. Here's a good tutorial on regex. Target the words and replace them with blanks.
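A minimal sketch of the regex idea, assuming a "word" is a run of letters, digits, underscores, apostrophes, or hyphens. That pattern is my assumption, not something the question specifies, and the names RegexWordFilter and RemoveWords are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RegexWordFilter
{
    // Hypothetical helper: blanks out every word shorter than 4 or longer
    // than 11 characters, then tidies up the leftover spaces.
    public static string RemoveWords(string text)
    {
        // [\w'-]+ matches one "word" under the assumption stated above
        string filtered = Regex.Replace(
            text,
            @"[\w'-]+",
            m => (m.Length < 4 || m.Length > 11) ? "" : m.Value);

        // collapse runs of spaces left by removed words and trim the ends
        return Regex.Replace(filtered, @" {2,}", " ").Trim();
    }
}
```

For example, RegexWordFilter.RemoveWords("a quick brown fox jumped") returns "quick brown jumped". Punctuation between words is left in place because the pattern never matches it.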

Combine that with the following post on how to read/write to text files: How to both read and write a file in C#. Depending on how large the file is, you might be OK just reading the whole file, removing the words you want to delete, and finally writing the whole content back.

If the file is very large you might have to optimize this and read the file in chunks instead.

Try this.

  1. Get the contents of the text file in a string variable.

  2. Split the text with space as the delimiter to get the words in an array.

  3. Then join the words in that array that meet your criteria and write the result back to the text file.

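        // needs: using System; using System.IO; using System.Linq;
        var filePath = HttpRuntime.AppDomainAppPath + "your file path";
        if (!File.Exists(filePath))
            return;

        // 1. read the whole file into a string (the reader is closed before we write)
        string text;
        using (var sr = new StreamReader(filePath))
        {
            text = sr.ReadToEnd();
        }

        // 2. split on whitespace (spaces, tabs, line breaks) to get the words
        var words = text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        // 3. keep only the words of 4 to 11 characters
        var kept = words.Where(w => w.Length >= 4 && w.Length <= 11);

        // join the remaining words and write them back over the original file
        using (var sw = new StreamWriter(filePath))
        {
            sw.Write(string.Join(" ", kept));
        }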
