
How can I split a big text file into smaller files?

I have a big file with some text, and I want to split it into smaller files.

In this example, this is what I do:

  1. I open a text file with, say, 10,000 lines in it.
  2. I set package = 300, which is the line limit for each small file. Once a small file has 300 lines in it, I close it and open a new file for writing (for example, package2).

  3. Same as step 2.

  4. And so on until the big file has been read completely.

Here is the code from my function that should do that. What I don't know is how to close the current file and open a new one once it has reached the 300-line limit (in our case here).

Let me show you what I'm talking about:

        int nr = 1;
        package=textBox1.Text;//how many lines/file (small file)
        string packnr = nr.ToString();
        string filer=package+"Pack-"+packnr+"+_"+date2+".txt";//name of small file/s
        int packtester = 0;
        int package= 300;
        StreamReader freader = new StreamReader("bigfile.txt");
        StreamWriter pak = new StreamWriter(filer);
        while ((line = freader.ReadLine()) != null)
        {
            if (packtester < package)
            {
                pak.WriteLine(line);//writing line to small file
                packtester++;//increasing the lines of small file
            }
            else if (packtester == package)//in this example, checking if the lines 
                                           //written, got to 300 
            {
                packtester = 0;
                pak.Close();//closing the file
                nr++;//nr++ -> just for file name to be Pack-2;
                packnr = nr.ToString();   
                StreamWriter pak = new StreamWriter(package + "Pack-" + packnr + "+_" + date2 + ".txt");
            }
        }

I get these errors:

Cannot use local variable 'pak' before it is declared

A local variable named 'pak' cannot be declared in this scope because it would give a different meaning to 'pak', which is already used in a 'parent or current' scope to denote something else

Try this:

public void SplitFile()
{
    int nr = 1;
    int package = 300;
    DateTime date2 = DateTime.Now;
    int packtester = 0;
    using (var freader = new StreamReader("bigfile.txt"))
    {
        StreamWriter pak = null;
        try
        {
            pak = new StreamWriter(GetPackFilename(package, nr, date2), false);
            string line;

            while ((line = freader.ReadLine()) != null)
            {
                if (packtester == package)
                {
                    // Current file is full: close it and start the next one
                    // before writing, so the line just read is not lost.
                    pak.Close(); //closing the file (Close also flushes)
                    packtester = 0;
                    nr++; //nr++ -> just for file name to be Pack-2;
                    pak = new StreamWriter(GetPackFilename(package, nr, date2), false);
                }

                pak.WriteLine(line); //writing line to small file
                packtester++; //increasing the lines of small file
            }
        }
        finally
        {
            if(pak != null)
            {
                pak.Dispose();
            }
        }
    }
}

private string GetPackFilename(int package, int nr, DateTime date2)
{
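    // An explicit date format keeps characters such as '/' and ':' out of the file name.
    return string.Format("{0}Pack-{1}+_{2:yyyyMMdd_HHmmss}.txt", package, nr, date2);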
}

This code looks like it closes the stream and opens a new one when you hit 300 lines. What exactly doesn't work in this code?

One thing you'll want to add is a final close (probably with a check so it doesn't try to close an already closed stream) in case the number of lines is not an exact multiple of 300.
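As a minimal sketch of that final close, assuming a hypothetical helper named Split with a prefix parameter for the output file names (neither is from the question), the last writer can be closed in a finally block:

```csharp
using System;
using System.IO;

static class SplitSketch
{
    // Hypothetical helper: copies every `linesPerFile` lines of `source` into a
    // new file named prefix + N + ".txt" and returns the number of files created.
    public static int Split(string source, string prefix, int linesPerFile)
    {
        int fileCount = 0;
        int written = 0;
        StreamWriter pak = null;
        try
        {
            using (var reader = new StreamReader(source))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // Open the first file, or roll over once the current one is full.
                    if (pak == null || written == linesPerFile)
                    {
                        if (pak != null) pak.Close();
                        fileCount++;
                        pak = new StreamWriter(prefix + fileCount + ".txt");
                        written = 0;
                    }
                    pak.WriteLine(line);
                    written++;
                }
            }
        }
        finally
        {
            // Final close: covers the last, partially filled file.
            // The null check handles an empty input, where no writer was opened.
            if (pak != null) pak.Close();
        }
        return fileCount;
    }
}
```

With a 7-line input and linesPerFile set to 3, this should produce three files, the last one holding the single leftover line.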

EDIT:

After your edit I see your problem. You don't need to redeclare pak in the last line of code; simply assign a new StreamWriter to it. (I don't remember whether StreamWriter is disposable, but if it is, you should probably dispose the old one before creating a new one.)

StreamWriter pak = new StreamWriter(package + "Pack-" + packnr + "+_" + date2 + ".txt");

becomes

pak = new StreamWriter(package + "Pack-" + packnr + "+_" + date2 + ".txt");

Logrotate can do this automatically for you. Years of work have gone into it, and it is what people trust to handle their sometimes very large web server logs.

Note that the code, as written, will not compile because you declare the variable pak more than once. It should otherwise work, though it has some room for improvement.

When working with files, my suggestion and the general norm is to wrap your code in a using block, which is basically syntactic sugar built on top of a finally clause:

using (var stream = File.Open(@"C:\hi.txt", FileMode.Open))
{
    //write your code here. When this block is exited, stream will be disposed.
}

is roughly equivalent to:

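// Declared outside the try block so that it is still in scope in finally.
var stream = File.Open(@"C:\hi.txt", FileMode.Open);
try
{
    //write your code here.
}
finally
{
    if (stream != null)
    {
        stream.Dispose();
    }
}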

In addition, when working with files, prefer opening file streams with explicit modes and permissions rather than using the shorter constructors that assume default options. For example:

var stream = new StreamWriter(File.Open(@"c:\hi.txt", FileMode.CreateNew, FileAccess.ReadWrite, FileShare.Read));

This guarantees, for example, that an existing file is never overwritten: FileMode.CreateNew assumes that the file we want to open doesn't exist yet and throws an IOException if it does.
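To illustrate the FileMode.CreateNew behavior, here is a small sketch. The helper name TryWriteNew is hypothetical, and catching IOException broadly is a simplification, since other I/O failures raise it too:

```csharp
using System;
using System.IO;

static class CreateNewSketch
{
    // Hypothetical helper: returns true if the file was created and written,
    // false if opening it failed with an IOException (for example because the
    // file already exists). An existing file is never overwritten.
    public static bool TryWriteNew(string path, string text)
    {
        try
        {
            using (var stream = File.Open(path, FileMode.CreateNew, FileAccess.Write, FileShare.Read))
            using (var writer = new StreamWriter(stream))
            {
                writer.WriteLine(text);
            }
            return true;
        }
        catch (IOException)
        {
            // Note: this also swallows unrelated I/O errors; a real program
            // would likely want to distinguish them.
            return false;
        }
    }
}
```

Calling TryWriteNew twice with the same path should succeed the first time and return false the second time, leaving the first file's content intact.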

Oh, and instead of checking the result of ReadLine for null, I suggest using the EndOfStream property of the StreamReader object.
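A read loop driven by EndOfStream might look like the sketch below. The ReadAll helper is hypothetical and only collects lines into a list to keep the example short:

```csharp
using System.Collections.Generic;
using System.IO;

static class EndOfStreamSketch
{
    // Hypothetical helper: reads every line of the file at `path` into a list.
    public static List<string> ReadAll(string path)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(path))
        {
            // EndOfStream becomes true once there is nothing left to read,
            // so ReadLine is never called past the end of the file.
            while (!reader.EndOfStream)
            {
                lines.Add(reader.ReadLine());
            }
        }
        return lines;
    }
}
```

Both loop styles read the same lines; which one to use is mostly a matter of taste.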
