
How can I split one text file into multiple *.txt files?

I have a text file file.txt (12 MB) containing:

something1
something2
something3
something4
(...)

Is there a way to split file.txt into 12 *.txt files, say file2.txt, file3.txt, file4.txt, etc.?

You can use the Linux coreutils program split:

split -b 1M -d  file.txt file

Note that M and MB are both accepted, but they mean different sizes: MB is 1000 * 1000, while M is 1024^2.
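A quick way to see the difference between the two suffixes (the file names test.bin and chunk_* below are only for illustration):

```shell
# Create a 3,000,000-byte test file, then split it with each suffix.
dd if=/dev/zero of=test.bin bs=1000000 count=3 2>/dev/null
split -b 1M  -d test.bin chunk_M.    # 1M  = 1048576 bytes (1024^2)
split -b 1MB -d test.bin chunk_MB.   # 1MB = 1000000 bytes (1000 * 1000)
wc -c chunk_M.00 chunk_MB.00
```

The first piece of each run shows the two sizes side by side: 1,048,576 bytes for `1M` versus 1,000,000 bytes for `1MB`.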

If you want to split by lines, you can use the -l parameter.

UPDATE

lines=$(( $(wc -l < file.txt) / 12 )) ; split -l "$lines" -d file.txt file
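One caveat: integer division rounds down, so when the line count is not a multiple of 12, the leftover lines spill into a 13th file. Rounding up avoids that. A sketch on a hypothetical 100-line input:

```shell
seq 100 > file.txt                      # hypothetical 100-line input
total=$(wc -l < file.txt)
lines=$(( (total + 11) / 12 ))          # ceiling of total/12
split -l "$lines" -d file.txt file
ls file[0-9][0-9] | wc -l               # exactly 12 pieces
```

Here `lines` becomes 9, so split produces eleven 9-line pieces and one 1-line piece, i.e. exactly 12 files.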

Another solution, as suggested by Kirill: you can do something like the following:

split -n l/12 file.txt

Note that it is the letter l, not the number one. split -n has a few options: N, K/N, l/N, l/K/N, r/N, r/K/N.
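The l/N mode keeps lines whole while still producing exactly N pieces. A small check (the part_ prefix is only for illustration):

```shell
seq 1000 > file.txt
split -n l/12 -d file.txt part_     # 12 pieces, no line split across files
ls part_* | wc -l                   # 12
cat part_* | cmp - file.txt         # concatenating the pieces reproduces the input
```

Plain `-n 12` would cut at byte boundaries and could split a line in two; `-n l/12` only ever cuts at newlines.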

$ split -l 100 input_file output_file

where -l is the number of lines in each file. This will create:

  • output_fileaa
  • output_fileab
  • output_fileac
  • output_filead
  • ....

CS Pei's answer won't produce .txt files as the OP wants. Use:

split -b=1M -d  file.txt file --additional-suffix=.txt
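Note that GNU split actually rejects the `=` after `-b` (a later answer ran into this), but the `--additional-suffix` part is the key idea. A minimal check, using `-l` to keep the demo tiny (the 50-line sample is hypothetical):

```shell
seq 50 > file.txt
split -l 10 -d file.txt file --additional-suffix=.txt
ls file0?.txt          # file00.txt ... file04.txt
```

Each piece now ends in .txt, as the OP asked.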

Using Bash:

readarray -t lines < file.txt
count=${#lines[@]}

for i in "${!lines[@]}"; do
    index=$(( (i * 12 - 1) / count + 1 ))
    echo "${lines[i]}" >> "file${index}.txt"
done

Using AWK:

awk '{
    a[NR] = $0
}
END {
    for (i = 1; i in a; ++i) {
        x = (i * 12 - 1) / NR + 1
        sub(/\..*$/, "", x)
        print a[i] > ("file" x ".txt")
    }
}' file.txt

Unlike split, this makes sure that the number of lines per piece is as even as possible.
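That claim can be checked on a small input (the 100-line sample below is hypothetical; the redirection target is parenthesized for portability across awk implementations):

```shell
seq 100 > file.txt
awk '{ a[NR] = $0 }
END {
    for (i = 1; i in a; ++i) {
        x = (i * 12 - 1) / NR + 1
        sub(/\..*$/, "", x)              # truncate x to an integer index
        print a[i] > ("file" x ".txt")
    }
}' file.txt
wc -l file1.txt file12.txt               # piece sizes differ by at most one line
```

For 100 lines into 12 pieces, every file gets 8 or 9 lines, whereas `split -l 9` would leave a 1-line straggler.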

Regardless of what was said in previous answers, on my Ubuntu 16.04 (Xenial Xerus) I had to do:

split -b 10M -d  system.log system_split.log

Please note the space between -b and the value.

Try something like this:

awk -v c=1 'NR > 1 && (NR - 1) % 1000000 == 0 { ++c } { print > (c ".txt") }' Datafile.txt

for filename in *.txt; do mv "$filename" "Prefix_$filename"; done;
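A small-scale sketch of the two steps together, using 10 lines per chunk instead of a million (names are hypothetical). One thing to watch: a bare `*.txt` glob in the rename loop would also rename the input file itself, so the demo restricts the glob to the numbered outputs:

```shell
seq 25 > Datafile.txt
# New chunk after every 10 lines; counter bumps just past each full chunk.
awk -v c=1 'NR > 1 && (NR - 1) % 10 == 0 { ++c } { print > (c ".txt") }' Datafile.txt
for filename in [0-9]*.txt; do mv "$filename" "Prefix_$filename"; done
wc -l Prefix_1.txt Prefix_3.txt          # 10 lines and the 5-line remainder
```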

I agree with @CS Pei; however, this didn't work for me:

split -b=1M -d file.txt file

...as the = after -b threw it off. Instead, I simply deleted it, left no space between it and the value, and used a lowercase "m":

split -b1m -d file.txt file

And to append ".txt", we use what @schoon said:

split -b1m -d file.txt file --additional-suffix=.txt

I had a 188.5 MB txt file and I used this command [but with -b5m for 5.2 MB files], and it returned 35 split files, all of which were txt files of 5.2 MB except the last, which was 5.0 MB. Now, since I wanted my lines to stay whole, I wanted to split the main file every 1 million lines, but the split command didn't allow me to do even -100000, let alone -1000000, so splitting on large numbers of lines that way will not work.
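The limitation above concerns the old `split -NUM` shorthand; the standard `-l` spelling accepts arbitrarily large counts. A sketch (big.txt and the part_ prefix are hypothetical):

```shell
seq 2500000 > big.txt                 # hypothetical 2.5-million-line input
split -l 1000000 -d big.txt part_
wc -l part_00 part_02                 # two full million-line pieces plus a remainder
```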

On my Linux system (Red Hat Enterprise 6.9), the split command does not have the command-line options for either -n or --additional-suffix.

Instead, I've used this:

split -d -l NUM_LINES really_big_file.txt split_files.txt.

where -d adds a numeric suffix to the end of split_files.txt. and -l specifies the number of lines per file.

For example, suppose I have a really big file like this:

$ ls -laF
total 1391952
drwxr-xr-x 2 user.name group         40 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group       4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt

This file has 100,000 lines, and I want to split it into files with at most 30,000 lines each. This command will run the split and append an integer to the end of the output file name pattern split_files.txt.:

$ split -d -l 30000 really_big_file.txt split_files.txt.

The resulting files are split correctly, with at most 30,000 lines per file.

$ ls -laF
total 2783904
drwxr-xr-x 2 user.name group        156 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group       4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt
-rw-r--r-- 1 user.name group  428604626 Sep 14 15:43 split_files.txt.00
-rw-r--r-- 1 user.name group  427152423 Sep 14 15:43 split_files.txt.01
-rw-r--r-- 1 user.name group  427141443 Sep 14 15:43 split_files.txt.02
-rw-r--r-- 1 user.name group  142454325 Sep 14 15:43 split_files.txt.03


$ wc -l *.txt*
    100000 really_big_file.txt
     30000 split_files.txt.00
     30000 split_files.txt.01
     30000 split_files.txt.02
     10000 split_files.txt.03
    200000 total

If each part has the same number of lines, for example 22, here is my solution:

split --numeric-suffixes=2 --additional-suffix=.txt -l 22 file.txt file

And you obtain file2.txt with the first 22 lines, file3.txt with the next 22 lines, etc.

Thanks to @hamruta-takawale, @dror-s, and @stackoverflowuser2010.
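One caveat, assuming GNU split: its default suffix length is two digits, so the command above actually produces file02.txt, file03.txt, and so on; adding -a 1 yields the single-digit names described. A sketch on a hypothetical 66-line input:

```shell
seq 66 > file.txt
split --numeric-suffixes=2 --additional-suffix=.txt -a 1 -l 22 file.txt file
ls file?.txt        # file2.txt file3.txt file4.txt
```

With -a 1 the numeric suffixes run 2 through 9 only, so this works for up to 8 pieces.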

My search for how to do this led me here, so I'm posting this here for others too:

To get all of the contents of the file, split is the right answer! But for those looking to just extract a piece of a file, as a sample of it, use head or tail:

# extract just the **first** 100000 lines of /var/log/syslog into 
# ~/syslog_sample.txt
head -n 100000 /var/log/syslog > ~/syslog_sample.txt

# extract just the **last** 100000 lines of /var/log/syslog into 
# ~/syslog_sample.txt
tail -n 100000 /var/log/syslog > ~/syslog_sample.txt


Statement: the technical posts on this site are licensed under CC BY-SA 4.0; if you repost them, please credit the original source.
