How can I split one text file into multiple *.txt files?
I have a text file, file.txt (12 MB), containing:
something1
something2
something3
something4
(...)
Is there a way to split file.txt into 12 *.txt files, say file2.txt, file3.txt, file4.txt, etc.?
You can use the GNU coreutils utility split:
split -b 1M -d file.txt file
Note that both M and MB are accepted, but they mean different sizes: MB is 1000 * 1000 bytes, while M is 1024 * 1024 bytes.
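A quick sanity check of that difference (a minimal sketch using a throwaway dummy file; the file name and the 3,000,000-byte size are just for illustration):

```shell
# Work in a scratch directory with a 3,000,000-byte dummy file
cd "$(mktemp -d)"
head -c 3000000 /dev/zero > file.bin

# 1M  = 1024 * 1024 = 1048576 bytes per chunk
split -b 1M  -d file.bin mib_
# 1MB = 1000 * 1000 = 1000000 bytes per chunk
split -b 1MB -d file.bin mb_

# Compare the size of the first chunk from each run
wc -c mib_00 mb_00
```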
If you want to split by lines, you can use the -l parameter.
UPDATE
lines=$(( $(wc -l < file.txt) / 12 )) ; split -l "$lines" -d file.txt file
Another solution, as suggested by Kirill, is the following:
split -n l/12 file.txt
Note that is the letter l, not the digit one. split -n has a few options: N, k/N, l/N, l/k/N, r/N, and r/k/N.
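A small-scale illustration of the l/N form (a sketch; seq and the 24-line file are just placeholders for the real input):

```shell
cd "$(mktemp -d)"
seq 24 > file.txt

# l/12: twelve chunks, never splitting a line in the middle
# (lowercase letter l, not the digit 1)
split -n l/12 file.txt

# Default output names are xaa, xab, ..., xal
ls x* | wc -l
```

Even if a chunk boundary would land mid-line, l/N moves it to a line boundary, so exactly 12 files are produced and every input line stays whole.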
$ split -l 100 input_file output_file
where -l specifies the number of lines in each file. This will create output files of at most 100 lines each.
CS Pei's answer won't produce .txt files as the OP wants. Use:
split -b=1M -d file.txt file --additional-suffix=.txt
# Read all lines into an array, then distribute them across 12 files
# as evenly as possible
readarray -t lines < file.txt
count=${#lines[@]}
for i in "${!lines[@]}"; do
    # Map line i (0-based) to a file index in 1..12
    index=$(( (i * 12 - 1) / count + 1 ))
    echo "${lines[i]}" >> "file${index}.txt"
done
awk '{
    # Buffer every input line
    a[NR] = $0
}
END {
    for (i = 1; i in a; ++i) {
        # Map line i to a file number in 1..12, then drop the decimals
        x = (i * 12 - 1) / NR + 1
        sub(/\..*$/, "", x)
        print a[i] > ("file" x ".txt")
    }
}' file.txt
Unlike split, this one makes sure the line counts are as even as possible.
Regardless of what was said in previous answers, on my Ubuntu 16.04 (Xenial Xerus) I had to do:
split -b 10M -d system.log system_split.log
Please note the space between -b and the value.
Try something like this:
awk -v c=1 '{print > (c ".txt")} NR % 1000000 == 0 {++c}' Datafile.txt
for filename in *.txt; do mv "$filename" "Prefix_$filename"; done;
I agree with @CS Pei, however this didn't work for me:
split -b=1M -d file.txt file
...as the = after -b threw it off. Instead, I simply deleted it, left no space between it and the value, and used lowercase "m":
split -b1m -d file.txt file
And to append ".txt", we use what @schoon said:
split -b=1m -d file.txt file --additional-suffix=.txt
I had a 188.5 MB txt file and I used this command [but with -b5m for 5.2 MB files], and it returned 35 split files, all of which were txt files of 5.2 MB except the last, which was 5.0 MB. Now, since I wanted my lines to stay whole, I wanted to split the main file every 1 million lines, but the split command didn't allow me to do even -100000, let alone -1000000, so splitting on a large number of lines with that syntax will not work.
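For what it's worth, that limitation applies to the legacy "split -1000000" spelling; the -l form accepts large counts on current GNU split. A scaled-down sketch (the 2,500-line file, 1000-line chunks, and part_ prefix are illustrative, and --additional-suffix needs a reasonably recent coreutils):

```shell
cd "$(mktemp -d)"
seq 2500 > big.txt

# -l works with any line count; scaled down here to 1000 lines per chunk
split -l 1000 -d big.txt part_ --additional-suffix=.txt

# Expect part_00.txt and part_01.txt with 1000 lines, part_02.txt with the rest
wc -l part_*.txt
```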
On my Linux system (Red Hat Enterprise 6.9), the split command does not have the command-line options for either -n or --additional-suffix.
Instead, I've used this:
split -d -l NUM_LINES really_big_file.txt split_files.txt.
where -d adds a numeric suffix to the end of split_files.txt. and -l specifies the number of lines per file.
For example, suppose I have a really big file like this:
$ ls -laF
total 1391952
drwxr-xr-x 2 user.name group 40 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group 4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt
This file has 100,000 lines, and I want to split it into files with at most 30,000 lines. This command will run the split and append an integer at the end of the output file pattern split_files.txt.
$ split -d -l 30000 really_big_file.txt split_files.txt.
The resulting files are split correctly, with at most 30,000 lines per file.
$ ls -laF
total 2783904
drwxr-xr-x 2 user.name group 156 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group 4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt
-rw-r--r-- 1 user.name group 428604626 Sep 14 15:43 split_files.txt.00
-rw-r--r-- 1 user.name group 427152423 Sep 14 15:43 split_files.txt.01
-rw-r--r-- 1 user.name group 427141443 Sep 14 15:43 split_files.txt.02
-rw-r--r-- 1 user.name group 142454325 Sep 14 15:43 split_files.txt.03
$ wc -l *.txt*
100000 really_big_file.txt
30000 split_files.txt.00
30000 split_files.txt.01
30000 split_files.txt.02
10000 split_files.txt.03
200000 total
If each part should have the same number of lines, for example 22, here is my solution:
split --numeric-suffixes=2 --additional-suffix=.txt -l 22 file.txt file
And you obtain file02.txt with the first 22 lines, file03.txt with the next 22 lines, etc. (the numeric suffix is two digits wide by default; add -a 1 to get file2.txt, file3.txt, ..., as long as no more than eight parts are produced).
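A scaled-down check of this approach (a sketch: the 66-line input is illustrative, and -a 1 is an addition that keeps the suffix to one digit so the names match the question):

```shell
cd "$(mktemp -d)"
seq 66 > file.txt

# Start numbering at 2 and keep a one-digit suffix: file2.txt, file3.txt, ...
split --numeric-suffixes=2 -a 1 --additional-suffix=.txt -l 22 file.txt file

wc -l file2.txt file3.txt file4.txt
```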
Thanks to @hamruta-takawale, @dror-s, and @stackoverflowuser2010.
My search for how to do this led me here, so I'm posting this here for others too:
To get all of the contents of the file, split is the right answer! But for those looking to extract just a piece of a file, as a sample of it, use head or tail:
# extract just the **first** 100000 lines of /var/log/syslog into
# ~/syslog_sample.txt
head -n 100000 /var/log/syslog > ~/syslog_sample.txt
# extract just the **last** 100000 lines of /var/log/syslog into
# ~/syslog_sample.txt
tail -n 100000 /var/log/syslog > ~/syslog_sample.txt