I have a dataset of one column and 500 rows, for which I would like to extract each line and save it as an individual file, so I end up with 500 files. The data looks like this:
100002
100003
100004
100005
100006
100007
...
and I want each of these numbers in their own file. At my level of coding I understand that something like
awk -F, 'NR==1 {print $0}' wholefile.txt > individual1.txt
might work with manually changing the numbers, but how do I set this up to iterate through each line and also change the file being created, so the files are uniquely named, such as individual1, individual2, etc.?
For example, opening individual1.txt would show me 100001, but the file name would not be individual100001.
If you don't care about the trailing .txt in the file names, you could use the split command:
split -l 1 -d -a 3 wholefile.txt individual
This will create files with sequentially numbered suffixes: individual000, individual001, etc., up to the number of lines in wholefile.txt. The numbers don't depend on the contents of wholefile.txt.
See man split:
-d                          use numeric suffixes starting at 0, not alphabetic
-a, --suffix-length=N       generate suffixes of length N (default 2)
--numeric-suffixes[=FROM]   same as -d, but allow setting the start value
-l, --lines=NUMBER          put NUMBER lines/records per output file
The option argument -a 3 creates suffixes of 3 digits. You might have to change this depending on the number of lines in wholefile.txt. The leading zeros make sure the files can be sorted in lexicographical order. If you want to start the numbering with 1 instead of 0, replace -d with --numeric-suffixes=1.
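As a quick sketch of the numbering behavior (GNU split assumed; a five-line sample file stands in for the real wholefile.txt):

```shell
# Create a small sample file in place of the real 500-line wholefile.txt
printf '%s\n' 100002 100003 100004 100005 100006 > wholefile.txt

# Split one line per file, numeric suffixes of width 3, starting at 1
split -l 1 --numeric-suffixes=1 -a 3 wholefile.txt individual

# This creates individual001 ... individual005, one number per file
cat individual001    # first line of wholefile.txt
```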
If you want to remove the leading zeros, you can use a script to rename the files after splitting. You can also append .txt if necessary.
for file in individual*
do
newname="$(echo "$file" | sed 's/\([^0]*\)\(0*\)\([0-9]\)/\1\3/').txt"
mv "$file" "$newname"
done
The sed command searches for three groups:
[^0]*   0 or more characters that are not 0
0*      0 or more 0 characters
[0-9]   a digit from 0 to 9
and replaces this pattern with the 1st and 3rd group, omitting the 2nd group. This works here because the prefix individual doesn't contain digits. Otherwise the sed command would have to be extended.
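To see the substitution in isolation, here is a sketch on a single hypothetical file name (the name individual007 is just an example):

```shell
# The second group captures the run of leading zeros in the suffix
# and is dropped from the replacement, so 007 becomes 7
echo individual007 | sed 's/\([^0]*\)\(0*\)\([0-9]\)/\1\3/'
# -> individual7
```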
Something like this
count=0
for i in $(cat wholefile.txt)
do
# or: let count=count+1
count=$((count+1))
echo "$i" >> individual$count.txt
done
Here is a loop over the line numbers together with a sed
command that prints the line. The output is written to the individual files as intended.
for i in $(seq 1 $(wc -l < wholefile.txt)); do
sed -n "${i}p" wholefile.txt > individual${i}.txt
done
Note that with 500 files, these file names won't sort in lexicographical order. You might want to replace the above file name with individual$(printf "%03d" ${i}).txt.
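A small sketch of what the printf zero-padding does to a name (the value 7 is an arbitrary example):

```shell
# %03d pads the number to 3 digits with leading zeros,
# so the generated names sort correctly for up to 999 files
i=7
printf 'individual%03d.txt\n' "$i"
# -> individual007.txt
```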
A pure bash solution is
j=0; while read -r line; do echo "$line" > "individual.$((j++)).txt"; done < file
An awk solution would be
awk '{f=sprintf("individual.%05d.txt", NR); print > f; close(f)}' file
A pure split solution
split -l 1 -d -a 5 --additional-suffix ".txt" file individual.
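The awk approach can be sketched on a small sample file (any POSIX awk should work; a three-line sample stands in for the real input named file):

```shell
# Sample input in place of the real 500-line file
printf '%s\n' 100002 100003 100004 > file

# Build a zero-padded name from the record number NR, write the line,
# and close the file so we don't run out of file descriptors
awk '{f=sprintf("individual.%05d.txt", NR); print > f; close(f)}' file

cat individual.00001.txt    # first line of the input
```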
Use while read -r line to read the file line by line, and write each line to its own file with echo:
user@vmdeb ~ % cat nums.txt
100001
100002
100003
100004
100005
user@vmdeb ~ % while read -r line; do echo "$line" > "$line".txt; done < nums.txt
user@vmdeb ~ % ls
100001.txt 100002.txt 100003.txt 100004.txt 100005.txt nums.txt
user@vmdeb ~ % cat 100001.txt
100001
You can do something like this:
count=1
while read -r line
do
echo "$line" >> individualtextfile_$count.txt
count=$((count+1))
done < wholefile.txt