
How to extract rows and save as a text file in linux

I have a dataset of one column and 500 rows, for which I would like to extract each line and save it as an individual file, so I end up with 500 files. The data looks like this:

100002
100003
100004
100005
100006
100007
...

and I want each of these numbers in its own file. At my level of coding, I can see that something like

awk -F, 'NR==1 {print $0}' wholefile.txt > individual1.txt

might work if I manually change the numbers, but how do I set this up to iterate through each line and also change the name of the file being created, so that the files are uniquely named individual1, individual2, etc.?

For example, opening individual1.txt would show me 100001, but the file name would not be individual10001.

If you don't care about the trailing .txt in the file names, you could use the split command:

split -l 1 -d -a 3 wholefile.txt individual

This will create files with sequentially numbered suffixes (individual000, individual001, etc.), one per line of wholefile.txt. The numbers don't depend on the contents of wholefile.txt.

See man split

 -d     use numeric suffixes starting at 0, not alphabetic
 -a, --suffix-length=N
        generate suffixes of length N (default 2)
 --numeric-suffixes[=FROM]
        same as -d, but allow setting the start value
 -l, --lines=NUMBER
        put NUMBER lines/records per output file

The option argument -a 3 creates numbers of 3 digits. You might have to change this depending on the number of lines in wholefile.txt. The leading zeros make sure the files can be sorted in lexicographical order.

If you want to start the numbers with 1 instead of 0, replace -d with --numeric-suffixes=1.
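The suffix length can also be derived from the line count instead of being hard-coded. The following is only a sketch, assuming GNU split (for --numeric-suffixes) and using a small made-up sample file in a scratch directory; the variable names lines and width are illustrative.

```shell
#!/usr/bin/env bash
# Scratch directory with a small sample file (hypothetical data).
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 100001 100002 100003 100004 100005 > wholefile.txt

# Count the lines; the arithmetic expansion strips any padding wc may add.
lines=$(( $(wc -l < wholefile.txt) ))

# The number of digits in the line count is the suffix width needed
# when numbering starts at 1.
width=${#lines}

# One line per file, numeric suffixes starting at 1 (GNU split option).
split -l 1 -a "$width" --numeric-suffixes=1 wholefile.txt individual

ls
```

For a 500-line file, width would be 3, which gives the same result as writing -a 3 by hand.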


If you want to remove the leading zeros you can use a script to rename the files after splitting. You can also append .txt if necessary.

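for file in individual*
do
    newname="$(printf '%s\n' "$file" | sed 's/^\([^0-9]*\)0*\([0-9]\)/\1\2/').txt"
    mv -- "$file" "$newname"
done

The sed command matches three parts at the start of the name

  • [^0-9]* 0 or more characters that are not digits (group 1)
  • 0* 0 or more 0 characters (the leading zeros, not captured)
  • [0-9] a digit from 0 to 9 (group 2)

and replaces them with groups 1 and 2, dropping the leading zeros. The single digit is matched separately so that a suffix of 000 becomes 0 instead of an empty string, and the first part excludes all digits (not just 0) so that a name like individual100 keeps its trailing zeros. This works here because the prefix individual doesn't contain digits. Otherwise the sed command would have to be extended.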
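As an alternative to the sed pattern, the leading zeros can be stripped with bash parameter expansion and arithmetic. This is a minimal sketch, not part of the original answer; it assumes bash, a digit-free prefix, and made-up sample files that stand in for the output of split.

```shell
#!/usr/bin/env bash
# Scratch directory with files named like the output of split (hypothetical data).
workdir=$(mktemp -d)
cd "$workdir"
for n in 000 007 010 100 101; do
    echo "line $n" > "individual$n"
done

prefix=individual
for file in "$prefix"[0-9]*; do
    digits=${file#"$prefix"}      # part after the prefix, e.g. "007"
    number=$((10#$digits))        # force base 10 so "010" is not read as octal
    mv -- "$file" "$prefix$number.txt"
done

ls
```

The 10# prefix matters: without it bash treats a number with a leading zero as octal, so 010 would become 8 and 008 would be an error.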

Something like this

# no spaces around = in a shell assignment
count=0
for i in $(cat wholefile.txt)
do
    # or: let count=count+1
    count=$((count+1))
    echo "$i" > "individual$count.txt"
done

Here is a loop over the line numbers together with a sed command that prints the line. The output is written to the individual files as intended.

for i in $(seq 1 $(wc -l < wholefile.txt)); do
    sed -n "${i}p" wholefile.txt > "individual${i}.txt"
done

Note that with 500 files the numbers in the file names are not zero-padded, so a directory listing won't show them in numerical order (individual10.txt sorts before individual2.txt). You might want to replace the above file name with individual$(printf "%03d" "${i}").txt.

A pure bash solution is

j=0; while read -r line; do echo "$line" > "individual.$((j++)).txt"; done < file

An awk solution would be

awk '{f=sprintf("individual.%05d.txt",NR); print > f; close(f)}' file

A pure split solution

split -l 1 -d -a 5 --additional-suffix ".txt" file individual.

Use while read -r line to read the file line by line, and write each line to its own file with echo. In this example each file is named after the line's content:

user@vmdeb ~ % cat nums.txt 
100001
100002
100003
100004
100005
user@vmdeb ~ % while read -r line; do echo "$line" > "$line".txt; done < nums.txt
user@vmdeb ~ % ls
100001.txt  100002.txt  100003.txt  100004.txt  100005.txt nums.txt
user@vmdeb ~ % cat 100001.txt 
100001
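A plain while read loop can trip over a few edge cases: leading whitespace is trimmed, a last line without a trailing newline is skipped, and blank lines produce empty files. The sketch below shows one way to handle these while using numbered file names; the sample data and the decision to skip blank lines are assumptions, not something the question specifies.

```shell
#!/usr/bin/env bash
# Scratch directory with sample data (hypothetical): the second line has
# leading spaces, the third is blank, and the last has no trailing newline.
workdir=$(mktemp -d)
cd "$workdir"
printf '100001\n  100002\n\n100003' > nums.txt

count=0
# IFS= keeps leading/trailing whitespace; the || test picks up a final
# line that is not terminated by a newline.
while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue          # skip blank lines
    count=$((count + 1))
    printf '%s\n' "$line" > "individual${count}.txt"
done < nums.txt

ls
```

printf '%s\n' is used instead of echo so that a line such as -n or one containing backslashes is written unchanged.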

You can do something like this:

count=1

while read -r line
do
    echo "$line" > "individualtextfile_$count.txt"
    count=$((count+1))
done < wholefile.txt

