
Linux copy next 'n' number of files from one folder to another in bash/python script

We have 17k files named file1.csv, file2.csv, file3.csv ... file17000.csv. All these files should be copied from one folder to another. The goal is to create a Linux bash or Python script that copies the files in batches of 'n' CSV files every 5 minutes, without re-copying files that have already been copied.

The idea is:

copy file1.csv file2.csv file3.csv  file4.csv  file5.csv to destination_dir
sleep for 300 seconds
copy file6.csv file7.csv file8.csv file9.csv file10.csv to destination_dir
sleep for 300 seconds
...
copy file16996.csv file16997.csv file16998.csv file16999.csv file17000.csv to destination_dir
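
One way to meet the "never copy a file twice" requirement is to check the destination before each cp. A minimal sketch with hypothetical demo directories (`src_demo`/`dst_demo` are placeholders, not the real paths):

```shell
#!/bin/bash
# Sketch: skip files that already exist in the destination, so re-running
# the script never copies the same file twice. Paths are demo placeholders.
source_dir=./src_demo
target_dir=./dst_demo
mkdir -p "$source_dir" "$target_dir"
touch "$source_dir"/file{1..6}.csv
touch "$target_dir"/file1.csv "$target_dir"/file2.csv   # pretend these were copied earlier

copied=0
for f in "$source_dir"/file*.csv; do
    base=$(basename "$f")
    if [ ! -e "$target_dir/$base" ]; then   # only copy files not yet in the target
        cp -- "$f" "$target_dir/"
        copied=$((copied + 1))
    fi
done
echo "copied $copied new files"
```

Here that copies file3.csv through file6.csv and skips the two already present.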

For a small number of files we've used the script below to copy the files between two ranges:

#!/bin/bash
source_dir='/source_dir'
target_dir='/target_dir'
echo "beginning number: $1"
echo "finite number: $2"
# eval is needed so the brace expansion sees the expanded $1 and $2
for f in $(eval ls "$source_dir/file{$1..$2}.csv");
do
cp "$f" "$target_dir"
done
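
The range logic above could be driven by an outer loop that advances the bounds by n each pass; `copy_range.sh` below is a hypothetical name for the script above, and the actual copy/sleep lines are commented out so the sketch only prints the plan:

```shell
#!/bin/bash
# Sketch: walk from 1 to the last file in batches of n,
# printing the range each 5-minute pass would cover.
n=5
total=17000
for ((lo=1; lo<=total; lo+=n)); do
    hi=$((lo + n - 1))
    if (( hi > total )); then hi=$total; fi   # clamp the final, possibly short, batch
    echo "pass: ./copy_range.sh $lo $hi"      # hypothetical script name
    # ./copy_range.sh "$lo" "$hi"
    # sleep 300
done
```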

Can anyone suggest how to correctly point the script at the next 'n' CSV files?

Any advice and suggestions will be greatly appreciated.

Does this help?

import os
import time

from shutil import copyfile

def copy_n_files(src, dest, n, start=1):
    for file_num in range(start, start + n):
        src_path = f"{src}/file{file_num}.csv"
        if os.path.isfile(src_path):  # the last batch may be shorter than n
            copyfile(src_path, f"{dest}/file{file_num}.csv")

SRC_DIR = "src"
DEST_DIR = "dest"


num_files = len([f for f in os.listdir(SRC_DIR)
                 if os.path.isfile(os.path.join(SRC_DIR, f)) and f.endswith(".csv")])
step_size = 10   # number of files you want to copy in one go
sleep_time = 300 # number of seconds you want to sleep for

for i in range(1, num_files + 1, step_size):
    copy_n_files(SRC_DIR, DEST_DIR, step_size, i)
    time.sleep(sleep_time)

A bash version; just adapt the batch_size variable as you wish:

#!/bin/bash
source_dir='/source_dir'
target_dir='/target_dir'

# Use an array (not a plain string) so the count and the loop see individual files;
# ls -1v sorts numerically, so file2.csv comes before file10.csv
all_csv_files=( $(ls -1v "$source_dir"/file*.csv) )
batch_size=5
sleep_break=300

file_counter=0
echo "Found ${#all_csv_files[@]} files"

for f in "${all_csv_files[@]}"
do
    cp "$f" "$target_dir"
    let file_counter++
    if [ $file_counter == $batch_size ]
    then
        echo "Take a break $(date)"
        file_counter=0
        sleep $sleep_break
    fi
done

echo Done

Using bash:

max=$(printf '%s\n' file*.csv | grep -Eo '[[:digit:]]+' | sort -n | tail -1)  # Work out the maximum file number
n=5                                                                           # Set the batch number of files to copy in one go
for ((i=1;i<=max;i=i+n));                                                     # Loop from one to max file in batches of n
do
  p=$((i+(n-1)));                                                             # Set the upper limit for batch file copying
  for ((k=i;k<=p;k++));
  do
     [ -e "file$k.csv" ] && cp "file$k.csv" destination_dir                   # Copy files using lower and upper limits for this pass
  done
  sleep 300
done

Finally, we solved it by updating the script provided by Marcel. We added a while loop that reads the list of files into an array, and it works as we expected:

#!/bin/bash
all_csv_files=()
source_dir='/source_dir'
target_dir='/target_dir'
while IFS= read -r -d $'\0'; do
    all_csv_files+=("$REPLY")
done < <(find "$source_dir" -name "file*.csv" -print0 | sort -zV)   # sort -zV keeps numeric order (GNU sort)

batch_size=5
sleep_break=60
file_counter=0

echo "Found ${#all_csv_files[@]} files"

for f in "${all_csv_files[@]}"
do
    cp "$f" "$target_dir"
    echo "$f"
    let file_counter++
    if [ $file_counter == $batch_size ]
    then
        echo "Take a break $(date)"
        file_counter=0
        sleep $sleep_break
    fi
done
echo Done
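
As a variant of the counter approach, bash array slicing `${arr[@]:offset:length}` yields exactly the "next n files" per pass. A minimal, self-contained sketch (the file names are demo values; with the script above, `all_csv_files` would come from `find` instead):

```shell
#!/bin/bash
# Sketch: take the next $batch_size entries of the array on each pass.
all_csv_files=(file1.csv file2.csv file3.csv file4.csv file5.csv file6.csv file7.csv)
batch_size=5

for ((i=0; i<${#all_csv_files[@]}; i+=batch_size)); do
    batch=("${all_csv_files[@]:i:batch_size}")   # next n files (short on the last pass)
    echo "copying: ${batch[*]}"
    # cp -- "${batch[@]}" "$target_dir"          # real copy would go here
    # sleep "$sleep_break"
done
```

The slice handles the final short batch automatically, so no counter reset is needed.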

Thank you all for the suggestions!
