[英]Shell script: segregate multiple files
我在本地目錄〜/ Report中有此文件:
Rep_{ReportType}_{Date}_{Seq}.csv
Rep_0001_20150102_0.csv
Rep_0001_20150102_1.csv
Rep_0102_20150102_0.csv
Rep_0503_20150102_0.csv
Rep_0503_20150102_0.csv
使用shell腳本,
如何從本地目錄中獲取具有固定批處理大小的多個文件?
如何按報告類型將文件分隔/分組(0001個文件分組在一起,0102分組在一起,0503分組等等)
我將為每個組/報告類型生成一個序列文件(使用forqlift)。 輸出將為Report0001.seq,Report0102.seq,Report0503.seq(3個序列文件)。 我將在其中保存到其他目錄。
注意:在序列文件中,鍵是csv的文件名(Rep_0001_20150102.csv),值是文件的內容。 它存儲為[String,BytesWritable]。
這是我的代碼:
1 reportTypes=(0001 0102 8902)
2
3 # collect all files matching expression into an array
4 filesWithDir=(~/Report/Rep_[0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-1].csv)
5
6 # take only the first hundred
7 filesWithDir =( "${filesWithDir[@]:0:100}" )
8
9 # files="${filesWithDir[@]##*/}" #### commented out since forqlift cannot create sequence file without the path/to/file
10 # echo ${files[@]}
11
12 shopt -s nullglob
13
14 # Line 21 is commented out since it has a bug. It collects files in
15 # current directory when it should be filtering the "files array" created
16 # in line 7
17
18
19 for i in ${reportTypes[@]}; do
20 printf -v val '%04d' "$i"
21 # files=("Rep_${val}_"*.csv)
# solution to BUG: (filter files array)
groupFiles=( $( for j in ${filesWithDir[@]} ; do echo $j ; done | grep ${val} ) )
22
23 # Generate sequence file for EACH Report Type
24 forqlift create --file="Report${val}.seq" "${groupFiles[@]}"
25 done
(注意:序列文件輸出應該在當前目錄中,而不是在〜/ Report中)
僅采用數組的一個子集很容易:
# collect all files matching expression into an array
files=( ~/Report/Rep_[0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv )
# take only the first hundred
files=( "${files[@]:0:100}" )
第二部分比較棘手:Bash具有關聯數組(“映射”),但是可以存儲在數組中的唯一合法值是字符串-而非其他數組-因此您不能將文件名列表存儲為關聯的值只需輸入一個條目(無需在數組與字符串之間進行序列化,這是一件很棘手的事情,可以安全地完成,因為UNIX中的文件路徑可以包含NUL以外的任何字符,包括換行符)。
因此,最好根據需要生成數組。
shopt -s nullglob # allow a glob to expand to zero arguments
for ((i=1; i<=1000; i++)); do
printf -v val '%04d' "$i" # pad digits: 12 -> 0012
files=( "Rep_${val}_"*.csv ) # collect files that match
## emit NUL-separated list of files, if any were found
#(( ${#files[@]} )) && printf '%s\0' "${files[@]}" >"Reports.$val.txt"
# Create a sequence file with forqlift
forqlift create --file="Reports-${val}.seq" "${files[@]}"
done
如果您真的不想這樣做,那么我們可以將使用namevars進行重定向的內容放在一起:
#!/bin/bash
# This only works with bash 4.3
re='^REP_([[:digit:]]{4})_[[:digit:]]{8}.csv$'
counter=0
for f in *; do
[[ $f =~ $re ]] || continue # skip files not matching regex
if ((++counter > 100)); then break; fi # stop after 100 files
group=${BASH_REMATCH[1]} # retrieve first regex group
declare -g -a "array${group}" # declare an array
declare -n group_arr="array${group}" # redirect group_arr to that array
group_arr+=( "$f" ) # append to the array
done
for varname in "${!array@}"; do
declare -n group_arr="$varname"
## NUL-delimited form
#printf '%s\0' "${group_arr[@]}" \
# >"collection${varname#array}" # write to files named collection0001, etc.
# forqlift sequence file form
forqlift create --file="Reports-${varname#array}.seq" "${group_arr[@]}"
done
我將不再使用shell腳本,而是開始關注perl。
#!/usr/bin/env perl
use strict;
use warnings;
my %groups;
while ( my $filename = glob ( "~/Reports/Rep_*.csv" ) ) {
my ( $group, $id ) = ( $filename =~ m,/Rep_(\d{4})_(\d{8})\.csv$, );
next unless $group; #undefined means it didn't match;
#anything past 100 in a group is discarded:
if ( @{$groups{$group}} < 100 ) {
push ( @{$groups{$group}}, $filename );
}
}
foreach my $group ( keys %groups ) {
print "$group contains:\n";
print join ("\n", @{$groups{$group});
}
另一種選擇是將某些bash命令與regexp一起使用。 見下面的實施
# Explanation:
# ls -p = List all files and directories in local directory by path
# grep -v / = ignore subdirectories
# grep "^Rep_\d{4}_\d{8}\.csv$" = Look for files matching your regexp
# tail -100 = get 100 results
for file in $(ls -p | grep -v / | grep "^Rep_\d{4}_\d{8}\.csv$" | tail -100);
do echo $file;
# Use reg exp to extract the desired sequence
re="^Rep_([[:digit:]]{4})_([[:digit:]]{8}).csv$";
if [[ $name =~ $re ]]; then
sequence = ${BASH_REMATCH[1};
# Didn't end up using date, but in case you want it
# date = ${BASH_REMATCH[2]};
# Just in case the sequence file doesn't exist
if [ ! -f "$sequence" ] ; then
touch "$sequence"
fi
# Output/Concat your filename to the sequence file, which you can
# read in later to do whatever administrative tasks you wish to do
# to them
echo "$file" >> "$sequence"
fi
done;
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.