How to check if a file exists without creating a race condition in a bash script?
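Correct me if I am wrong, but as far as I know, and from my understanding of race conditions and TOCTOU (time-of-check to time-of-use) bugs, checking whether a file exists this way: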
if [ -f /path/to/file ]; then
# File exists, do some operations on it
fi
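creates a race condition and a TOCTOU bug. So is there another way to check whether a file or directory exists without creating a race condition, or should I simply try to open the file and handle the error if it does not exist?
I know that using the approach above may not matter much in most scripts, but for me it is good practice to avoid it.
Thanks for your help.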
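To avoid the race condition, you can rename the file as a first step and use that rename as a lock. On many file systems this is an "atomic" operation (a single inode write) that cannot be carried out by two processes at the same time, as long as the old and new names are on the same file system.
This way, if the rename succeeds, you can be sure that the file exists and that none of your other processes is using it under its original name.
For example, rename the file using the PID of the current process: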
mv /path/to/file /path/to/file.$$
if [ $? = 0 ] ; then
    # Success, we can work on /path/to/file.$$, and we are then the only one
    # to do so from our processes' point of view.
    cat /path/to/file.$$ # doing something with the file
    # At the end, we can rename/move the file as 'processed'
    mv /path/to/file.$$ processed_path/to/file
fi
This way, you can also run a recovery process on the files that carry a PID number as their extension.
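To see the locking effect in isolation, here is a minimal sketch with two competing claims on the same file. It assumes both names live on the same file system (across file systems, mv falls back to copy and delete, which is not atomic). The helper name claim_file, the worker suffixes and the temporary directory are made up for this illustration.

```shell
#!/bin/bash
# claim_file is a hypothetical helper: it tries to take ownership of a file
# by renaming it with a caller-supplied suffix. It prints the claimed name
# on success and returns non-zero if the file could not be renamed.
claim_file() {
    local file="$1" suffix="$2"
    # mv fails if the source name no longer exists, i.e. somebody else
    # already claimed (or removed) the file.
    mv -- "$file" "$file.$suffix" 2>/dev/null && printf '%s\n' "$file.$suffix"
}

workdir="$(mktemp -d)"
echo "some data" > "$workdir/job.txt"

# The first claimant wins, the second one gets an error instead of the file.
first="$(claim_file "$workdir/job.txt" "worker1")"  && echo "worker1 got: ${first##*/}"
second="$(claim_file "$workdir/job.txt" "worker2")" || echo "worker2 lost the race"

rm -rf "$workdir"
```

The loser never sees a half-claimed file: it either owns the renamed file or gets a failure it can react to.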
Edit: as advocated by @Thomas, here is a basic implementation of this solution as a bash script, process. It expects a directory tree such as:
[ `process` current directory ]
  |-->[input] input directory where the script looks for '*.txt' files to process
  |-->[input_path_etl] input directory where the script will place processed files for the ETL
The script requires the /proc file system for a simple process check. For vertical readability, SC2181 has not been applied.
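For reference, ShellCheck's SC2181 suggests testing a command's exit status directly rather than through $?. The sketch below shows the two equivalent forms side by side on a throwaway temporary file; the file names are placeholders for this illustration only.

```shell
#!/bin/bash
tmp="$(mktemp)"

# Style used in this answer: run the command, then test $? on its own line.
mv "$tmp" "$tmp.locked"
if [ $? = 0 ] ; then
    echo "locked (checked through \$?)"
fi

# Form suggested by SC2181: test the command itself.
if mv "$tmp.locked" "$tmp.done" ; then
    echo "done (checked directly)"
fi

rm -f "$tmp.done"
```

Both forms behave the same here; the first one only keeps the mv calls visually aligned.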
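The script processes files with ./process and, after a crash, can run a recovery with ./process -r from its current path. It is only an example to illustrate how to use the mv lock. The processing of the .txt files here consists of a fictional first step that loads the data from the file into a database, and a fictional second step that generates a file for an ETL processor.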
#!/bin/bash
# process factory paths, should read from a config file, LDAP source, whatever...
process_input_path="input"
process_input_etl_path="input_path_etl"
# Example of an imaginary process that load stdin into a db
load_into_db() {
return 0;
}
# Example of an imaginary process that clean the data in the db for recovery
# Parameters : { filename }
# Returns: 0 successful recover, 1 otherwise
# filename: file path and name which require cleaning in db, mandatory
# stderr: potential cleaning errors
clean_db() {
if [ $# != 1 ] ; then
echo "ERROR: clean_db, wrong parameters" >&2
return 1;
fi
return 0;
}
# Example of an imaginary process that load a file into a db
# Parameters: { filename }
# returns: 0 if successful, 1 if failed
# filename: file's path and name of the file to process, mandatory
# stderr: potential processing errors
process_first_step() {
if [ $# != 1 ] ; then
echo "ERROR: process_first_step wrong parameters" >&2
return 1;
fi
# first example step, load things from the file into a db
cat "$1.$$_1" | load_into_db
if [ $? = 0 ] ; then
# first rename the file to mark that step 1 was successfully done and
# we go for the second
mv "$1.$$_1" "$1.$$_2"
if [ $? = 0 ] ; then
# success, the file is ready for step 2
return 0;
fi
fi
# If we're here, something went wrong in step 1, exiting with error
return 1;
}
# Example of an imaginary process that put a file into the input path of an ETL
# Parameters: { filename }
# returns: 0 if successful, 1 if failed
# filename: file's path and name of the file to process, mandatory
# stderr: potential processing errors
process_second_step() {
if [ $# != 1 ] ; then
echo "ERROR: process_second_step wrong parameters" >&2
return 1;
fi
# the file is ready for step 2, we create the appropriate input
# for the ETL with some sed transformation beforehand
cat "$1.$$_2" | sed 's/line/lInE/g' > "$1.$$_2.etl"
if [ $? = 0 ] ; then
# Success, the file is ready for the ETL factory process,
# we move it in with an atomic mv to make it visible from
# the ETL factory process
mv "$1.$$_2.etl" "${process_input_etl_path}/"
if [ $? = 0 ] ; then
# Successful, step 2 is done
return 0;
fi
fi
# If we're here, something went wrong in step 2, exiting with error
return 1;
}
# Example of an imaginary file processor that conducts all the
# required step on the provided file
# Parameters : { filename }
# Returns : 0 if successful, 1 otherwise
# filename : file's path and name of the file to process, mandatory
# stderr: potential processing errors
process_file() {
if [ $# != 1 ] ; then
echo "ERROR: process_file, wrong parameters" >&2
return 1;
fi
# Lock the file for processing step one
mv "$1" "$1.$$_1"
if [ $? = 0 ] ; then
# ok we have the file for us
# first example step, load things from the file into a db
process_first_step "$1"
if [ $? = 0 ] ; then
# first step is successful, so continue the process
# next example step, add the loaded lines with transformations into the input path of another process factory (like an ETL)
process_second_step "$1"
if [ $? = 0 ] ; then
# Second step is successful, we can now rename the file with
# a suffix meaning it was fully processed, a filename that would
# not be visible for the factory process
mv "$1.$$_2" "$1_processed"
if [ $? != 0 ] ; then
# if this failed, we have to return an error,
# the current file name would be $1.$$_2, not visible
# from the process factory and the error message will mean
# that the file was fully processed but can't be renamed
# at the end, so no recovering is required
echo "ERROR: process_file, $1 can't be renamed as fully processed." >&2
return 1;
fi
# if we're here, the file was fully processed and rename accordingly,
# we return a success status
return 0;
fi
fi
fi
# If we're here, something went wrong in the process, we exit with an error
# the actual filename will be $1.$$_1 or $1.$$_2 depending on where it was
# in the processing chain, it will not be visible from the main
# process factory and the recovery process can then process it accordingly
return 1;
}
# Example of an imaginary process recovery for orphan files due to a crash,
# power outage, unexpected reboot, CTRL^C, etc.
# Returns: 0 for success, 1 if error(s)
# stdout: recovery operations infos if any
# stderr: potential error(s)
process_recovery() {
if [ $# != 0 ] ; then
echo "ERROR: process_recovery, wrong parameters." >&2
return 1;
fi
# local variables
local process_FILE=""
local process_PID=""
local process_STEP=""
local process_CMD=""
# flag for the file that means :
# 0 : do not recover
# 1 : recover
# 2 : can't recover
# 3 : recover successful, rename to put the file back in the process
# 4 : recover successful, rename it as fully processed
local recover_status=0
# flag to check if the recover process is successful,
# 0: success
# 1: error(s)
local recovery_status=0
# We can only have one recovery process at a time, check for the corresponding lock, we use an atomic mkdir for that
mkdir "${process_input_path}/recover" &>/dev/null
if [ $? != 0 ] ; then
# if it fails, it means there is probably already a running recover
echo "ERROR: process_recovery, a recovery seems to be still in progress." >&2
echo " if there is no more running recovery (crash)," >&2
echo " disarm manually the lock by removing the recover folder." >&2
echo " Check also that the input folder is writable for script." >&2
return 1;
fi
# We first have to check every file in the input path that matches
# a *.txt.<PID>_<step> pattern
find "${process_input_path}/" -name '*.txt.[0-9]*_[12]' | ( while read -r file_to_check || exit ${recovery_status}; do
# By default, do not recover
recover_status=0
# Get the PID and check if there is a running corresponding process
process_PID="$(echo "${file_to_check}" | sed 's/^.*\.txt\.\([^_]*\)_[0-9]*$/\1/')"
if [[ $? != 0 || "${process_PID}" = "${file_to_check}" ]] ; then
# Something went wrong, we output an error on stderr and set the flag
echo "ERROR: process_recovery, failed to parse pid from file name ${file_to_check}" >&2
recovery_status=1;
recover_status=2;
else
# We check the shell process through /proc and check it is our
process_CMD="$(cat "/proc/${process_PID}/comm" 2>/dev/null)"
if [[ $? = 0 && "$(echo "${process_CMD}" | grep process.sh)" != "" ]] ; then
# There is a process.sh with the same PID, no recover needed
echo "File ${file_to_check} is processed by PID ${process_PID}..."
else
# There is no corresponding process, but it could have finished during
# our operations, so we check if the file is still here
if [ -e "${file_to_check}" ] ; then
# The file is still here, so we need to recover
echo "XX${process_CMD}"
recover_status=1;
fi
fi
fi
if [ "${recover_status}" = "1" ] ; then
# The file should be recovered, signal it
echo "Recovering file ${file_to_check}..."
# Get the original file name
process_FILE="$(echo "${file_to_check}" | sed 's/^\(.*\.txt\)\.[^_]*_[0-9]*$/\1/')"
if [[ $? != 0 || "${process_FILE}" = "${file_to_check}" ]] ; then
# Something went wrong, we output an error on stderr and set the flag
echo "ERROR: process_recovery, failed to parse original name from file name ${file_to_check}" >&2
recovery_status=1;
recover_status=2;
else
# We need to know at which step it was
process_STEP="$(echo "${file_to_check}" | sed 's/^.*\.txt\.[^_]*_\([0-9]*\)$/\1/')"
if [[ $? != 0 || "${process_STEP}" = "${file_to_check}" ]] ; then
# Something went wrong, we output an error on stderr and set the flag
echo "ERROR: process_recovery, failed to parse step from file name ${file_to_check}" >&2
recovery_status=1;
recover_status=2;
fi
fi
# Still ok to recover ?
if [ "${recover_status}" = "1" ] ; then
# check the step
case "${process_STEP}" in
"1")
# Do database cleaning for the file, we will revert and rename the file
# so it will be processed next by the factory process
clean_db "${file_to_check}"
if [ $? != 0 ]; then
# The cleaning process has failed, signal it
echo "ERROR: process_recovery, failed to clean the db for ${file_to_check}" >&2
recovery_status=1;
recover_status=2;
else
# Cleaning was successful, rename the file so it will be
# visible at new from the process factory
recover_status=3;
fi
;;
"2")
# If the file is still here, check if it is not in the input path of the ETL
# or if the ETL is/has already processing/processed it
if [[ -e "${process_input_etl_path}/${process_FILE}.etl" || -e "${process_input_etl_path}/${process_FILE}.etl_processed" ]] ; then
# The file has fully completed step 2, so it should be marked as processed
recover_status=4;
else
# If the file has not reached the ETL input path, we just have to launch step 2 for the file
# If there is .etl local file, we aren't sure it was completed before crash, so a redo of step will simply overwrite it,
# as it is a local file in the current path, it has never been seen by the ETL
# We rename it for processing with the recovery PID
echo "Recovering ${file_to_check} on step 2 as ${process_FILE}.$$_2..."
mv "${file_to_check}" "${process_FILE}.$$_2"
if [ $? != 0 ]; then
# The renaming failed, signal it. The file is still here, so a future recovery can handle it, nothing more to do
echo "ERROR: process_recovery, failed to rename file ${file_to_check} for step 2" >&2
recovery_status=1;
recover_status=2;
else
# File is ready for step 2
process_second_step "${process_FILE}"
if [ $? != 0 ]; then
# The step 2 redo failed, signal it. The file is still here, so a future recovery can handle it, nothing more to do
echo "ERROR: process_recovery, failed to redo step 2 for ${file_to_check}" >&2
recovery_status=1;
recover_status=2;
else
# The file has fully completed step 2, so it should be marked as processed
recover_status=4;
# Need so that the processed part deals with the new filename
file_to_check="${process_FILE}.$$_2"
fi
fi
fi
;;
*)
# Abnormal situation, unknown step, signal it
echo "ERROR: process_recovery, unknown step for ${file_to_check}" >&2
recovery_status=1;
recover_status=2;
;;
esac;
# If the recovery operations were successful, we can now rename the file accordingly
case "${recover_status}" in
"3")
# Rename it 'back' so the file will be processed by the process factory next
mv "${file_to_check}" "${process_FILE}"
if [ $? != 0 ]; then
# The renaming failed, signal it. The file is still here, so a future recovery can handle it, nothing more to do
echo "ERROR: process_recovery, failed to put back the file ${file_to_check}" >&2
recovery_status=1;
recover_status=2;
else
echo "Recovering ${file_to_check}...done, reverted."
fi
;;
"4")
# Rename as already fully processed
mv "${file_to_check}" "${process_FILE}_processed"
if [ $? != 0 ]; then
# The renaming failed, signal it. The file is still here, so a future recovery can handle it, nothing more to do
echo "ERROR: process_recovery, failed to rename the fully processed file ${file_to_check}" >&2
recovery_status=1;
recover_status=2;
else
echo "Recovering ${file_to_check}...done, processed."
fi
;;
esac;
fi
fi
done )
if [ $? != 0 ] ; then
# the recovery processing meets errors, we have to exit with error
recovery_status=1;
fi
# Finished, we can remove the recovery lock, there will be no race condition if a second recovery process starts now
rmdir "${process_input_path}/recover" &>/dev/null
if [ $? != 0 ] ; then
echo "ERROR: process_recovery, can't remove the recovery lock, you'll have to manually remove it." >&2
recovery_status=1;
fi
# Return status
return ${recovery_status};
}
# Example of an imaginary file processing factory
# this factory will look for all files matching '*.txt' in its input path
# Parameters: [ -r ]
# Returns : 0 if all matching files in the input path were processed,
# 1 otherwise
# -r : Instead of processing files, launch the recovery process, optional
# stdout : processing log
# stderr : potential processing errors
process_files() {
if [ $# -gt 1 ]; then
echo "ERROR: process_files, wrong parameters" >&2
return 1;
fi
if [[ $# = 1 && "$1" = "-r" ]] ; then
# launch the recovery process and return its exit status
process_recovery
return $?
fi
if [ $# != 0 ] ; then
echo "ERROR: process_files, unknown parameters : $*" >&2
return 1;
fi
# Parameter(s) have been processed, we are now looking for files to process
local process_status=0;
find "${process_input_path}/" -name '*.txt' | ( while read -r file_to_process || exit ${process_status}; do
echo "Processing ${file_to_process}..."
process_file "${file_to_process}"
if [ $? != 0 ] ; then
# Something went wrong, signal it on stderr
echo "Processing ${file_to_process} failed, the file may have been locked by another process or may be in the wrong format." >&2
# We set the flag for signaling trouble but we continue to process
# the following files
process_status=1;
else
echo "Processing ${file_to_process}...done."
fi
done )
if [ $? != 0 ] ; then
# the factory processing meets errors, we have to exit with error
return 1;
fi
# All matching files were correctly processed or there was no
# matching files to process, we return a success
return 0;
}
# The main entry point
# check that we have paths before anything harmful happens..
if [[ -z "${process_input_path}" || -z "${process_input_etl_path}" ]] ; then
echo "ERROR: $0, configuration missing..." >&2
exit 1;
fi
# Before processing any file, we check for /proc
if [ ! -e "/proc/$$" ] ; then
echo "ERROR: $0, /proc is required..." >&2
exit 2;
fi
# We force a common identifier for the processing script, process.sh, so recovery can easily check for running process
echo "process.sh" > "/proc/$$/comm"
if [ $? != 0 ] ; then
echo "ERROR: $0, can't set /proc/$$/comm..." >&2
exit 3;
fi
process_files "$@"
The way the code is written, the if checks whether a certain file exists and, if it does, executes the body of the if.
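Now, there is really only a race condition if the body of the if relies on the fact that the file exists (as determined by the condition). This is usually the case when the body performs some operation on the file.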
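However, many of these operations can be performed without first checking whether the file exists. Instead, you can simply perform the operation and react to the "file not found" error.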
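A minimal sketch of that "just do it and handle the failure" approach, assuming the operation is reading the file. The function name print_config and the sample content are hypothetical. Note that the failure branch also covers other errors such as missing permissions, not only a missing file.

```shell
#!/bin/bash
# print_config is a hypothetical example: it reads the file in one step
# instead of testing for it first, so there is no gap between check and use.
print_config() {
    local file="$1" content
    # The exit status of the assignment is the exit status of cat.
    if content="$(cat -- "$file" 2>/dev/null)" ; then
        printf 'read %s characters\n' "${#content}"
    else
        echo "cannot read $file, skipping" >&2
        return 1
    fi
}

tmp="$(mktemp)"
printf 'key=value\n' > "$tmp"
print_config "$tmp"                                  # the file is there
rm -f "$tmp"
print_config "$tmp" || echo "handled the missing file"
```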
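However, it gets trickier if you want to perform several such operations and rule out the possibility that the file changes in between. A simple if cannot do that, because the semantics are slightly different: what you want to know is whether the file exists for the whole time the body runs, whereas with an if like the one above you are checking: does the file exist at this moment (the moment the condition is evaluated)? The latter does not guarantee the former. You need some kind of locking mechanism.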
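One possible locking mechanism, sketched under the assumption that every process touching the file agrees to use the same lock, is a lock directory created with mkdir. The helper name with_lock and the paths are made up for this example. If the script is killed while holding the lock, the directory stays behind and has to be removed by hand.

```shell
#!/bin/bash
# with_lock is a hypothetical helper: it runs a command while holding a
# lock directory, and refuses to run if the lock is already taken.
with_lock() {
    local lockdir="$1"
    shift
    # mkdir either creates the directory or fails, there is no in-between
    # state, so only one caller can get past this line at a time.
    if ! mkdir -- "$lockdir" 2>/dev/null ; then
        echo "lock $lockdir is held by someone else" >&2
        return 1
    fi
    "$@"
    local status=$?
    rmdir -- "$lockdir"
    return "$status"
}

workdir="$(mktemp -d)"
echo "hello" > "$workdir/data.txt"

# Every operation on data.txt goes through the same lock directory.
with_lock "$workdir/data.lock" cat "$workdir/data.txt"

rm -rf "$workdir"
```

The lock is purely advisory: a process that ignores the lock directory can still change the file at any time.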