
How to check if a file exists without creating a race condition in a bash script?

Correct me if I'm wrong, but as far as I understand race conditions and TOCTOU (time-of-check to time-of-use) bugs, checking whether a file exists this way:

if [ -f /path/to/file ]; then 
    # File exists, do some operations on it
fi

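creates a race condition and a TOCTOU bug. So is there another way to check whether a file or directory exists without creating a race condition, or should I instead just try to open the file and handle the error if it does not exist?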

I know that in most scripts the previous approach may not matter that much, but for me it is good practice to avoid it.
Thanks for your help.

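To avoid the race condition, you can rename the file as a first step and use that rename as a lock. On many file systems this is an "atomic" operation (a single inode write) that cannot be performed by two processes at the same time.

This way, if the rename succeeds, you can be sure that the file exists and that none of your other processes is using it under its original name.

For example, rename the file using the current process PID: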

mv /path/to/file /path/to/file.$$
if [ $? = 0 ] ; then
  # Success: we can work on /path/to/file.$$, and we are the only one
  # doing so from our processes' point of view.
  cat /path/to/file.$$ # do something with the file
  # At the end, we can rename/move the file as 'processed'
  mv /path/to/file.$$ /processed_path/to/file
fi

This way, you can also build a recovery process for files that carry a PID number as their extension.
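
As a quick illustration of why the rename can serve as a lock, the following sketch starts several background workers that all try to claim the same file. The temporary directory, the file name and the `claim` helper are made up for the demonstration, and it assumes the source and destination live on the same file system.

```bash
#!/bin/bash
# Hypothetical demo: several workers race to claim one file with mv.
workdir="$(mktemp -d)"
touch "${workdir}/job.txt"

# claim: try to take ownership of job.txt by renaming it with a worker id.
# (A worker id is used instead of $$ because subshells share the parent's $$.)
claim() {
  mv "${workdir}/job.txt" "${workdir}/job.txt.$1" 2>/dev/null
}

for id in 1 2 3 4 5; do
  ( claim "${id}" && echo "worker ${id} won" ) &
done
wait

# Count the renamed files: only one worker should have succeeded.
winners="$(find "${workdir}" -name 'job.txt.*' | wc -l)"
echo "claimed copies: ${winners}"
```

Every worker whose `mv` fails knows that somebody else already owns the file and can simply move on to the next one.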

Edit: as advocated by @Thomas, here is a basic implementation of this solution as a bash script, process, which expects a directory tree such as:

[ `process` current directory ]
|-->[input] input directory where the script looks for '*.txt' files to process
|-->[input_path_etl] input directory where the script will place processed files for the ETL

The script requires the /proc file system for simple process checks. For vertical readability, SC2181 has not been applied.
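
For reference, ShellCheck's SC2181 suggests testing a command's exit status directly instead of inspecting `$?` afterwards. A minimal sketch of both forms, using throwaway files created only for the demonstration:

```bash
#!/bin/bash
tmpdir="$(mktemp -d)"
touch "${tmpdir}/a.txt"

# Form used in the script below: run the command, then test $?.
mv "${tmpdir}/a.txt" "${tmpdir}/b.txt"
if [ $? = 0 ] ; then
  first="ok"
fi

# Form suggested by SC2181: test the command itself.
if mv "${tmpdir}/b.txt" "${tmpdir}/c.txt" ; then
  second="ok"
fi
echo "first=${first} second=${second}"
```

Both forms behave the same here; the second one cannot be broken by a command accidentally inserted between the `mv` and the test.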

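The script processes files with ./process and, after a crash, can recover with ./process -r run from its current path. It is only an example to illustrate how to use the mv lock. The processing of .txt files here consists of a first step that fictitiously loads the file's data into a database, and a second step that generates a fictitious file for an ETL processor.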

#!/bin/bash
# process factory paths, should be read from a config file, LDAP source, whatever...
process_input_path="input"
process_input_etl_path="input_path_etl"
# Example of an imaginary process that load stdin into a db
load_into_db() {
  return 0;
}
# Example of an imaginary process that clean the data in the db for recovery
# Parameters : { filename }
# Returns: 0 successful recover, 1 otherwise
# filename: file path and name which require cleaning in db, mandatory
# stderr: potential cleaning errors
clean_db() {
  if [ $# != 1 ] ; then
    echo "ERROR: clean_db, wrong parameters" >&2
    return 1;
  fi
  return 0;
}
# Example of an imaginary process that load a file into a db
# Parameters: { filename }
# returns: 0 if successful, 1 if failed
# filename: file's path and name of the file to process, mandatory
# stderr: potential processing errors
process_first_step() {
  if [ $# != 1 ] ; then
    echo "ERROR: process_first_step wrong parameters" >&2
    return 1;
  fi
  # first example step, load things from the file into a db
  cat "$1.$$_1" | load_into_db
  if [ $? = 0 ] ; then 
    # first rename the file to mark that step 1 was successfully done,
    # then we go on to the second step
    mv "$1.$$_1" "$1.$$_2"
    if [ $? = 0 ] ; then 
      # success, the file is ready for step 2
      return 0;
    fi
  fi
  # If we're here, something went wrong in step 1, exiting with error
  return 1;
}
# Example of an imaginary process that put a file into the input path of an ETL
# Parameters: { filename }
# returns: 0 if successful, 1 if failed
# filename: file's path and name of the file to process, mandatory
# stderr: potential processing errors
process_second_step() {
  if [ $# != 1 ] ; then
    echo "ERROR: process_second_step wrong parameters" >&2
    return 1;
  fi
  # the file is ready for step 2, we create the appropriate input
  # for the ETL with some sed transformation beforehand
  cat "$1.$$_2" | sed 's/line/lInE/g' > "$1.$$_2.etl"
  if [ $? = 0 ] ; then
    # Success, the file is ready for the ETL factory process,
    # we move it in with an atomic mv to make it visible from 
    # the ETL factory process
    mv "$1.$$_2.etl" "${process_input_etl_path}/"
    if [ $? = 0 ] ; then
      # Successful, step 2 is done
      return 0;
    fi
  fi
  # If we're here, something went wrong in step 2, exiting with error
  return 1;
}
# Example of an imaginary file processor that conducts all the
# required step on the provided file
# Parameters : { filename }
# Returns : 0 if successful, 1 otherwise
# filename : file's path and name of the file to process, mandatory
# stderr: potential processing errors
process_file() {
  if [ $# != 1 ] ; then
    echo "ERROR: process_file, wrong parameters" >&2
    return 1;
  fi
  # Lock the file for processing step one
  mv "$1" "$1.$$_1" 
  if [ $? = 0 ] ; then
    # ok we have the file for us
    # first example step, load things from the file into a db
    process_first_step "$1"
    if [ $? = 0 ] ; then
      # first step is successful, so continue the process
      # next example step, add the loaded lines with transformations into the input path of another process factory (like an ETL)
      process_second_step "$1"
      if [ $? = 0 ] ; then
        # Second step is successful, we can now rename the file with
        # a suffix meaning it was fully processed, a filename that would
        # not be visible for the factory process
        mv "$1.$$_2" "$1_processed"
        if [ $? != 0 ] ; then
          # if this failed, we have to return an error,
          # the current file name would be $1.$$_2, not visible
          # from the process factory and the error message will mean
          # that the file was fully processed but can't be renamed
          # at the end, so no recovering is required
          echo "ERROR: process_file, $1 can't be renamed as fully processed." >&2
          return 1;
        fi
        # if we're here, the file was fully processed and renamed accordingly,
        # we return a success status
        return 0;
      fi
    fi
  fi
  # If we're here, something went wrong in the process, we exit with an error
  # the actual filename will be $1.$$_1 or $1.$$_2 depending on where it was
  # in the processing chain, it will not be visible from the main
  # process factory and the recovery process can then process it accordingly
  return 1;
}
# Example of an imaginary process recovery for orphan files due to a crash, 
# power outage, unexpected reboot, CTRL^C, etc.
# Returns: 0 for success, 1 if error(s)
# stdout: recovery operations infos if any
# stderr: potential error(s)
process_recovery() {
  if [ $# != 0 ] ; then
    echo "ERROR: process_recovery, wrong parameters." >&2
    return 1;
  fi
  # local variables
  local process_FILE=""
  local process_PID=""
  local process_STEP=""
  local process_CMD=""
  # flag for the file that means :
  #     0 : do not recover
  #     1 : recover
  #     2 : can't recover
  #     3 : recover successful, rename to put the file back in the process
  #     4 : recover successful, rename it as fully processed
  local recover_status=0
  # flag to check if the recovery process is successful,
  #     0: success
  #     1: error(s)
  local recovery_status=0
  # We can only have one recovery process at a time, check for the corresponding lock, we use an atomic mkdir for that 
  mkdir "${process_input_path}/recover" &>/dev/null
  if [ $? != 0 ] ; then
    # if it fails, it means there is probably already a running recover
    echo "ERROR: process_recovery, a recovery seems to be still in progress." >&2
    echo "                         If no recovery is running anymore (crash)," >&2
    echo "                         manually release the lock by removing the recover folder." >&2
    echo "                         Check also that the input folder is writable for the script." >&2
    return 1;
  fi
  # We first have to check every files in the input path that match 
  # a *.txt.<PID>_<step> pattern
  find "${process_input_path}/" -name '*.txt.[0-9]*_[12]' | ( while read -r file_to_check || exit ${recovery_status};  do 
    # By default, do not recover
    recover_status=0
    # Get the PID and check if there is a running corresponding process
    process_PID="$(echo "${file_to_check}" | sed 's/^.*\.txt\.\([^_]*\)_[0-9]*$/\1/')"
    if [[ $? != 0 || "${process_PID}" = "${file_to_check}" ]] ; then
      # Something went wrong, we output an error on stderr and set the flag
      echo "ERROR: process_recovery, failed to parse pid from file name ${file_to_check}" >&2
      recovery_status=1;
      recover_status=2;
    else 
      # We check the shell process through /proc and check that it is ours
      process_CMD="$(cat "/proc/${process_PID}/comm" 2>/dev/null)"
      if [[ $? = 0 && "$(echo "${process_CMD}" | grep process.sh)" != "" ]] ; then
        # There is a process.sh with the same PID, no recover needed
        echo "File ${file_to_check} is processed by PID ${process_PID}..."
      else
        # There is no corresponding process, but it could have finished during
        # our operations, so we check if the file is still here
        if [ -e "${file_to_check}" ] ; then
          # The file is still here, so we need to recover
          recover_status=1;
        fi
      fi
    fi
    if [ "${recover_status}" = "1" ] ; then
      # The file should be recovered, signal it
      echo "Recovering file ${file_to_check}..."
      # Get the original file name
      process_FILE="$(echo "${file_to_check}" | sed 's/^\(.*\.txt\)\.[^_]*_[0-9]*$/\1/')"
      if [[ $? != 0 || "${process_FILE}" = "${file_to_check}" ]] ; then
        # Something went wrong, we output an error on stderr and set the flag
        echo "ERROR: process_recovery, failed to parse original name from file name ${file_to_check}" >&2
        recovery_status=1;
        recover_status=2;
      else
        # We need to know at which step it was 
        process_STEP="$(echo "${file_to_check}" | sed 's/^.*\.txt\.[^_]*_\([0-9]*\)$/\1/')"
        if [[ $? != 0 || "${process_STEP}" = "${file_to_check}" ]] ; then
          # Something went wrong, we output an error on stderr and set the flag
          echo "ERROR: process_recovery, failed to parse step from file name ${file_to_check}" >&2
          recovery_status=1;
          recover_status=2;
        fi
      fi
      # Still ok to recover ?
      if [ "${recover_status}" = "1" ] ; then
        # check the step
        case "${process_STEP}" in
            "1")
              # Do database cleaning for the file, we will revert and rename the file 
              # so it will be processed next by the factory process
              clean_db "${file_to_check}"
              if [ $? != 0 ]; then
                # The cleaning process has failed, signal it
                echo "ERROR: process_recovery, failed to clean the db for ${file_to_check}" >&2
                recovery_status=1;
                recover_status=2;
              else
                # Cleaning was successful, rename the file so it will be
                # visible again from the process factory
                recover_status=3;
              fi
              ;;
            "2")
              # If the file is still here, check whether it is already in the input path of the ETL
              # or whether the ETL is processing it or has already processed it
              if [[ -e "${process_input_etl_path}/${process_FILE}.etl" || -e "${process_input_etl_path}/${process_FILE}.etl_processed" ]] ; then
                # The file has fully completed step 2 then and should be marked as processed
                recover_status=4;
              else 
                # If the file has not reached the ETL input path, we just have to launch step 2 for the file
                # If there is a local .etl file, we aren't sure it was completed before the crash, so redoing the step will simply overwrite it,
                # as it is a local file in the current path, it has never been seen by the ETL 
                # We rename it for processing with the recovery PID
                echo "Recovering ${file_to_check} on step 2 as ${process_FILE}.$$_2..."
                mv "${file_to_check}" "${process_FILE}.$$_2"
                if [ $? != 0 ]; then
                  # The renaming failed, signal it. The file is still here, so a future recovery can handle it, nothing more to do
                  echo "ERROR: process_recovery, failed to rename file ${file_to_check} for step 2" >&2
                  recovery_status=1;
                  recover_status=2;
                else
                  # File is ready for step 2
                  process_second_step "${process_FILE}"
                  if [ $? != 0 ]; then
                    # The step 2 redo failed, signal it. The file is still here, so a future recovery can handle it, nothing more to do
                    echo "ERROR: process_recovery, failed to redo step 2 for ${file_to_check}" >&2
                    recovery_status=1;
                    recover_status=2;
                  else
                    # The file has fully completed step 2 then and should be marked as processed
                    recover_status=4;
                    # Need so that the processed part deals with the new filename
                    file_to_check="${process_FILE}.$$_2"
                  fi
                fi
              fi
              ;;
            *)
                # Abnormal situation, unknown step, signal it
                echo "ERROR: process_recovery, unknown step for ${file_to_check}" >&2
                recovery_status=1;
                recover_status=2;
                ;;
        esac;
        # If the recovery operations were successful, we can now rename the file accordingly
        case "${recover_status}" in
          "3")
            # Rename it 'back' so the file will be processed by the process factory next
            mv "${file_to_check}" "${process_FILE}"
            if [ $? != 0 ]; then
              # The renaming failed, signal it. The file is still here, so a future recovery can handle it, nothing more to do
              echo "ERROR: process_recovery, failed to put back the file ${file_to_check}" >&2
              recovery_status=1;
              recover_status=2;
            else
              echo "Recovering ${file_to_check}...done, reverted."
            fi
            ;;
          "4")
            # Rename as already fully processed 
            mv "${file_to_check}" "${process_FILE}_processed"
            if [ $? != 0 ]; then
              # The renaming failed, signal it. The file is still here, so a future recovery can handle it, nothing more to do
              echo "ERROR: process_recovery, failed to rename the fully processed file ${file_to_check}" >&2
              recovery_status=1;
              recover_status=2;
            else
              echo "Recovering ${file_to_check}...done, processed."
            fi
            ;;
        esac;
      fi
    fi
  done )
  if [ $? != 0 ] ; then
    # the recovery processing met errors, we have to exit with an error
    recovery_status=1;
  fi
  # Finished, we can remove the recovery lock; there will be no race condition if a second recovery process starts now
  rmdir "${process_input_path}/recover" &>/dev/null
  if [ $? != 0 ] ; then
    echo "ERROR: process_recovery, can't remove the recovery lock, you'll have to manually remove it." >&2
    recovery_status=1;
  fi
  # Return status
  return ${recovery_status};
}
# Example of an imaginary file processing factory
# this factory will look for all files matching '*.txt' in its input path
# Parameters: [ -r ]
# Returns : 0 if all matching files in the input path were processed, 
#           1 otherwise
# -r : Instead of processing files, launch the recovery process, optional
# stdout : processing log
# stderr : potential processing errors
process_files() {
  if [ $# -gt 1 ]; then
    echo "ERROR: process_files, wrong parameters" >&2
    return 1;
  fi
  if [[ $# = 1 && "$1" = "-r" ]] ; then
    # launch the recovery process and return its exit status
    process_recovery
    return $?
  fi
  if [ $# != 0 ] ; then
    echo "ERROR: process_files, unknown parameters : $*" >&2
    return 1;
  fi
  # Parameter(s) have been processed, we are now looking for files to process
  local process_status=0;
  find "${process_input_path}/" -name '*.txt' | ( while read -r file_to_process || exit ${process_status}; do
    echo "Processing ${file_to_process}..."
    process_file "${file_to_process}"
    if [ $? != 0 ] ; then
      # Something went wrong, signal it on stderr
      echo "Processing ${file_to_process} failed, the file may have been locked by another process or may be in the wrong format." >&2
      # We set the flag for signaling trouble but we continue to process 
      # the following files
      process_status=1; 
    else
      echo "Processing ${file_to_process}...done."
    fi
  done ) 
  if [ $? != 0 ] ; then
    # the factory processing met errors, we have to exit with an error
    return 1;
  fi
  # All matching files were correctly processed or there was no
  # matching files to process, we return a success
  return 0;
}
# The main entry point
# check that we have paths before anything harmful happens...
if [[ -z "${process_input_path}" || -z "${process_input_etl_path}" ]] ; then
  echo "ERROR: $0, configuration missing..." >&2
  exit 1;
fi
# Before processing any file, we check for /proc
if [ ! -e "/proc/$$" ] ; then
  echo "ERROR: $0, /proc is required..." >&2
  exit 2;
fi
# We force a common identifier for the processing script, process.sh, so recovery can easily check for running process
echo "process.sh" > "/proc/$$/comm"
if [ $? != 0 ] ; then
  echo "ERROR: $0, can't set /proc/$$/comm..." >&2
  exit 3;
fi
process_files "$@"

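The way you wrote the code, the if checks whether some file exists and, if so, executes the body of the if.

Now, there is only really a race condition if the body of the if depends on the fact that the file exists (as determined by the condition). This is usually the case when the body performs some operation on the file, for example: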

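  • Opening the file for reading/appending/writing (as in: only write to the file if it exists)
  • Moving or copying the file
  • Querying some property of the file (size, last modification time, etc.)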

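However, many of these operations can be performed without first checking whether the file exists. Instead, you can simply perform the operation and react to a "file not found" error.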
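
A minimal sketch of that approach in bash: use the file directly and branch on whether the operation itself succeeded. The `print_file` helper and the demo paths are hypothetical.

```bash
#!/bin/bash
# print_file: hypothetical helper that reads the file directly and reacts
# to the failure, instead of testing for existence first.
print_file() {
  local content
  # 'local' is declared separately so that the exit status of cat
  # is not masked by the status of the 'local' builtin.
  if content="$(cat "$1" 2>/dev/null)" ; then
    printf '%s\n' "${content}"
  else
    echo "ERROR: cannot read $1 (missing or unreadable)" >&2
    return 1
  fi
}

demo_dir="$(mktemp -d)"
echo "hello" > "${demo_dir}/present.txt"
print_file "${demo_dir}/present.txt"
print_file "${demo_dir}/absent.txt" || echo "absent.txt was handled as an error"
```

There is no window between a check and a use here: the only question asked is whether the read itself worked.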

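It gets trickier, however, if you want to perform several such operations and rule out the possibility that the file in question changes in the meantime. This cannot be done with a simple if, because the semantics are slightly different: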

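  • Written as the if in your question, you are checking: does the file exist at this point in time? (the moment at which the condition is evaluated)
  • What you need in order to run the whole body without a race condition is a check like: does the file exist (and remain unchanged) for the entire duration of the body?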

As you can see, the former does not guarantee the latter. You need some kind of locking mechanism.
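
One possible sketch of such a locking mechanism, assuming every process that touches the file cooperates and uses the same lock: `mkdir` either creates the directory or fails, so it can act as a mutex around the whole body. The lock path and the `with_lock` helper are hypothetical names.

```bash
#!/bin/bash
# with_lock LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only while holding LOCKDIR; returns 1 if the lock is taken.
with_lock() {
  local lockdir="$1"
  shift
  # mkdir fails if the directory already exists, so only one caller wins.
  if ! mkdir "${lockdir}" 2>/dev/null ; then
    echo "lock ${lockdir} is held by another process" >&2
    return 1
  fi
  "$@"
  local status=$?
  # Release the lock and report the status of the protected command.
  rmdir "${lockdir}"
  return "${status}"
}

demo_dir="$(mktemp -d)"
echo "data" > "${demo_dir}/file"
with_lock "${demo_dir}/file.lock" cat "${demo_dir}/file"
```

This only protects against processes that honor the same lock, and a crash between the `mkdir` and the `rmdir` leaves a stale lock that has to be removed by hand. Where it is available, `flock(1)` is a common alternative.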

