Python (3.5) - Constructing a string to save a file - string contains escape characters

Question

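I am using Python (3.5) to loop over some .msg files and extract data from them, including the URL of a file to download and the folder the file should go into. I have successfully extracted the data from the .msg files, but when I try to piece together the absolute path for the downloaded file, the result ends up oddly formatted, with backslashes and \t\r in it.

Here is a brief view of the code: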

for file in files:
    file_abs_path = script_dir + '/' + file
    print(file_abs_path)

    outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
    msg = outlook.OpenSharedItem(file_abs_path)

    pattern = re.compile(r'(?:^|(?<=\n))[^:<\n]*[:<]\s*([^>\n]*)', flags=re.DOTALL)
    results = pattern.findall(msg.Body)

    # results[0] -> eventID
    regexID = re.compile(r'^[^\/\s]*', flags=re.DOTALL)
    filtered = regexID.findall(results[0])
    eventID = filtered[0]
    # print(eventID)

    # results[1] -> title
    title = results[1].translate(str.maketrans('','',string.punctuation)).replace(' ', '_') #results[1]
    title = unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')
    title = title.decode('UTF-8')
    #results[1]
    print(title)

    # results[2] -> account
    regexAcc = re.compile(r'^[^\(\s]*', flags=re.DOTALL)
    filtered = regexAcc.findall(results[2])
    account = filtered[0]
    account = unicodedata.normalize('NFKD', account).encode('ascii', 'ignore')
    account = account.decode('UTF-8')
    # print(account)

    # results[3] -> downloadURL
    downloadURL = results[3]
    # print(downloadURL)
    rel_path = account + '/' + eventID + '_' + title + '.mp4'
    rel_path = unicodedata.normalize('NFKD', rel_path).encode('ascii', 'ignore')
    rel_path = rel_path.decode('UTF-8')
    filename_abs_path = os.path.join(script_dir, rel_path)
    # Download .mp4 from a url and save it locally under `file_name`:
    with urllib.request.urlopen(downloadURL) as response, open(filename_abs_path, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)

    # print item [ID - Title] when done
    print('[Complete] ' + eventID + ' - ' + title)

    del outlook, msg

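As you can see, I have a few regular expressions that extract 4 pieces of data from the .msg. I then go through each one and do some further fine-tuning, so that they end up looking like this: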

eventID
# 123456

title 
# Name_of_item_with_underscord_no_punctuation 

account
# nameofaccount

downloadURL
# http://download.com/basicurlandfile.mp4

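That is the data I get. I have print()ed it and it does not show any strange characters. However, when I try to construct the path (file name and directory) for the .mp4: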

downloadURL = results[3]
# print(downloadURL)
rel_path = account + '/' + eventID + '_' + title + '.mp4'
rel_path = unicodedata.normalize('NFKD', rel_path).encode('ascii', 'ignore')
rel_path = rel_path.decode('UTF-8')
filename_abs_path = os.path.join(script_dir, rel_path)
# Download .mp4 from a url and save it locally under `file_name`:
with urllib.request.urlopen(downloadURL) as response, open(filename_abs_path, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

After doing this, the output I get from running the code is:

Traceback (most recent call last):
  File "sfaScript.py", line 65, in <module>
    with urllib.request.urlopen(downloadURL) as response, open(filename_abs_path, 'wb') as out_file:
OSError: [Errno 22] Invalid argument: 'C:/Users/Kenny/Desktop/sfa_kenny_batch_1\\accountnamehere/123456_Name_of_item_with_underscord_no_punctuation\t\r.mp4'

TL;DR - The problem

So filename_abs_path somehow ends up as C:/Users/Kenny/Desktop/sfa_kenny_batch_1\\accountnamehere/123456_Name_of_item_with_underscord_no_punctuation\t\r.mp4

I need it to be

C:/Users/Kenny/Desktop/sfa_kenny_batch_1/accountnamehere/123456_Name_of_item_with_underscord_no_punctuation.mp4

Thanks for any help you can offer!

Answer 1

It looks like your regular expression is capturing a tab (\t) and a carriage return (\r) at the end of title.
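A minimal sketch of how this can be checked, reusing the pattern from the question. The message body here is hypothetical and assumes lines end in \r\n with a stray tab after the title. Because the capture group only excludes > and \n, anything else before the line feed, including \t and \r, stays in the match. print() makes this easy to miss, while repr() shows the escapes explicitly.

```python
import re

# Pattern copied from the question: the capture group stops at '>' or '\n',
# so a '\t' or '\r' sitting just before the line feed is kept in the match.
pattern = re.compile(r'(?:^|(?<=\n))[^:<\n]*[:<]\s*([^>\n]*)', flags=re.DOTALL)

# Hypothetical message body (an assumption for illustration): lines end
# in '\r\n' and the title has a trailing tab.
sample_body = "Event ID: 123456/abc\r\nTitle: Name of item\t\r\n"

results = pattern.findall(sample_body)
title = results[1]

print(title)        # trailing whitespace is practically invisible here
print(repr(title))  # 'Name of item\t\r'
```

Printing repr() of each extracted field is a quick way to confirm which of them carry hidden whitespace before they are joined into a path.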

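A quick fix is:

title = title.strip()

(before building the file name)

This removes all leading and trailing "whitespace" characters, including tabs and carriage returns.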
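As a rough sketch of where the strip() call could go, the cleanup steps from the question can be wrapped in a small helper that strips first. The helper name clean_component and the sample values are hypothetical, chosen only to mirror the question's data.

```python
import os
import string


def clean_component(text):
    """Strip surrounding whitespace, drop punctuation, turn spaces into underscores.

    Hypothetical helper, not part of the original script.
    """
    text = text.strip()  # removes leading/trailing '\t', '\r', '\n' and spaces
    text = text.translate(str.maketrans('', '', string.punctuation))
    return text.replace(' ', '_')


# Sample values (assumptions) imitating what the regexes returned.
account = "accountnamehere\r".strip()
event_id = "123456"
title = clean_component("Name of item, with punctuation!\t\r")

# Build the relative path only after every component has been cleaned.
rel_path = os.path.join(account, event_id + '_' + title + '.mp4')
print(repr(rel_path))
```

Note that strip() only touches the ends of the string, which is enough here because the stray characters sit at the end of the captured line. The backslash before the account folder in the error message comes from os.path.join using the Windows separator, and it appears doubled only because the path is printed as a repr. Windows accepts mixed separators, so that part is not what triggers the error.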

Solution 1
1 accepted 2016-12-12 16:47:45