Unable to read a file in Python

Question

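Hi, I have a tar file containing files named 0_data, 0_index, and so on. What I want to do is open the tar file and read through the contents of those files. So far I have managed to extract all the files. What I cannot do is read the contents of an individual file. I know they are not plain-text files, but if I can't see a file's contents, how am I supposed to parse files that hold a bunch of web pages?

The error I get when I try to open a file is: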

return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 87: character maps to <undefined>

Here is my code:

import os
import tarfile

def is_tarfile(file):
    return tarfile.is_tarfile(file)

def extract_tarfile(file):
    if is_tarfile(file):
        my_tarfile=tarfile.open(file)
        my_tarfile.extractall("c:/untar")
        read_files_nz2("c:/untar/nz2_merged");
        return 1
    return 0

def read_files_nz2(file):
    for subdir, dirs, files in os.walk(file):
        for i in files:
             path = os.path.join(subdir,i)
             print(path)
             content=open(path,'r')
             print (content.read())

extract_tarfile("c:/nz2.tar")

print(path) prints the file's path, but print(content.read()) raises an error:

return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 87: character maps to <undefined>

I hope someone can help me read the data from these files.

Answer 1

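I'm not 100% sure this is your problem, but it is bad practice at the very least, and it may be the root of your problem.

You are not closing any of the files you open. For example, you have: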

my_tarfile=tarfile.open(file)

But somewhere after that, before you open another file, you should have:

my_tarfile.close()

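Here is a quote from diveintopython:

Open files consume system resources, and depending on the file mode, other programs may not be able to access them. It's important to close files as soon as you're finished with them.

My thinking is that because you never close my_tarfile, the system cannot properly read the files extracted from it. Even if that is not the problem, it is best to close files as soon as possible.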
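As a minimal sketch of the closing advice above, a `with` block closes the archive automatically, and `extractfile()` lets you read each member's bytes without extracting to disk first. The helper name `read_tar_members` is hypothetical and is not part of the question's code.

```python
import tarfile


def read_tar_members(tar_path):
    """Return {member name: raw bytes} for every regular file in the archive."""
    contents = {}
    # The with-block closes the archive even if an exception is raised.
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            # extractfile() returns a binary file-like object for regular files.
            with archive.extractfile(member) as handle:
                contents[member.name] = handle.read()
    return contents
```

The values are bytes objects, so no decoding takes place while reading. Something like `read_tar_members("c:/nz2.tar")` would be the call for the archive in the question.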

Answer 2

You need the full file path to access it, not just the name. Your second function should look like this:

def read_files_nz2(file):
    for subdir, dirs, files in os.walk(file):
        for i in files:
            path = os.path.join(subdir, i)  # Getting full path to the file
            content = open(path, 'r')
            print(content.read())

Answer 3

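You need to do one of two things:

Specify the encoding when opening the file:

 # This is probably not the right encoding.
 content = open(path, 'r', encoding='utf-8')

For this, you need to know what the files' encoding is.

Open the file in binary mode: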
```
 content = open(path, 'rb')
```
This causes read to return a bytes object instead of a string, but it avoids any attempt to decode or validate individual bytes.
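The two options can be compared in a small sketch. It assumes the 'charmap' codec in the traceback is the Windows code page cp1252, which the question does not actually state. The sample bytes and the temporary file are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample: a few bytes including 0x81, written to a temp file.
sample = b"header\x81payload"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Text mode with a code page that has no mapping for 0x81 fails.
try:
    with open(path, "r", encoding="cp1252") as f:
        f.read()
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

# Binary mode performs no decoding, so every byte comes back unchanged.
with open(path, "rb") as f:
    raw = f.read()

# If a readable approximation is enough, decode and let Python substitute
# U+FFFD for bytes that are invalid in the chosen encoding.
approx = raw.decode("utf-8", errors="replace")

os.remove(path)
```

Binary mode is the safer default for files like 0_data and 0_index, which the question says are not plain text. You can decode only the parts that are known to be text.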

Answer 4

I'm not sure what the problem is, but this happened to me and it was solved by using this encoding:

with open(path, 'r', encoding="ISO-8859-1") as f:
    content = f.read()

Another good approach is to rewrite your file as UTF-8; take a look at this code:

with open(ff_name, 'rb') as source_file:
  with open(target_file_name, 'w+b') as dest_file:
    contents = source_file.read()
    dest_file.write(contents.decode('utf-16').encode('utf-8'))
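Note that the snippet above only works if the source file really is UTF-16. A slightly more general sketch takes the source encoding as a parameter. The function name `transcode_to_utf8` and its arguments are hypothetical. The second part shows why ISO-8859-1 never raises: it maps every byte value to a character. That does not mean the decoded text is meaningful.

```python
def transcode_to_utf8(source_path, target_path, source_encoding):
    """Rewrite source_path as UTF-8, decoding it with source_encoding."""
    with open(source_path, "rb") as source_file:
        raw = source_file.read()
    # Raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    text = raw.decode(source_encoding)
    with open(target_path, "wb") as dest_file:
        dest_file.write(text.encode("utf-8"))
    return text


# ISO-8859-1 maps every byte value 0-255 to a character, so decoding never
# raises, but the result is only meaningful if the data really is Latin-1.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("iso-8859-1")
```

Because Latin-1 decoding round-trips every byte, it can hide an encoding problem rather than solve it. For binary data such as an index file, reading in 'rb' mode is usually the more honest choice.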

4 answers

Answer 1
1 2015-03-20 03:00:47

Answer 2
1 2015-03-20 03:02:36

Answer 3
1 accepted 2015-03-20 06:01:30

Answer 4
0 2021-02-28 23:26:57
