Open and read a space-delimited txt file

Question

I have a space-delimited txt file that looks like this:

2004          Temperature for KATHMANDU AIRPORT       
        Tmax  Tmin
     1  18.8   2.4 
     2  19.0   1.1 
     3  18.3   1.7 
     4  18.3   1.0 
     5  17.8   1.3

I want to compute the mean of Tmax and the mean of Tmin separately. However, I am having trouble reading the txt file. I tried this:

import re
list_b = []
list_d = []

with open('TA103019.95.txt', 'r') as f:
    for line in f:
        list_line = re.findall(r"[\d.\d+']+", line)
        list_b.append(float(list_line[1])) #appends second column
        list_d.append(float(list_line[3])) #appends fourth column

print list_b
print list_d

However, this gives me an error: IndexError: list index out of range. What is wrong here?

Answer 1

A simple solution is to use the split() method. Of course, you need to skip the first two lines:

import io

with io.open("path/to/file.txt", mode="r", encoding="utf-8") as f:
    next(f)  # skip the title line
    next(f)  # skip the "Tmax  Tmin" header line
    for line in f:
        print(line.split())

You get:

['1', '18.8', '2.4']
['2', '19.0', '1.1']
['3', '18.3', '1.7']
['4', '18.3', '1.0']
['5', '17.8', '1.3']

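Quoting the documentation:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.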
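To see why the no-argument form matters for this file, here is a small comparison on one data line. It is a sketch: the string literal is copied from the question's sample, and the exact amount of whitespace in the real file is an assumption.

```python
# One data line from the question, with its leading and trailing spaces
line = "     1  18.8   2.4 \n"

# No separator: runs of whitespace (spaces, tabs, the newline) act as one
# separator, and leading/trailing whitespace produces no empty strings
by_whitespace = line.split()

# Explicit single-space separator: every space splits, so runs of spaces
# produce empty strings and the newline stays attached to the last field
by_single_space = line.split(' ')

print(by_whitespace)    # ['1', '18.8', '2.4']
print(by_single_space)  # ['', '', '', '', '', '1', '', '18.8', '', '', '2.4', '\n']
```

This is why split(' ') needs extra filtering of empty strings on a column-aligned file like this one, while split() does not.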

Answer 2

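As described here, re.findall returns a list of all matches of the regular expression. Your expression does not produce the matches you expect: on the first line it matches only 2004, and on the header line it matches nothing, so list_line is too short (or empty), which causes the error when you try to access list_line[1].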

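The expression you want for this file is r"\d+\.\d+", which matches any decimal number with at least one digit before the decimal point and at least one digit after it.
Even this expression matches nothing on the first two lines, so you will need to check for an empty list.
The result knows nothing about columns, only about matches of the pattern, and each data line will have two matches, so you will want indices 0 and 1.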

So:

import re

list_b = []
list_d = []

with open('TA103019.95.txt', 'r') as f:
    for line in f:
        list_line = re.findall(r'\d+\.\d+', line)
        if len(list_line) == 2:
            list_b.append(float(list_line[0]))  # Tmax (second column of the file)
            list_d.append(float(list_line[1]))  # Tmin (third column of the file)

print(list_b)
print(list_d)
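One limitation worth checking, assuming the real file could contain sub-zero Tmin values (none appear in the sample shown): the pattern has no sign, so a negative number loses its minus sign. The data line below is hypothetical, and the -? prefix is one possible adjustment, not part of the original answer.

```python
import re

# Hypothetical data line with a negative Tmin (not from the question's sample)
line = "     6  17.5  -0.4 "

# Pattern from the answer: digits, a dot, digits, with no sign
unsigned = re.findall(r"\d+\.\d+", line)

# An optional leading minus sign keeps negative values intact
signed = re.findall(r"-?\d+\.\d+", line)

print(unsigned)  # ['17.5', '0.4']  (the sign is lost)
print(signed)    # ['17.5', '-0.4']
```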

Answer 3

import re

list_b = []
list_d = []

with open('TA103019.95.txt', 'r') as f:
    for line in f:
        # regex is corrected to match the decimal values only
        list_line = re.findall(r"\d+\.\d+", line)

        # error condition handled where the values are not found
        if len(list_line) < 2:
            continue

        # indexes are corrected below
        list_b.append(float(list_line[0]))  # Tmax (second column of the file)
        list_d.append(float(list_line[1]))  # Tmin (third column of the file)

print(list_b)
print(list_d)

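I have added some comments in the code itself.

You get the IndexError: list index out of range because list_line has only one element (2004, from the first line of the file), and you are trying to access indices 1 and 3 of list_line.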
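To see this concretely, here is a sketch that runs the question's original pattern over the first three lines of the sample (copied from the question; the exact whitespace in the real file is an assumption). Note that the pattern is a character class, so it matches any run of digits, '.', '+' or "'" rather than a decimal number.

```python
import re

# The question's pattern: a character class, not a "decimal number" pattern
question_pattern = r"[\d.\d+']+"

# The first three lines of the sample file, as shown in the question
sample_lines = [
    "2004          Temperature for KATHMANDU AIRPORT       ",
    "        Tmax  Tmin",
    "     1  18.8   2.4 ",
]

results = [re.findall(question_pattern, line) for line in sample_lines]
for matches in results:
    print(len(matches), matches)
```

Only 2004 matches on the first line, so list_line[1] already fails there; and a data line yields three matches, so list_line[3] would fail as well.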

Answer 4

Complete solution

def readit(file_name, start_line=2):
    # start_line: where your data starts (2 means the 3rd line, because counting starts at line 0)
    with open(file_name, 'r') as f:
        data = f.read().split('\n')
    for line in data[start_line:]:
        row = line.split()  # split on runs of whitespace, dropping empty fields
        if not row:
            continue  # skip blank lines, e.g. the empty string after a trailing newline
        yield int(row[0]), float(row[1]), float(row[2])


iterator = readit('TA103019.95.txt')


index, tmax, tmin = zip(*iterator)


mean_Tmax = sum(tmax)/len(tmax)
mean_Tmin = sum(tmin)/len(tmin)
print('Mean Tmax: ',mean_Tmax)
print('Mean Tmin: ',mean_Tmin)

>>> ('Mean Tmax: ', 18.439999999999998)
>>> ('Mean Tmin: ', 1.5)

Thanks to Dan D. for a more elegant solution.
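For comparison, here is a sketch that computes the same two means with statistics.mean from the standard library (Python 3.4 and later). It uses str.split() for the parsing and assumes the layout shown in the question (two header lines, then rows of day, Tmax, Tmin); parse_temperatures is a hypothetical helper name, and the io.StringIO sample stands in for the real file.

```python
import io
import statistics


def parse_temperatures(lines, header_lines=2):
    """Collect Tmax and Tmin values from whitespace-delimited lines.

    Hypothetical helper: assumes `header_lines` header rows followed by
    rows of the form "day Tmax Tmin", as in the question's file.
    """
    tmax_values, tmin_values = [], []
    for line_number, line in enumerate(lines):
        if line_number < header_lines:
            continue  # skip the title and column-header rows
        row = line.split()  # split on runs of whitespace
        if not row:
            continue  # skip blank lines, e.g. at the end of the file
        tmax_values.append(float(row[1]))
        tmin_values.append(float(row[2]))
    return tmax_values, tmin_values


# Sample data copied from the question, standing in for the real file
sample = io.StringIO(
    "2004          Temperature for KATHMANDU AIRPORT\n"
    "        Tmax  Tmin\n"
    "     1  18.8   2.4\n"
    "     2  19.0   1.1\n"
    "     3  18.3   1.7\n"
    "     4  18.3   1.0\n"
    "     5  17.8   1.3\n"
)

tmax, tmin = parse_temperatures(sample)
mean_tmax = statistics.mean(tmax)
mean_tmin = statistics.mean(tmin)
print('Mean Tmax:', mean_tmax)
print('Mean Tmin:', mean_tmin)
```

With the real file, the helper can be called as: with open('TA103019.95.txt') as f: tmax, tmin = parse_temperatures(f). Unlike sum(...)/len(...), statistics.mean raises StatisticsError instead of ZeroDivisionError if no data rows were found.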

Answer 5

Simplify your life and avoid running into this problem again.

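Perhaps you are misreading the header lines? If the file format is fixed, I usually "burn" the header lines with readline() before starting the loop, for example:

file_name = 'TA103019.95.txt'

with open(file_name, 'r') as f:
    f.readline()  # burn the title row ("2004  Temperature for ...")
    f.readline()  # burn the column header row ("Tmax  Tmin")
    for line in f:
        tokens = line.split()  # tokenize the row on runs of whitespace
        print(tokens)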

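You then get a list of tokens, which are strings that you need to convert to int or float, and you can go from there!

Add a few print statements to see what you are extracting...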

Answer 6

Is it possible that your file is tab-delimited?

For tab-delimited data:

with open('TA103019.95.txt', 'r') as f:
    for idx, line in enumerate(f):
        if idx > 1:  # skip the two header lines
            cols = line.split('\t')  # for space-delimited data, use line.split() instead
            tmax = float(cols[1])
            tmin = float(cols[2])
            # calc mean
            mean = (tmax + tmin) / 2
            # not sure what you want to do with the result
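One way to check whether the file really is tab-delimited is to print the raw lines with repr(), which shows tabs as \t and newlines as \n. This is a sketch: the io.StringIO sample is made up to contain one space-separated and one tab-separated line, and for the real file you would iterate over open('TA103019.95.txt') instead.

```python
import io

# Hypothetical stand-in for the real file: line 1 uses spaces, line 2 uses tabs
sample = io.StringIO("     1  18.8   2.4\n     2\t19.0\t1.1\n")

tab_lines = []
for line_number, line in enumerate(sample, start=1):
    # repr() makes invisible characters visible: tabs appear as \t
    print(line_number, repr(line))
    if '\t' in line:
        tab_lines.append(line_number)

print('lines containing tabs:', tab_lines)
```

If tabs do show up, line.split() with no argument handles tabs and spaces alike, since both count as whitespace.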


6 solutions

Solution 1: 2 votes, accepted, 2019-02-03 20:58:50

Solution 2: 1 vote, 2019-02-03 21:03:47

Solution 3: 1 vote, 2019-02-03 21:05:50

Solution 4: 1 vote, 2019-02-03 21:07:44

Solution 5: 0 votes, 2019-02-03 20:56:58

Solution 6: 0 votes, 2019-02-03 21:02:22
