How to read numbers from a mixed txt file in Python

Question

I have a txt file made up of text and numbers. It looks like this:

this is a paragraph which is introductory which lasts
some more lines

text text text

567 45 32 468
974 35 3578 4467
325 765 355 5466

text text text
1 3 6
text text

What I need is to store the lines that contain 4 numeric elements.

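When I use the read command, every element is read and stored as a string. I'm not sure whether I can convert the numbers to numeric values without filtering them first.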

I'd appreciate any help. Thanks.

Answer 1

Use the splitlines() function:

with open('your_file.txt', 'r') as f:  # replace with the path to your file
    A = f.read().splitlines()

This gives you a list of lines, from which you can extract whatever you need. For example:

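Req = []
for i in A:
    # split() with no argument also copes with repeated spaces, tabs and trailing spaces
    elem = [s.isnumeric() for s in i.split()]
    if len(elem) == 4 and all(elem):
        Req.append(i)  # the line itself, still a string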
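One caveat with isnumeric(): it accepts characters that int() cannot parse (fractions such as '½', superscripts such as '²') and rejects signed numbers such as '-12'. If "numeric element" is taken to mean "something int() accepts" (an assumption), a minimal sketch is to let int() do the check itself; the helper name is_int_token is hypothetical:

```python
def is_int_token(token):
    """Return True if int() can parse `token` (hypothetical helper)."""
    try:
        int(token)
    except ValueError:
        return False
    return True


# Compare the string predicates with what int() actually accepts
for tok in ["468", "½", "²", "-12"]:
    print(repr(tok), tok.isnumeric(), tok.isdecimal(), is_int_token(tok))
```

With such a helper, the filter above would read all(is_int_token(s) for s in i.split()). Among the string predicates, str.isdecimal() is the closest match to what int() accepts for unsigned numbers.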

Answer 2

Read the file line by line and analyze each line. Skip lines that don't have exactly 4 elements, and lines whose 4 elements aren't all integers:

results = []
with open(filename) as f:  # filename: path to your input file
    for line in f:
        line = line.strip().split()
        if len(line) != 4:
            continue  # line has != 4 elements

        try:
            # list() forces the conversion here; a bare map() is lazy and never raises inside the try
            numbers = list(map(int, line))
        except ValueError:
            continue  # line is not all numbers

        # do something with line
        results.append(line)  # or results.append(numbers) to store the integers

print(*results, sep="\n")

Prints:

['567', '45', '32', '468']
['974', '35', '3578', '4467']
['325', '765', '355', '5466']
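Why the list() around map() matters: in Python 3, map() returns a lazy iterator, so map(int, ...) on its own never calls int() inside the try block, and a ValueError would only surface later, when the map object is consumed. A minimal sketch of the per-line check as a function (the name parse_four_ints is hypothetical):

```python
def parse_four_ints(line):
    """Return the four integers on `line`, or None if the line does not qualify."""
    parts = line.split()
    if len(parts) != 4:
        return None
    try:
        # list() forces every int() call to run now, inside the try block
        return list(map(int, parts))
    except ValueError:
        return None


# Building a map object does not call int() yet, so no error is raised here
lazy = map(int, ["text", "text", "text", "text"])
print(type(lazy).__name__)  # map

print(parse_four_ints("567 45 32 468"))        # [567, 45, 32, 468]
print(parse_four_ints("text text text text"))  # None
print(parse_four_ints("1 3 6"))                # None
```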

Answer 3

If you can assume that the lines you need contain only 4 numbers, this solution should work:


nums = []
with open('filename.txt') as f:
    for line in f:
        line = line.split()
        if len(line) == 4 and all(c.isdigit() for c in line):
            # [float(c) for c in line] also works, but isdigit() above already rejects tokens like '3.5' or '-2'
            nums.append([int(c) for c in line])

print(nums)
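Because isdigit() only passes unsigned digit strings, the filter above skips a line such as "1.5 -2 3 4". If such values should count as numbers (an assumption about your data), one option is to let float() do the checking. A sketch, with the hypothetical name four_number_rows, that works on any iterable of lines such as an open file:

```python
def four_number_rows(lines):
    """Yield each line that holds exactly four numbers, as a list of floats."""
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue
        try:
            # note: float() also accepts words like 'nan' and 'inf'
            row = [float(p) for p in parts]
        except ValueError:
            continue  # at least one token is not a number
        yield row


sample = ["text text text", "567 45 32 468", "1.5 -2 3 4", "1 3 6"]
print(list(four_number_rows(sample)))
# [[567.0, 45.0, 32.0, 468.0], [1.5, -2.0, 3.0, 4.0]]
```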

Answer 4

To me, this sounds like a job for the re module. I would do:

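import re
with open('yourfile.txt', 'r') as f:
    txt = f.read()
lines_w_4_numbers = re.findall(r'^\d+[ \t]+\d+[ \t]+\d+[ \t]+\d+$', txt, re.M)
print(lines_w_4_numbers)

Output:

['567 45 32 468', '974 35 3578 4467', '325 765 355 5466']

Explanation: the re.M flag makes ^ and $ match at the start and end of every line, \d+ means one or more digits, and [ \t]+ means one or more spaces or tabs. \s would also work on this sample, but \s matches newlines too, so it can join two short lines into a single match.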
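The separator choice matters because \s matches any whitespace character, including the newline. A small sketch on a made-up two-line string shows a pattern built with \s gluing two short lines into one "four-number" match, while [ \t]+ keeps every match on a single line:

```python
import re

text = "1 3\n6 7\n"  # two lines with two numbers each (made-up input)

with_s = re.findall(r'^\d+\s\d+\s\d+\s\d+$', text, re.M)
with_blank = re.findall(r'^\d+[ \t]+\d+[ \t]+\d+[ \t]+\d+$', text, re.M)

print(with_s)      # ['1 3\n6 7']  (the match spans both lines)
print(with_blank)  # []
```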

Answer 5

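So what you're looking for are lines consisting of exactly four integers separated by spaces. You can use a regular expression to locate the substrings that follow this pattern. Assuming you have stored the file contents in a variable s: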

import re
# ^ and $ with re.M anchor the match to whole lines; (?:...) is a non-capturing group
matches = re.findall(r"^\d+(?:[ \t]+\d+){3}[ \t]*$", s, re.M)

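The matches variable now holds the strings that contain exactly four integers. After that, if needed, you can split each string and convert the parts to integers: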

matches = [[int(i) for i in m.split()] for m in matches]

Result:

[[567, 45, 32, 468], [974, 35, 3578, 4467], [325, 765, 355, 5466]]
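A detail of re.findall() worth knowing when writing such patterns: what it returns depends on how many capturing groups the pattern has. With none it returns the whole matches, with one it returns that group's text, and with two or more it returns a tuple of groups per match; non-capturing groups (?:...) don't count. A quick sketch on a made-up string:

```python
import re

sample = "12 34 56 78"

print(re.findall(r"\d+ \d+", sample))      # ['12 34', '56 78']                 (no groups)
print(re.findall(r"(\d+) \d+", sample))    # ['12', '56']                       (one group)
print(re.findall(r"((\d+) \d+)", sample))  # [('12 34', '12'), ('56 78', '56')] (two groups)
print(re.findall(r"(?:\d+) \d+", sample))  # ['12 34', '56 78']                 (non-capturing)
```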

Answer 6

If you know how to use Python's re module, you can do this:

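import re

TEST_FILE = 'data.txt'  # path to your input file

if __name__ == '__main__':

    with open(TEST_FILE, 'r') as file_1:
        for line in file_1:
            line = line.strip()  # remove surrounding whitespace, including the \n character
            # fullmatch: the whole line must be exactly four whitespace-separated numbers
            if re.fullmatch(r'\d+(\s+\d+){3}', line):
                print(line)  # just lines with four numbers are printed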

The result for your sample file is:

567 45 32 468
974 35 3578 4467
325 765 355 5466
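If you'd rather keep the filtering separate from the file handling, a sketch (the function name four_number_lines is hypothetical) can accept any iterable of lines, so it behaves the same on an open file or on an io.StringIO object in a test:

```python
import io
import re

# exactly four whitespace-separated integers (checked with fullmatch below)
FOUR_NUMBERS = re.compile(r'\d+(\s+\d+){3}')


def four_number_lines(lines):
    """Return the stripped lines that consist of exactly four integers."""
    result = []
    for line in lines:
        line = line.strip()
        if FOUR_NUMBERS.fullmatch(line):
            result.append(line)
    return result


sample = io.StringIO("text text text\n\n567 45 32 468\n974 35 3578 4467\n1 3 6\n")
print(four_number_lines(sample))  # ['567 45 32 468', '974 35 3578 4467']
```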

Answer 7

Using a regular expression here would be the most powerful option. We use re.compile to build a pattern, then use its search or match method to find the pattern in a string.

import re

# \d{4} matches four consecutive digits, i.e. any number with at least 4 digits;
# it does not check how many numbers the line contains
p = re.compile(r'\d{4}')
line_with_digits = []
with open('data.txt', 'r') as file:  # opening the file; it is closed automatically
    for line in file:  # reading the file line by line
        if p.search(line):  # searching for the pattern anywhere in the line
            line_with_digits.append(line.strip())  # if the pattern matches, add the line to the list

print(line_with_digits)

The input file for the above program is:

text text text

567 45 32 468
974 35 3578 4467
325 765 355 5466

text text text
1 3 6
text text

text  5566 text 45 text
text text 564 text 458 25 text

The output is:

['974 35 3578 4467', '325 765 355 5466', 'text  5566 text 45 text']
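Note what \d{4} actually tests: four consecutive digits somewhere in the line, not four separate numbers. That is why the output above omits '567 45 32 468' (none of its numbers has four digits) and includes 'text  5566 text 45 text'. A small comparison sketch against a whole-line four-number pattern:

```python
import re

lines = [
    "567 45 32 468",
    "974 35 3578 4467",
    "text  5566 text 45 text",
]

four_digits = re.compile(r'\d{4}')               # 4 consecutive digits anywhere
four_numbers = re.compile(r'\d+(?:\s+\d+){3}')   # exactly 4 whitespace-separated integers

for line in lines:
    print(repr(line), bool(four_digits.search(line)), bool(four_numbers.fullmatch(line)))
```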

Hope this helps.

Answer 8

You can use a regular expression:

import re

result = []
with open('file_name.txt') as fp:
    for line in fp:
        # \d{4}: four consecutive digits anywhere in the line
        if re.search(r'\d{4}', line):
            result.append(line.strip())

print(result)

Output:

['974 35 3578 4467', '325 765 355 5466']
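If "lines with 4 numbers" should also cover lines where the numbers are mixed with text (a different reading of the question, stated here as an assumption), re.findall() can pull out every standalone integer and a line can be kept when there are exactly four. A sketch; numbers_in and the \b-based pattern are illustrative choices, and decimals or negative numbers are not handled:

```python
import re

# a run of digits not glued to letters or other word characters
INT_TOKEN = re.compile(r'\b\d+\b')


def numbers_in(line):
    """Return every standalone integer in `line` as an int."""
    return [int(tok) for tok in INT_TOKEN.findall(line)]


lines = ["567 45 32 468", "text 1 text 2 text 3 text 4", "1 3 6", "abc123 45 67 89"]
rows = [numbers_in(line) for line in lines]
rows = [r for r in rows if len(r) == 4]
print(rows)  # [[567, 45, 32, 468], [1, 2, 3, 4]]
```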


8 answers

Answer 1: score 1, 2020-03-27 07:06:31
Answer 2: score 0, 2020-03-27 07:05:52
Answer 3: score 0, 2020-03-27 07:12:50
Answer 4: score 0, 2020-03-27 07:13:15
Answer 5: score 0, 2020-03-27 07:14:06
Answer 6: score 0, 2020-03-27 07:16:00
Answer 7: score 0, 2020-03-27 07:22:12
Answer 8: score 0, 2020-03-27 08:21:41
