How to read numbers from a mixed txt file in python

I have a txt file composed of text and numbers. It looks something like this:

this is a paragraph which is introductory which lasts
some more lines

text text text

567 45 32 468
974 35 3578 4467
325 765 355 5466

text text text
1 3 6
text text

What I need is to store the rows that contain 4 numeric elements.

When I use the read command, all elements are read and stored as strings. I'm not sure whether I can convert the numbers into integers without filtering them first.

I would appreciate any help. Thanks.
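On the conversion question: everything read from a text file is a str, and int() converts a numeric string but raises ValueError on anything else. So you either filter the tokens first or attempt the conversion and catch the error. A minimal sketch of the second option; `to_ints` is a hypothetical helper name, not part of any library:

```python
def to_ints(line):
    """Return the line's tokens as ints, or None if any token is not an integer."""
    try:
        return [int(tok) for tok in line.split()]
    except ValueError:
        # int() raises ValueError for tokens such as "text"
        return None


print(int("567"))                 # 567
print(to_ints("567 45 32 468"))   # [567, 45, 32, 468]
print(to_ints("text text text"))  # None
```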

Use the splitlines() function:

with open('your_file.txt') as f:  # replace with your file's name
    A = f.read().splitlines()

This gives you a list of lines, from which you can extract whatever you need. For example:

Req = []
for i in A:
    # split() with no argument splits on any run of whitespace and ignores trailing spaces
    elem = [s.isnumeric() for s in i.split()]
    if len(elem) == 4 and all(elem):
        Req.append(i)
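Why splitlines() rather than split('\n'): when the text ends with a newline, split('\n') leaves a trailing empty string, while splitlines() does not, and splitlines() also treats '\r\n' as a single line break. A small sketch on an in-memory string (an abbreviated version of the sample file), running a filter along the lines of the code above:

```python
text = "text text text\n\n567 45 32 468\n974 35 3578 4467\n1 3 6\n"

print(text.split("\n"))   # last element is an empty string ''
print(text.splitlines())  # no trailing empty string

# Keep lines made of exactly four numeric tokens
Req = []
for i in text.splitlines():
    elem = [s.isnumeric() for s in i.split()]
    if len(elem) == 4 and all(elem):
        Req.append(i)
print(Req)  # ['567 45 32 468', '974 35 3578 4467']
```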

Read the file line by line and analyse each line. Skip lines that do not have exactly 4 elements, and lines that do not consist of 4 space-separated integers:

results = []
with open(filename) as f:  # filename: path to your text file
    for line in f:
        line = line.strip().split()
        if len(line) != 4:
            continue  # line does not have exactly 4 elements

        try:
            # list() forces the conversion here; a bare map() is lazy and never raises inside the try
            numbers = list(map(int, line))
        except ValueError:
            continue  # line is not all integers

        # do something with the line
        results.append(line)  # or results.append(numbers) to store the integers

print(*results, sep="\n")

prints:

['567', '45', '32', '468']
['974', '35', '3578', '4467']
['325', '765', '355', '5466']
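One subtlety in the try block: in Python 3, map() is lazy, so map(int, line) on its own returns a map object immediately and no conversion runs yet. The ValueError only appears when the map object is consumed, which is why the conversion has to be forced (for example with list()) inside the try. A small demonstration:

```python
tokens = ["text", "text", "text", "1"]

# A bare map() does not raise: no conversion has run yet.
lazy = map(int, tokens)
print(type(lazy).__name__)  # map

# Consuming it (here with list()) performs the conversions and raises.
try:
    list(lazy)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```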

If you can assume that the rows you need contain only 4 numbers, then this solution should work:


nums = []
with open('filename.txt') as f:
    for line in f:
        line = line.split()
        if len(line) == 4 and all([c.isdigit() for c in line]):
            # use [float(c) for c in line] if needed
            nums.append([int(c) for c in line])

print(nums)
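A caveat on str.isdigit(): it only accepts unsigned digit strings, so "-5" and "3.5" both fail the check, and switching to float() as the comment suggests would still skip rows with decimals. If your file can contain such values, a sketch that attempts float() on every token instead might look like this. It assumes any string float() accepts should count as a number (note that float() also accepts strings such as "inf" and "nan"); `parse_row` is a hypothetical helper name:

```python
def parse_row(line, width=4):
    """Return the row as floats if it has exactly `width` numeric tokens, else None."""
    parts = line.split()
    if len(parts) != width:
        return None
    try:
        return [float(p) for p in parts]
    except ValueError:
        return None


print("-5".isdigit(), "3.5".isdigit())  # False False
print(parse_row("567 -45 3.5 468"))     # [567.0, -45.0, 3.5, 468.0]
print(parse_row("text text text"))      # None
```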

To me this sounds like a task for the re module. I would do:

import re
with open('yourfile.txt', 'r') as f:
    txt = f.read()
lines_w_4_numbers = re.findall(r'^\d+\s\d+\s\d+\s\d+$', txt, re.M)
print(lines_w_4_numbers)

Output:

['567 45 32 468', '974 35 3578 4467', '325 765 355 5466']

Explanation: the re.M flag makes ^ and $ match at the start and end of each line, \s matches a whitespace character, and \d+ matches one or more digits.
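A quick way to see what re.M changes: without it, ^ and $ only match at the start and end of the whole string, so the same pattern finds nothing. Because \s also matches a newline, the pattern can also join two short lines into one match; restricting the separator to spaces and tabs (an adjustment, not part of the answer above) keeps each match on a single line:

```python
import re

txt = "text text text\n567 45 32 468\n1 3 6\n"
pattern = r'^\d+\s\d+\s\d+\s\d+$'

print(re.findall(pattern, txt))        # []
print(re.findall(pattern, txt, re.M))  # ['567 45 32 468']

# \s matches "\n", so two lines of two numbers each can form one match
print(re.findall(pattern, "1 3\n6 7\n", re.M))  # ['1 3\n6 7']

# Spaces/tabs only between numbers: matches stay within one line
strict = r'^\d+[ \t]+\d+[ \t]+\d+[ \t]+\d+$'
print(re.findall(strict, "1 3\n6 7\n", re.M))   # []
```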

So you're looking for a substring that contains exactly four integers separated by spaces and ending with a newline. You can use regular expressions to locate substrings that follow this pattern. Say you stored the file's contents in the variable s:

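import re
# ^...$ with re.M anchors the match to a whole line, so lines with five or more
# numbers (or numbers spread over two lines) are not matched; m[0] is the outer group
matches = [m[0] for m in re.findall(r"^((\d+ ){3}\d+)$", s, re.M)]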

The matches variable now contains the strings with exactly four integers in them. Afterwards, you can split each string and convert it to integers if you want:

matches = [[int(n) for n in m.split()] for m in matches]  # split() also drops any trailing newline

Result:

[[567, 45, 32, 468], [974, 35, 3578, 4467], [325, 765, 355, 5466]]
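A detail of re.findall that matters for patterns with groups such as (\d+\s): when the pattern has more than one capture group, findall returns a tuple of the groups for each match instead of the matched text, which is why the first element has to be picked out. Non-capturing groups (?:...) make findall return the whole match. A small comparison:

```python
import re

s = "567 45 32 468\n974 35 3578 4467\n"

# Two capture groups: each match is a tuple (outer group, last inner repetition)
print(re.findall(r"((\d+\s){4})", s))
# [('567 45 32 468\n', '468\n'), ('974 35 3578 4467\n', '4467\n')]

# Non-capturing group only: each match is the matched text itself
print(re.findall(r"(?:\d+\s){4}", s))
# ['567 45 32 468\n', '974 35 3578 4467\n']
```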

If you know how to use Python's re module, you can do this:

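import re

TEST_FILE = 'data.txt'  # path to your input file (placeholder name)

if __name__ == '__main__':

    with open(TEST_FILE, 'r') as file_1:
        for line in file_1:
            # four whitespace-separated integers and nothing else on the line
            if re.match(r'(\d+\s+){3}\d+\s*$', line):
                line = line.strip()  # remove the \n character
                print(line)  # only lines with four numbers are printed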

The result for your example file is:

567 45 32 468
974 35 3578 4467
325 765 355 5466
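Because re.match anchors the pattern only at the start of the string, a pattern like r'(\d+\s){4}' also accepts a line with five or more numbers, and it rejects a final line that has no trailing newline. Anchoring the end as well, here with re.fullmatch, avoids both. A small comparison, assuming lines still carry their trailing newline except possibly the last one:

```python
import re

five = "1 2 3 4 5\n"
four = "567 45 32 468\n"
last = "1 2 3 4"  # e.g. a last line without a trailing newline

print(bool(re.match(r'(\d+\s){4}', five)))  # True: only the first four numbers are checked
print(bool(re.match(r'(\d+\s){4}', last)))  # False: the fourth number needs whitespace after it

# fullmatch must consume the whole string; \s* allows an optional trailing newline
strict = re.compile(r'(\d+\s+){3}\d+\s*')
print(bool(strict.fullmatch(five)))  # False
print(bool(strict.fullmatch(four)))  # True
print(bool(strict.fullmatch(last)))  # True
```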

Regular expressions are the most powerful option here. We create a pattern using re.compile and then use its search or match method to look for the pattern in each line.

import re

p = re.compile(r'[\d]{4}')  # \d matches a single digit and {4} looks for 4 consecutive occurrences
line_with_digits = []
with open('data.txt', 'r') as file:  # opening the file; it is closed automatically
    for line in file:  # reading the file line by line
        if p.search(line):  # searching for the pattern in the line
            line_with_digits.append(line.strip())  # if the pattern matches, add the line to the list

print(line_with_digits)

The input file for the above program is:

text text text

567 45 32 468
974 35 3578 4467
325 765 355 5466

text text text
1 3 6
text text

text  5566 text 45 text
text text 564 text 458 25 text

The output is:

['974 35 3578 4467', '325 765 355 5466', 'text  5566 text 45 text']

Hope this helps.

You can use a regular expression:

import re

result = []
with open('file_name.txt') as fp:
    for line in fp.readlines():
        if re.search(r'\d{4}', line):
            result.append(line.strip())

print(result)

Output:

['974 35 3578 4467', '325 765 355 5466']
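A caution about the patterns [\d]{4} and \d{4} in these regex answers: they look for four consecutive digits, that is, any number with at least four digits, not four separate numbers. That is why '567 45 32 468' is missing from both outputs, and why 'text  5566 text 45 text' is picked up. If the goal is lines made of exactly four whitespace-separated integers, a whole-line pattern is closer; a sketch on lines taken from the inputs above:

```python
import re

lines = [
    "567 45 32 468",
    "974 35 3578 4467",
    "1 3 6",
    "text  5566 text 45 text",
]

# \d{4}: any run of four digits anywhere in the line
four_digits = re.compile(r'\d{4}')
print([line for line in lines if four_digits.search(line)])
# ['974 35 3578 4467', 'text  5566 text 45 text']

# The whole line must be exactly four whitespace-separated integers
four_numbers = re.compile(r'\s*\d+(?:\s+\d+){3}\s*')
print([line for line in lines if four_numbers.fullmatch(line)])
# ['567 45 32 468', '974 35 3578 4467']
```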
