
How do I read a file line by line and print only the lines that contain a specific string in Python?

I have a text file containing these lines:

wbwubddwo 7::a number1 234 **
/// 45daa;: number2 12

time 3:44

What I am trying to do is, for example, if the program finds the string number1, it should print 234.

I started with the simple script below, but it did not print what I wanted.

with open("test.txt", "rb") as f:
    lines = f.read()
    word = ["number1", "number2", "time"]
    if any(item in lines for item in word):
        val1 = lines.split("number1 ", 1)[1]
        print val1

This returns the following result:

234 **
/// 45daa;: number2 12

time 3:44

Then I tried changing f.read() to f.readlines(), but this time it did not print anything.

Does anyone know another way to do this? Eventually I want to get the value from each line, for example 234, 12 and 3:44, and store it in a database.

Thank you for your help. I really appreciate it.

Explanations are given below:

with open("test.txt", "r") as f:
    lines = f.readlines()
    stripped_lines = [line.strip() for line in lines]

words = ["number1", "number2", "time"]
for a_line in stripped_lines:
    for word in words:
        if word in a_line:
            number = a_line.split()[1]
            print(number)

1) First of all, 'rb' gives a bytes object, i.e. something like b'number1 234' would be returned. Use 'r' to get a string object.

2) The lines you read will look something like this, and they will be stored in a list.

['number1 234\r\n', 'number2 12\r\n', '\r\n', 'time 3:44']

Notice the \r\n; those mark the line endings. To remove them, use strip().

3) Take each line from stripped_lines and each word from words, and check whether that word is present in that line using in.

4) a_line would be number1 234, but we only want the number part. The split() output of that would be ['number1','234'], and split()[1] is the element at index 1 (the 2nd element).

5) You can also check whether the string consists only of digits using your_string.isdigit().
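The points above can be checked with a short sketch. The sample bytes here are an assumption that mirrors one line of the original file; nothing is read from disk.

```python
# Sample data is made up to mirror one line of the file in the question.
raw = b"number1 234\r\n"       # the kind of object "rb" mode yields in Python 3
text = raw.decode("ascii")     # decode to str, the type that "r" mode returns

print(type(raw).__name__, type(text).__name__)

stripped = text.strip()        # drops the trailing "\r\n"
parts = stripped.split()       # splits on whitespace
value = parts[1]               # element at index 1 (the 2nd element)

print(repr(stripped))
print(parts)
print(value, value.isdigit())
print("3:44".isdigit())        # a time value is not all digits
```

Note that isdigit() rejects 3:44, so a value containing a colon needs a separate check.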

UPDATE: Since you updated your question and input file, this works:

import time

def isTimeFormat(input):
    try:
        time.strptime(input, '%H:%M')
        return True
    except ValueError:
        return False

with open("test.txt", "r") as f:
    lines = f.readlines()
    stripped_lines = [line.strip() for line in lines]

words = ["number1", "number2", "time"]
for a_line in stripped_lines:
    for word in words:
        if word in a_line:
            parts = a_line.split()
            # use the last token if it is a number or a time, otherwise the one before it
            number = parts[-1] if (parts[-1].isdigit() or isTimeFormat(parts[-1])) else parts[-2]
            print(number)

Why this isTimeFormat() function?

def isTimeFormat(input):
    try:
        time.strptime(input, '%H:%M')
        return True
    except ValueError:
        return False

It checks whether values such as 3:44 or 4:55 are in a time format, since you are treating them as values too. Final output:

234
12
3:44
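The same selection rule can be tried without a file. This is only a sketch: is_time_format and pick_value are hypothetical helper names, and the sample lines are copied from the question.

```python
import time

def is_time_format(text):
    """Return True if text parses as hours:minutes, e.g. '3:44'."""
    try:
        time.strptime(text, "%H:%M")
        return True
    except ValueError:
        return False

def pick_value(line):
    """Return the last token if it is a number or a time, else the one before it."""
    parts = line.split()
    last = parts[-1]
    if last.isdigit() or is_time_format(last):
        return last
    return parts[-2]

# Lines copied from the question's sample file (the blank line is left out).
samples = [
    "wbwubddwo 7::a number1 234 **",
    "/// 45daa;: number2 12",
    "time 3:44",
]
values = [pick_value(s) for s in samples]
print(values)
```

The rule assumes every matching line has at least two tokens and that at most one junk token follows the value.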

After some trial and error, I found the solution below. It is based on the answer provided by @s_vishnu.

with open("test.txt", "r") as f:
    lines = f.readlines()
    stripped_lines = [line.strip() for line in lines]

    for item in stripped_lines:
        if "number1" in item:
            # take the token that follows "number1 "
            getval = item.split("number1 ")[1].split(" ")[0]
            print(getval)

        if "number2" in item:
            getval2 = item.split("number2 ")[1].split(" ")[0]
            print(getval2)

        if "time" in item:
            getval3 = item.split("time ")[1].split(" ")[0]
            print(getval3)

Output:

234
12
3:44

This way, I can also do other things, for example saving each value to a database.

I am open to any suggestions to further improve my answer.
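As a rough illustration of the database step, here is a minimal sketch using the standard-library sqlite3 module. The table name, the schema and the in-memory database are assumptions made for the example; the question does not say which database is used.

```python
import sqlite3

# Hypothetical (name, value) pairs, as extracted by the loop above.
readings = [("number1", "234"), ("number2", "12"), ("time", "3:44")]

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE readings (name TEXT, value TEXT)")  # assumed schema
# Parameter placeholders keep the values out of the SQL string itself.
conn.executemany("INSERT INTO readings (name, value) VALUES (?, ?)", readings)
conn.commit()

rows = conn.execute("SELECT name, value FROM readings ORDER BY rowid").fetchall()
print(rows)
conn.close()
```

Values are stored as text here because 3:44 is not a number; a real schema would likely use more specific column types.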

You're overthinking this. Assuming you don't have those two asterisks at the end of the first line and you want to print values from lines containing certain strings, you can just read the file line by line, check whether any of the chosen values appears in the line, and print the last value (the text between the last space and the end of the line). There is no need to parse or split the whole line at all:

search_values = ["number1", "number2", "time"]  # values to search for

with open("test.txt", "r") as f:  # open your file
    for line in f:  # read it line by line
        if any(value in line for value in search_values):  # check for search_values in line
            print(line[line.rfind(" ") + 1:].rstrip())  # print the last value after space

Which will give you:

234
12
3:44

If you do have asterisks, you have to define your file format more precisely, as splitting won't necessarily yield the value you want.
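One possible way to pin the format down, offered as a sketch only: assume the wanted value is the whitespace-delimited token that immediately follows the keyword, and capture it with a regular expression. That assumption is not stated in the question, and PATTERN and extract are hypothetical names.

```python
import re

# Assumption: the value is the token right after the keyword, whatever follows it.
PATTERN = re.compile(r"\b(number1|number2|time)\s+(\S+)")

def extract(line):
    """Return the token following the first keyword in line, or None if no keyword."""
    match = PATTERN.search(line)
    return match.group(2) if match else None

# Lines copied from the question's sample file, including the blank line.
samples = [
    "wbwubddwo 7::a number1 234 **",
    "/// 45daa;: number2 12",
    "",
    "time 3:44",
]
values = [v for v in (extract(s) for s in samples) if v is not None]
print(values)
```

Because the match is anchored on the keyword rather than on the end of the line, trailing asterisks no longer matter.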
