Extract line from txt file using python

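I'm new here and currently learning Python. This is my first post here.

I'm trying to extract the chat messages sent by a specific user from a .txt file, for example the number +99 9999 9999. But I can't get the content that comes between the header lines.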

02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
03/09/2020, 01:55 - +88 8888 8888: 2-SEP-2020
task c -Changes c
task d Changes d
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f

My current code is

number = "+99 9999 9999"
with open('text.txt') as input_data:
    for line in input_data:
        if number in line: 
            print(line)

My output is only the lines that contain the number

02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020

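If the number matches a line, how do I edit my code to also show the lines that follow it? Any guidance would be greatly appreciated.

My desired output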

02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f

New data

[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020
-task a
-task b
[23/9/20, 11:30:03 PM] Shawn - Support: 23/09/2020
-task c
-task d
[24/9/20, 9:54:44 PM]Shawn - Support: 24/09/2020
-task e
-task f
[24/9/20, 10:06:58 PM] Damien - Support: 24/09/2020
-task g
-task h
-task i
-task j
[24/9/20, 10:53:52 PM] John - Salesman: 24/09/2020
-task k
-task l
-task m
-task n

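You have the file reading part sorted out. You need to figure out the print statement.

Here is the code to handle it. For simplicity, I assigned all the data in the file to a variable. I also modified the input data: the first set now has 3 lines for +99 9999 9999.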

import re

filedata = '''02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
task c Changes c
03/09/2020, 01:55 - +88 8888 8888: 2-SEP-2020
task c -Changes c
task d Changes d
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f'''

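number = '+99 9999 9999'
found = False  # True while we are inside a block sent by `number`

for line in filedata.split('\n'):
    # A header line contains a phone number of the form +nn nnnn nnnn
    z = re.search(r"\+\d{2} \d{4} \d{4}", line)
    if z:
        found = number in line
    if found:
        print(line)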

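Explanation of the code above:

For each line read, a regex search is done for +nn nnnn nnnn, where n is any digit (\d means a digit). The result is stored in z.

If z has a value, a match was found. When we find a match, we want to know whether the line is for +99 9999 9999 or for some other number.

found = number in line evaluates to True or False. If the header is for +99 9999 9999, the flag found is set to True, and every line is printed while the flag is set. Printing continues until the next +nn nnnn nnnn line is found. At that point we check again whether it is +99 9999 9999. If it is not, the flag becomes False, which tells us a different set has started, so we stop printing lines.

Hope this explains it. Let me know if you still have questions about the logic.

The output will be: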

02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
task c Changes c
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f

This works no matter how many lines there are between +99 9999 9999 and the next +nn nnnn nnnn header, where n can be any digit.
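
To illustrate that claim, here is a minimal sketch of the same flag logic wrapped in a generator. The function name `lines_for_number` and the `sample` list are hypothetical, introduced only for this illustration, and the header pattern assumes every new message carries a number of the form +nn nnnn nnnn.

```python
import re

# Assumed header pattern: a phone number written as +nn nnnn nnnn.
HEADER = re.compile(r"\+\d{2} \d{4} \d{4}")


def lines_for_number(lines, number):
    """Yield each header line containing `number` and the lines after it,
    stopping at the next header that belongs to a different number."""
    found = False
    for line in lines:
        if HEADER.search(line):
            found = number in line
        if found:
            yield line


# Hypothetical sample data: blocks of different lengths.
sample = [
    "02/09/2020, 23:45 - +99 9999 9999: 02/09/2020",
    "task A -Changes A",
    "task b Changes b",
    "task c Changes c",
    "03/09/2020, 01:55 - +88 8888 8888: 2-SEP-2020",
    "task d Changes d",
    "03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020",
    "task e -Changes e",
]

for line in lines_for_number(sample, "+99 9999 9999"):
    print(line)
```

Because the flag only changes on header lines, the first block (three task lines) and the last block (one task line) are both returned in full.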

Here is the code you need when reading from the file:

import re
number = "+99 9999 9999"
found = False
with open('text.txt') as input_data:
    for line in input_data:
        z = re.search(r"\+\d{2} \d{4} \d{4}", line)
        if z:
            found = number in line
        if found:
            print(line, end="")  # the line already ends with a newline

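I am making some wild guesses about what you are trying to do here.

Suppose you want to find John +99 9999 9999 as a string in the file and print all the lines related to it. Then this is the code.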

import re
filedata = '''02/09/2020, 23:45 - John +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
task c Changes c
03/09/2020, 01:55 - Suzan +88 8888 8888: 2-SEP-2020
task c -Changes c
task d Changes d
03/09/2020, 01:55 - Thomas +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f'''

name = 'John'
found = False
for line in filedata.split('\n'):
    z = re.findall(r"\w+ \+\d{2} \d{4} \d{4}", line)
    if z:
        found = (name in line) and (line[:4] != 'task')
    if found:
        print(line)

The output will be:

02/09/2020, 23:45 - John +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
task c Changes c

This will also work for header lines of the following forms:

02/09/2020, 23:45 - John , Salesman +99 9999 9999: 02/09/2020

02/09/2020, 23:45 - John Salesman +99 9999 9999: 02/09/2020
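
As a quick check of that statement, the sketch below runs the same pattern over the three header variants. The `headers` list is only illustrative. Note that the regex captures just the word directly in front of the number, so for the last two variants it is the separate `name in line` test that ties the header to John.

```python
import re

# Same pattern as in the answer: a word, a space, then +nn nnnn nnnn.
pattern = r"\w+ \+\d{2} \d{4} \d{4}"

headers = [
    "02/09/2020, 23:45 - John +99 9999 9999: 02/09/2020",
    "02/09/2020, 23:45 - John , Salesman +99 9999 9999: 02/09/2020",
    "02/09/2020, 23:45 - John Salesman +99 9999 9999: 02/09/2020",
]

matches = [re.findall(pattern, header) for header in headers]
for header, found in zip(headers, matches):
    # Show what the regex captured and whether 'John' is in the line
    print(found, 'John' in header)
```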

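Let me know what you are looking for. Hopefully all these examples help you get what you need.

Based on the new data you shared, the code is as follows: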

filedata = """[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020
-task a
-task b
[23/9/20, 11:30:03 PM] Shawn - Support: 23/09/2020
-task c
-task d
[24/9/20, 9:54:44 PM]Shawn - Support: 24/09/2020
-task e
-task f
[24/9/20, 10:06:58 PM] Damien - Support: 24/09/2020
-task g
-task h
-task i
-task j
[24/9/20, 10:53:52 PM] John - Salesman: 24/09/2020
-task k
-task l
-task m
-task n"""

import re
name = 'John - Salesman'
found = False
for line in filedata.split('\n'):
    # Header lines are the ones that contain a dd/mm/yyyy date
    z = re.findall(r"([\w+ \- \w+:]*\d{2}\/\d{2}\/\d{4})", line)
    if z:
        found = name in line
    if found:
        print(line)

The output will be:

[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020
-task a
-task b
[24/9/20, 10:53:52 PM] John - Salesman: 24/09/2020
-task k
-task l
-task m
-task n

If you want to experiment with regular expressions, you can try the expression out in an online regex tester.
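
You can also experiment directly in Python. The sketch below is one possible way to pull the timestamp and sender out of the bracketed header format. The pattern and the helper `parse_header` are assumptions made for this illustration, not part of the answer above.

```python
import re

# Hypothetical pattern for the bracketed format: "[date, time] sender: text".
# \s* tolerates the missing space after "]" seen in one of the sample lines.
HEADER = re.compile(r"^\[(?P<stamp>[^\]]+)\]\s*(?P<sender>[^:]+): (?P<text>.*)$")


def parse_header(line):
    """Return (stamp, sender, text) for a header line, or None otherwise."""
    match = HEADER.match(line)
    if match is None:
        return None
    return match.group("stamp"), match.group("sender"), match.group("text")


print(parse_header("[24/9/20, 9:54:44 PM]Shawn - Support: 24/09/2020"))
print(parse_header("-task e"))
```

Comparing the parsed sender for equality, rather than testing `name in line`, would avoid accidental matches when the name also appears in the message text.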

Here is your answer:

number = "+99 9999 9999"
with open('text.txt') as input_data:
    lines = input_data.readlines()

# Instead of looping over the lines themselves, we loop over
# their indices, from zero up to (number of lines minus 1).
# (Remember, Python lists are zero-indexed, that's why.)
for line_no in range(len(lines)):
    if number in lines[line_no]:
        # Print the current line and the two lines after it.
        # Slicing stops at the end of the list, so a match near
        # the end of the file cannot raise an IndexError.
        for line in lines[line_no:line_no + 3]:
            print(line, end="")

import re

number = "+99 9999 9999"
re_number = re.compile(r"\+\d\d \d\d\d\d \d\d\d\d")

with open('text.txt') as input_data:
    lines = input_data.readlines()

blocks = []      # finished blocks that belong to `number`
tmp_block = []   # block currently being collected
flag = False     # True while we are inside a block of `number`
for line in lines:
    if re_number.search(line):
        # A new header closes the block collected so far
        if tmp_block:
            blocks.append(tmp_block.copy())
            tmp_block.clear()
        flag = number in line
    if flag:
        tmp_block.append(line)
if tmp_block:
    blocks.append(tmp_block)

print(blocks)
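
If the history of more than one number is needed, the same block-collecting idea can be extended to group every block by its sender in a single pass. This is only a sketch under the assumption that each header contains a number of the form +nn nnnn nnnn. The function `group_by_number` and the `sample` data are hypothetical.

```python
import re
from collections import defaultdict

# Assumed header pattern: a number written as +nn nnnn nnnn.
RE_NUMBER = re.compile(r"\+\d{2} \d{4} \d{4}")


def group_by_number(lines):
    """Map each number to a list of blocks; a block is a header line
    followed by the lines up to the next header."""
    blocks = defaultdict(list)
    current = None  # block being filled; None until the first header
    for line in lines:
        match = RE_NUMBER.search(line)
        if match:
            current = [line]
            blocks[match.group()].append(current)
        elif current is not None:
            current.append(line)
    return dict(blocks)


sample = [
    "02/09/2020, 23:45 - +99 9999 9999: 02/09/2020",
    "task A -Changes A",
    "03/09/2020, 01:55 - +88 8888 8888: 2-SEP-2020",
    "task c -Changes c",
    "03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020",
    "task e -Changes e",
    "task f Changes f",
]

history = group_by_number(sample)
print(history["+99 9999 9999"])
```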

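Because your code only finds the lines that contain the number you want, you can set a flag to keep printing the lines that follow until another number appears: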

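if __name__ == '__main__':
    number = "+99 9999 9999"
    task = 'task'
    wanted = False
    with open('text.txt') as input_data:
        for line in input_data:
            # A line that does not start with 'task' is a header: it decides
            # whether the block that follows belongs to the wanted number.
            # Deciding before printing keeps other users' headers out.
            if line[:4] != task:
                wanted = number in line
            if wanted:
                print(line.strip())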

Task 1

If your history file has more than 2 tasks for a single phone number, this would be my other solution.

Code

path = 'extractiondata.txt'
inp = input("please, Enter your input that you want to search for: ")


def scanner(path, target):
    with open(path) as file:
        lines = file.readlines()
    for index, line in enumerate(lines):
        # Header lines start with a digit; the sender sits between
        # the 20-character timestamp prefix and the 13-character tail
        if line[0].isdigit() and line[20:-13] == target:
            print(line)
            lin = index + 1
            # Print the following lines until the next header or end of file
            while lin < len(lines) and not lines[lin][0].isdigit():
                print(lines[lin])
                lin += 1


print("=" * 40)
print(f"*****History of {inp}*****")
scanner(path, inp)

Output

please, Enter your input that you want to search for: +99 9999 9999
========================================
*****History of +99 9999 9999*****
02/09/2020, 23:45 - +99 9999 9999: 02/09/2020

task A -Changes A

task b Changes b

03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020

task e -Changes e

task f Changes f

[Program finished]
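
The fixed slice `line[20:-13]` depends on every header having exactly the same timestamp and date widths. As a hedged alternative, the sender can be cut out at the separators instead. The helper name `sender_of` is hypothetical, and the sketch assumes headers of the form "date, time - sender: text".

```python
def sender_of(line):
    """Return the sender of a 'date, time - sender: text' header, else None."""
    _, dash, rest = line.partition(" - ")
    sender, colon, _ = rest.partition(": ")
    # Not a header if a separator is missing or the line
    # does not start with a digit (the day of the date).
    if not dash or not colon or not line[:1].isdigit():
        return None
    return sender


print(sender_of("02/09/2020, 23:45 - +99 9999 9999: 02/09/2020\n"))
print(sender_of("task A -Changes A\n"))
```

Inside `scanner`, the test `line[20:-13] == target` could then be written as `sender_of(line) == target`.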

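Task 2

This one works much the same way as the previous one; the difference lies in how the search text is extracted from each header line.

Code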

def scanner2(path, target):
    with open(path) as file:
        lines = file.readlines()
    for index, line in enumerate(lines):
        # Header lines look like "[23/9/20, ...": the second character is a digit
        if line[1:2].isdigit() and line[22:-13].strip(" ") == target:
            print(line)
            lin = index + 1
            # Print the following lines until the next header or end of file
            while lin < len(lines) and not lines[lin][1:2].isdigit():
                print(lines[lin])
                lin += 1


print("=" * 40)
print(f"*****History of {inp}*****")
scanner2(path2, inp)  # path2: path to the file that holds the new data

Output (the input is case-sensitive)

please, Enter your input that you want to search for: John - Salesman
========================================
*****History of John - Salesman*****
[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020

-task a

-task b

[24/9/20, 10:53:52 PM] John - Salesman: 24/09/2020

-task k

-task l

-task m

-task n

[Program finished]

Try this code:

Code

path = 'extractiondata.txt'

def scanner(path, target):
    with open(path) as file:
        lista = file.readlines()
    for index, each in enumerate(lista):
        if each[20:-13] == target:
            # Print the header and the two lines after it; slicing
            # avoids an IndexError near the end of the file
            for found in lista[index:index + 3]:
                print(found)


inp = input("please, Enter your input that you want to search for: ")
scanner(path, inp)


Output

02/09/2020, 23:45 - +99 9999 9999: 02/09/2020

task A -Changes A

task b Changes b

03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020

task e -Changes e

task f Changes f

Just check whether the number you want is in the string

with open('text.txt') as input_data:
  lines = [i.rstrip('\n') for i in input_data.readlines()]

blocks = []
number = "+99 9999 9999"
while len(lines) != 0:
  if number in lines[0]:
    blocks.append(lines[:3])
  lines = lines[3:]
