Extract line from txt file using python

I am new here and currently learning Python. This is my first post here.

I am trying to extract the chat history sent by a particular user from a .txt file, for example the number +99 9999 9999. But I'm unable to get the content in between.

02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
03/09/2020, 01:55 - +88 8888 8888: 2-SEP-2020
task c -Changes c
task d Changes d
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f

My current code is

number = "+99 9999 9999"
with open('text.txt') as input_data:
    for line in input_data:
        if number in line: 
            print(line)

My output contains only the lines that include the number:

02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020

How can I edit my code to also show the lines that follow a line matching the number? Any guidance would be appreciated.

The output that I want:

02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f

New data

[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020
-task a
-task b
[23/9/20, 11:30:03 PM] Shawn - Support: 23/09/2020
-task c
-task d
[24/9/20, 9:54:44 PM]Shawn - Support: 24/09/2020
-task e
-task f
[24/9/20, 10:06:58 PM] Damien - Support: 24/09/2020
-task g
-task h
-task i
-task j
[24/9/20, 10:53:52 PM] John - Salesman: 24/09/2020
-task k
-task l
-task m
-task n

You have the file-reading portion sorted out. What you need to figure out is when to print.

Here's the code to take care of it. For simplicity, I assigned all the data in the file to a variable. I also modified the input data: the first set now has 3 rows for +99 9999 9999.

import re

filedata = '''02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
task c Changes c
03/09/2020, 01:55 - +88 8888 8888: 2-SEP-2020
task c -Changes c
task d Changes d
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f'''

number = '+99 9999 9999'

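found = False  # True while the current message belongs to `number`
for line in filedata.split('\n'):
    # A line containing +nn nnnn nnnn starts a new message.
    z = re.search(r"\+\d{2} \d{4} \d{4}", line)
    if z: found = number in line
    if found: print(line)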

Explanation of the above code:

For each line read, use a regular expression to check for +nn nnnn nnnn, where n is any digit (\d denotes a digit). The result is stored in z.

If z has a value, the line starts a new message, and you need to find out whether it was sent by +99 9999 9999 or by some other number. The expression found = number in line evaluates to True or False, so it sets the flag accordingly.

While the flag is True, print each line. Lines without a +nn nnnn nnnn pattern leave the flag unchanged, so printing continues until the next +nn nnnn nnnn line is found. If that line is not from +99 9999 9999, the flag becomes False: a different message has started, so printing stops.

Hope this explains it. If you still have questions on the logic, let me know.

The output of this will be:

02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
task c Changes c
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f

This will work irrespective of how many rows you have between +99 9999 9999 and the next set of +nn nnnn nnnn where n can be any digit.
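
The flag logic described above can also be wrapped in a small reusable function. This is only a minimal sketch: the names HEADER_RE, iter_sender_lines and sample are made up for illustration, and it assumes every message header contains a number of the form +nn nnnn nnnn.

```python
import re

# Assumption: a line containing "+nn nnnn nnnn" is a message header.
HEADER_RE = re.compile(r"\+\d{2} \d{4} \d{4}")


def iter_sender_lines(lines, number):
    """Yield every line that belongs to a message sent by `number`."""
    found = False  # nothing is selected until a matching header is seen
    for line in lines:
        if HEADER_RE.search(line):
            # A new message starts here; keep it only if it is from `number`.
            found = number in line
        if found:
            yield line


sample = """02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
03/09/2020, 01:55 - +88 8888 8888: 2-SEP-2020
task c -Changes c
task d Changes d
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f"""

result = list(iter_sender_lines(sample.splitlines(), "+99 9999 9999"))
for line in result:
    print(line)
```

Because the function accepts any iterable of lines, an open file object could be passed in place of sample.splitlines().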

Here's the code you need with file read:

import re
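number = "+99 9999 9999"
found = False  # True while the current message belongs to `number`
with open('text.txt') as input_data:
    for line in input_data:
        # A line containing +nn nnnn nnnn starts a new message.
        z = re.search(r"\+\d{2} \d{4} \d{4}", line)
        if z: found = number in line
        # Lines read from a file keep their newline, so do not add another.
        if found: print(line, end="")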

I am making some wild guesses on what you are trying to do here.

Let's assume you want to find John +99 9999 9999 as a string in the file and print all the lines associated with it. Then here's the code.

import re
filedata = '''02/09/2020, 23:45 - John +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
task c Changes c
03/09/2020, 01:55 - Suzan +88 8888 8888: 2-SEP-2020
task c -Changes c
task d Changes d
03/09/2020, 01:55 - Thomas +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f'''

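name = 'John'
found = False  # True while the current message belongs to `name`
for line in filedata.split('\n'):
    # A header line contains a name followed by +nn nnnn nnnn.
    z = re.findall(r"\w+ \+\d{2} \d{4} \d{4}",line)
    if z: found = (name in line) and (line[:4] != 'task')
    if found: print (line)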

The output of this will be:

02/09/2020, 23:45 - John +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
task c Changes c

This will also work for header lines in the following formats:

02/09/2020, 23:45 - John , Salesman +99 9999 9999: 02/09/2020

02/09/2020, 23:45 - John Salesman +99 9999 9999: 02/09/2020

Let me know what you are trying to find. Hopefully all these examples should help you get what you are looking for.

Based on the new data you shared, here's the code:

filedata = """[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020
-task a
-task b
[23/9/20, 11:30:03 PM] Shawn - Support: 23/09/2020
-task c
-task d
[24/9/20, 9:54:44 PM]Shawn - Support: 24/09/2020
-task e
-task f
[24/9/20, 10:06:58 PM] Damien - Support: 24/09/2020
-task g
-task h
-task i
-task j
[24/9/20, 10:53:52 PM] John - Salesman: 24/09/2020
-task k
-task l
-task m
-task n"""

import re
name = 'John - Salesman'
found = False  # True while the current message belongs to `name`
for line in filedata.split('\n'):
    # Only header lines contain a dd/mm/yyyy date.
    z = re.search(r"\d{2}/\d{2}/\d{4}", line)
    if z: found = (name in line) and (line[:4] != 'task')
    if found: print (line)

The output of this will be:

[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020
-task a
-task b
[24/9/20, 10:53:52 PM] John - Salesman: 24/09/2020
-task k
-task l
-task m
-task n

In case you want to play around with the regular expression, you can try it out in an online regex tester.
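
For experimenting with the new data format directly in Python, a stricter header pattern can help. The sketch below assumes every header looks like "[date, time] sender: text", with an optional space after the closing bracket. BRACKET_HEADER_RE and parse_header are illustrative names, not part of the code above.

```python
import re

# Assumption: headers look like "[date, time] sender: text".
BRACKET_HEADER_RE = re.compile(r"^\[(?P<stamp>[^\]]+)\]\s*(?P<sender>[^:]+): ")


def parse_header(line):
    """Return (timestamp, sender) for a header line, or None for a task line."""
    m = BRACKET_HEADER_RE.match(line)
    if m is None:
        return None
    return m.group("stamp"), m.group("sender").strip()


for text in (
    "[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020",
    "[24/9/20, 9:54:44 PM]Shawn - Support: 24/09/2020",
    "-task a",
):
    print(repr(text), "->", parse_header(text))
```

Comparing the parsed sender with == instead of testing name in line avoids accidental matches when the name also appears inside a task line.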

Here's your answer:

number = "+99 9999 9999"
with open('text.txt') as input_data:
    lines = input_data.readlines()

    # Loop over the line indexes (0 to len(lines) - 1) instead of the
    # lines themselves, so the lines that follow a match can be reached.
    for line_no in range(len(lines)):
        if number in lines[line_no]:
            # Print the matching line and the next two lines. Slicing
            # avoids an IndexError when the match is near the end of the file.
            for line in lines[line_no:line_no + 3]:
                print(line, end="")

import re

with open('text.txt') as input_data:
    lines = input_data.readlines()
    re_number = re.compile(r"\+\d\d \d\d\d\d \d\d\d\d")
    number = "+99 9999 9999"
    blocks = []
    tmp_block = []
    flag = False  # True while the current block belongs to `number`
    for line in lines:
        if re_number.search(line):
            # A header line starts a new block: store the previous one first.
            if tmp_block:
                blocks.append(tmp_block.copy())
                tmp_block.clear()
            flag = number in line
        if flag:
            tmp_block.append(line)
    # Store the last block if the file ended while collecting one.
    if tmp_block:
        blocks.append(tmp_block)

print (blocks)
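
The block-collecting idea above can be extended to group the messages of every sender in a single pass. This is a sketch under the assumption that each header has the shape "<date>, <time> - <number>: <text>"; SENDER_RE, group_by_sender and chat_lines are hypothetical names introduced here.

```python
import re
from collections import defaultdict

# Assumption: headers look like "<date>, <time> - +nn nnnn nnnn: <text>".
SENDER_RE = re.compile(r" - (\+\d{2} \d{4} \d{4}): ")


def group_by_sender(lines):
    """Map each sender to a list of blocks, one block (list of lines) per message."""
    blocks = defaultdict(list)
    current = None  # block being filled; None until the first header is seen
    for line in lines:
        m = SENDER_RE.search(line)
        if m:
            # Start a new block and register it under its sender.
            current = [line]
            blocks[m.group(1)].append(current)
        elif current is not None:
            current.append(line)
    return dict(blocks)


chat_lines = [
    "02/09/2020, 23:45 - +99 9999 9999: 02/09/2020",
    "task A -Changes A",
    "task b Changes b",
    "03/09/2020, 01:55 - +88 8888 8888: 2-SEP-2020",
    "task c -Changes c",
    "task d Changes d",
    "03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020",
    "task e -Changes e",
    "task f Changes f",
]

grouped = group_by_sender(chat_lines)
for block in grouped["+99 9999 9999"]:
    print(block)
```

Grouping everything once is convenient when several senders need to be looked up from the same file.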

Your code only finds the lines that contain the number you want. You can set a flag so that the following content is printed until a line from another number appears:

if __name__ == '__main__':
    number = "+99 9999 9999"
    task = 'task'
    wanted = False
    with open('text.txt') as input_data:
        for line in input_data:
            if number in line:
                # Header line from the wanted number: start printing.
                wanted = True
            elif line[:4] != task:
                # Any other non-task line is a header from someone else: stop.
                wanted = False
            if wanted:
                print(line.strip())

Task 1

Here is another solution from me, for the case where your history file has more than two tasks per message from a single phone number.

Code Syntax

inp = input("please, Enter your input that you want to search for: ")


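path = 'extractiondata.txt'  # the chat history file


def scanner(path, query):
    with open(path) as file:
        lines = file.readlines()
        for index, line in enumerate(lines):
            # A header line starts with a digit and holds the sender at a fixed position.
            if line[0].isdigit() and line[20:-13] == query:
                print(line)
                lin = index + 1
                try:
                    # Print the following lines until the next header line.
                    while not lines[lin][0].isdigit():
                        print(lines[lin])
                        lin += 1
                except IndexError:
                    # Reached the end of the file.
                    break

print("="*40)
print(f"*****History of {inp}*****")
scanner(path, inp)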

Output

please, Enter your input that you want to search for: +99 9999 9999
========================================
*****History of +99 9999 9999*****
02/09/2020, 23:45 - +99 9999 9999: 02/09/2020

task A -Changes A

task b Changes b

03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020

task e -Changes e

task f Changes f

[Program finished]

Task 2

This one is closely related to the previous one. The only difference is in how the search key is extracted from each header line.

Code Syntax

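path2 = 'extractiondata2.txt'  # hypothetical name for the file holding the new data


def scanner2(path, query):
    with open(path) as file:
        lines = file.readlines()
        for index, line in enumerate(lines):
            # A header line is "[date, time] sender: text", so its second character is a digit.
            if line[1].isdigit() and line[22:-13].strip(" ") == query:
                print(line)
                lin = index + 1
                try:
                    # Print the following lines until the next header line.
                    while not lines[lin][1].isdigit():
                        print(lines[lin])
                        lin += 1
                except IndexError:
                    # Reached the end of the file.
                    break


print("="*40)
print(f"*****History of {inp}*****")
scanner2(path2, inp)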

Output ( Input is case sensitive )

please, Enter your input that you want to search for: John - Salesman
========================================
*****History of John - Salesman*****
[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020

-task a

-task b

[24/9/20, 10:53:52 PM] John - Salesman: 24/09/2020

-task k

-task l

-task m

-task n

[Program finished]

Try this code,

Code Syntax

path = 'extractiondata.txt'

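def scanner(path, query):
    with open(path) as file:
        lista = file.readlines()
        for index, each in enumerate(lista):
            # The sender sits at a fixed position in each header line.
            if each[20:-13] == query:
                # Print the header and the two lines after it; slicing
                # avoids an IndexError near the end of the file.
                for line in lista[index:index + 3]:
                    print(line)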
        

inp = input("please, Enter your input that you want to search for: ")                  
scanner(path, inp)


Output

02/09/2020, 23:45 - +99 9999 9999: 02/09/2020

task A -Changes A

task b Changes b

03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020

task e -Changes e

task f Changes f
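
The fixed slice [20:-13] used above depends on every header having exactly the same column layout. As a hedged alternative, the sender can be cut out with str.partition instead. The helper name extract_sender is made up for this sketch, and it assumes headers of the form "<date>, <time> - <sender>: <text>".

```python
def extract_sender(line):
    """Return the sender of a header line, or None if the line is not a header.

    Assumes headers look like "<date>, <time> - <sender>: <text>".
    """
    if not line[:1].isdigit():
        return None  # task lines such as "task A ..." do not start with a digit
    _, sep, rest = line.partition(" - ")
    if not sep:
        return None  # no " - " separator, so not a header
    sender, sep, _ = rest.partition(": ")
    return sender if sep else None


print(extract_sender("02/09/2020, 23:45 - +99 9999 9999: 02/09/2020\n"))
print(extract_sender("task A -Changes A\n"))
```

A condition such as extract_sender(each) == query could then stand in for the slice comparison, and it keeps working if the date or time field changes width.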

Just check whether the number you want is in the first line of each block (this assumes every message is exactly three lines long):

with open('text.txt') as input_data:
  lines = [i.rstrip('\n') for i in input_data.readlines()]

blocks = []
number = "+99 9999 9999"
while len(lines) != 0:
  if number in lines[0]:
    blocks.append(lines[:3])
  lines = lines[3:]
