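Extract line from txt file using python
I'm new here and currently learning Python. This is my first post here.
I'm trying to extract the chat history sent by a specific user from a .txt file, for example the number +99 9999 9999. But I can't get the lines in between.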
02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
03/09/2020, 01:55 - +88 8888 8888: 2-SEP-2020
task c -Changes c
task d Changes d
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f
My current code is:
number = "+99 9999 9999"
with open('text.txt') as input_data:
    for line in input_data:
        if number in line:
            print(line)
My output is only the lines that contain the number:
02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
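How can I edit my code so that, when the number matches a line, the lines that follow are shown as well? Any guidance would be greatly appreciated.
My desired output: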
02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f
New data:
[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020
-task a
-task b
[23/9/20, 11:30:03 PM] Shawn - Support: 23/09/2020
-task c
-task d
[24/9/20, 9:54:44 PM]Shawn - Support: 24/09/2020
-task e
-task f
[24/9/20, 10:06:58 PM] Damien - Support: 24/09/2020
-task g
-task h
-task i
-task j
[24/9/20, 10:53:52 PM] John - Salesman: 24/09/2020
-task k
-task l
-task m
-task n
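You have the file-reading part sorted out. You need to work out the print statement.
Here is the code to handle it. For simplicity, I assigned all the data in the file to a variable. I also modified the input data: the first block for +99 9999 9999 now has 3 lines.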
import re
filedata = '''02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
task c Changes c
03/09/2020, 01:55 - +88 8888 8888: 2-SEP-2020
task c -Changes c
task d Changes d
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f'''
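number = '+99 9999 9999'
found = False  # True while we are inside a block that belongs to `number`
for line in filedata.split('\n'):
    # A header line contains a phone number of the form +nn nnnn nnnn
    z = re.search(r"\+\d{2} \d{4} \d{4}", line)
    if z:
        found = number in line
    if found:
        print(line)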
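Explanation of the code above:
For each line read, a regular expression looks for +nn nnnn nnnn, where n is any digit (d stands for digit). The result is stored in z.
If z has a value, a match was found, which means the line is a header line. In that case you want to know whether the number on that line is +99 9999 9999 or some other number.
The statement found = number in line evaluates to True or False and sets the flag accordingly. While the flag is True, each line is printed. Printing continues until the next +nn nnnn nnnn header line is reached. At that point the number is checked again: if it is not +99 9999 9999, the flag becomes False.
When the flag is False, we know a different block has started, so printing stops.
I hope this explains it. If you still have questions about the logic, let me know.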
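The flag logic described above can also be wrapped in a small reusable function. This is only a sketch under stated assumptions: the function name lines_for_number and the HEADER pattern are illustrative and not part of the original answer, and the pattern assumes every header line contains a number of the form +nn nnnn nnnn.

```python
import re

# Assumed header format: a phone number such as "+99 9999 9999" somewhere in the line.
HEADER = re.compile(r"\+\d{2} \d{4} \d{4}")

def lines_for_number(lines, number):
    """Yield each header line containing `number` plus the lines that follow it."""
    found = False
    for line in lines:
        if HEADER.search(line):
            # A new block starts here; keep it only if it belongs to `number`.
            found = number in line
        if found:
            yield line

sample = """02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
03/09/2020, 01:55 - +88 8888 8888: 2-SEP-2020
task c -Changes c
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e""".splitlines()

result = list(lines_for_number(sample, "+99 9999 9999"))
for line in result:
    print(line)
```

Because the function accepts any iterable of lines, an open file object could be passed in place of the sample list.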
The output will be:
02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
task c Changes c
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f
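This will work no matter how many lines there are between +99 9999 9999 and the next +nn nnnn nnnn header, where n can be any digit.
Here is the code you need to read from the file: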
import re
number = "+99 9999 9999"
found = False
with open('text.txt') as input_data:
    for line in input_data:
        z = re.search(r"\+\d{2} \d{4} \d{4}", line)
        if z:
            found = number in line
        if found:
            print(line, end="")  # the line already ends with a newline
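I'm making some wild guesses about what you are trying to do here.
Suppose you want to find John +99 9999 9999 in the file as a string and print all the lines associated with it. Then the code is: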
import re
filedata = '''02/09/2020, 23:45 - John +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
task c Changes c
03/09/2020, 01:55 - Suzan +88 8888 8888: 2-SEP-2020
task c -Changes c
task d Changes d
03/09/2020, 01:55 - Thomas +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f'''
name = 'John'
found = False
for line in filedata.split('\n'):
    # Header lines look like "<name> +nn nnnn nnnn"
    z = re.findall(r"\w+ \+\d{2} \d{4} \d{4}", line)
    if z:
        found = (name in line) and (line[:4] != 'task')
    if found:
        print(line)
The output will be:
02/09/2020, 23:45 - John +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
task c Changes c
This also works for header lines in the following patterns:
02/09/2020, 23:45 - John , Salesman +99 9999 9999: 02/09/2020
02/09/2020, 23:45 - John Salesman +99 9999 9999: 02/09/2020
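Let me know what you are looking for. Hopefully all these examples help you get what you need.
Based on the new data you shared, the code is as follows: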
filedata = """[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020
-task a
-task b
[23/9/20, 11:30:03 PM] Shawn - Support: 23/09/2020
-task c
-task d
[24/9/20, 9:54:44 PM]Shawn - Support: 24/09/2020
-task e
-task f
[24/9/20, 10:06:58 PM] Damien - Support: 24/09/2020
-task g
-task h
-task i
-task j
[24/9/20, 10:53:52 PM] John - Salesman: 24/09/2020
-task k
-task l
-task m
-task n"""
import re
name = 'John - Salesman'
found = False
for line in filedata.split('\n'):
    # Header lines are the only ones that contain a dd/mm/yyyy date
    z = re.findall(r"\d{2}/\d{2}/\d{4}", line)
    if z:
        found = name in line
    if found:
        print(line)
The output will be:
[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020
-task a
-task b
[24/9/20, 10:53:52 PM] John - Salesman: 24/09/2020
-task k
-task l
-task m
-task n
If you want to experiment with the regular expression, you can try it out in an online regex tester.
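As an alternative to an online tester, a pattern can be checked directly in Python against a few sample lines. The sketch below assumes that a dd/mm/yyyy date is what distinguishes a header line from a task line in the new data; the names date_pattern and samples are illustrative only.

```python
import re

# Assumed pattern: a dd/mm/yyyy date, which only header lines contain in the sample data.
date_pattern = re.compile(r"\d{2}/\d{2}/\d{4}")

samples = [
    "[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020",
    "-task a",
]

# True where the pattern occurs anywhere in the line, False otherwise.
matches = [bool(date_pattern.search(s)) for s in samples]
for s, m in zip(samples, matches):
    print(m, s)
```

Note that the timestamp inside the brackets (23/9/20) does not match this pattern; only the full date after the colon does.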
Here is your answer:
number = "+99 9999 9999"
with open('text.txt') as input_data:
    lines = input_data.readlines()
# Instead of looping over the lines directly, we loop over
# their indices, from zero to (number of lines minus 1).
# (Remember, Python lists are zero-indexed.)
for line_no in range(len(lines)):
    if number in lines[line_no]:
        # Print the current line and the next two lines.
        # Slicing avoids an IndexError near the end of the file.
        print("".join(lines[line_no:line_no + 3]), end="")
import re
with open('text.txt') as input_data:
    lines = input_data.readlines()
re_number = re.compile(r"\+\d\d \d\d\d\d \d\d\d\d")
number = "+99 9999 9999"
blocks = []      # all blocks that belong to `number`
tmp_block = []   # the block currently being collected
flag = 0         # 1 while inside a block that belongs to `number`
for line in lines:
    if re_number.search(line):
        # A new header starts: store the block collected so far
        if tmp_block:
            blocks.append(tmp_block.copy())
            tmp_block.clear()
        flag = 1 if number in line else 0
    if flag:
        tmp_block.append(line)
# Store the last block if the file ended inside one
if flag:
    blocks.append(tmp_block.copy())
print(blocks)
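Because your code only finds the lines that contain the number you want, you can set a flag to keep printing the following lines until another number appears:
if __name__ == '__main__':
    number = "+99 9999 9999"
    task = 'task'
    wanted = False
    with open('text.txt') as input_data:
        for line in input_data:
            if number in line:
                # Header of a block we want
                wanted = True
            elif not line.startswith(task):
                # Header of a block for some other number
                wanted = False
            if wanted:
                print(line.strip())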
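If your history file has more than 2 tasks for a single phone number, this is my alternative solution.
Code:
path = 'extractiondata.txt'  # assumed file name; adjust as needed
inp = input("please, Enter your input that you want to search for: ")
def scanner(path, query):
    with open(path) as file:
        lines = file.readlines()
    for index, line in enumerate(lines):
        # Header lines start with a digit; the number sits at a fixed position
        if line[:1].isdigit() and line[20:-13] == query:
            print(line, end="")
            lin = index + 1
            # Print the following lines until the next header or end of file
            while lin < len(lines) and not lines[lin][:1].isdigit():
                print(lines[lin], end="")
                lin += 1
print("="*40)
print(f"*****History of {inp}*****")
scanner(path, inp)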
Output
please, Enter your input that you want to search for: +99 9999 9999
========================================
*****History of +99 9999 9999*****
02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f
[Program finished]
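This one is much the same as the previous solution; the only difference is in the slicing used to extract the search key.
Code:
path2 = 'extractiondata2.txt'  # hypothetical file name for the new-format data
def scanner2(path, query):
    with open(path) as file:
        lines = file.readlines()
    for index, line in enumerate(lines):
        # Header lines look like "[23/9/20, ..." so the second character is a digit
        if line[1:2].isdigit() and line[22:-13].strip(" ") == query:
            print(line, end="")
            lin = index + 1
            # Print the following lines until the next header or end of file
            while lin < len(lines) and not lines[lin][1:2].isdigit():
                print(lines[lin], end="")
                lin += 1
print("="*40)
print(f"*****History of {inp}*****")
scanner2(path2, inp)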
Output (the input is case-sensitive)
please, Enter your input that you want to search for: John - Salesman
========================================
*****History of John - Salesman*****
[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020
-task a
-task b
[24/9/20, 10:53:52 PM] John - Salesman: 24/09/2020
-task k
-task l
-task m
-task n
[Program finished]
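The fixed slice offsets (22 and -13) depend on the exact width of the timestamp and of the trailing date, so a header with a slightly different layout would be missed. As a hedged alternative, the sender could be extracted by splitting on the closing bracket and on the first colon after it. The helper name sender_of is hypothetical, and the sketch assumes header lines have the shape "[timestamp] sender: text".

```python
def sender_of(line):
    """Return the sender of a '[timestamp] sender: text' header line, or None."""
    if not line.startswith("["):
        return None
    # Everything after the first "]" is expected to be "sender: text".
    _, bracket, rest = line.partition("]")
    if not bracket:
        return None
    sender, colon, _ = rest.partition(":")
    return sender.strip() if colon else None

headers = [
    "[23/9/20, 11:26:42 PM] John - Salesman: 23/09/2020",
    "[24/9/20, 9:54:44 PM]Shawn - Support: 24/09/2020",
    "-task a",
]

senders = [sender_of(h) for h in headers]
print(senders)
```

A comparison such as sender_of(line) == inp could then stand in for the slice-based check, and it does not depend on whether a space follows the closing bracket.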
Try this code:
Code:
path = 'extractiondata.txt'
def scanner(path, query):
    with open(path) as file:
        lista = file.readlines()
    for index, each in enumerate(lista):
        if each[20:-13] == query:
            # Print the matching line and the next two lines
            # (slicing avoids an IndexError near the end of the file)
            print("".join(lista[index:index + 3]), end="")
inp = input("please, Enter your input that you want to search for: ")
scanner(path, inp)
Output
02/09/2020, 23:45 - +99 9999 9999: 02/09/2020
task A -Changes A
task b Changes b
03/09/2020, 01:55 - +99 9999 9999: 2-SEP-2020
task e -Changes e
task f Changes f
Just check whether the number you want is in the string:
with open('text.txt') as input_data:
    lines = [i.rstrip('\n') for i in input_data.readlines()]
blocks = []
number = "+99 9999 9999"
# This assumes every block is exactly 3 lines long (a header plus two tasks)
while len(lines) != 0:
    if number in lines[0]:
        blocks.append(lines[:3])
    lines = lines[3:]