[英]How to read a file block-wise in python
I am bit stuck in reading a file block-wise, and facing difficulty in getting some selective data in each block : 我在逐块读取文件时遇到了麻烦,并且在获取每个块中的某些选择性数据方面遇到了困难:
Here is my file content : 这是我的文件内容:
DATA.txt
#-----FILE-----STARTS-----HERE--#
#--COMMENTS CAN BE ADDED HERE--#
BLOCK IMPULSE DATE 01-JAN-2010 6 DEHDUESO203028DJE \
SEQUENCE=ai=0:at=221:ae=3:lu=100:lo=NNU:ei=1021055:lr=1: \
USERID=ID=291821 NO_USERS=3 GROUP=ONE id_info=1021055 \
CREATION_DATE=27-JUNE-2013 SN=1021055 KEY ="22WS \
DE34 43RE ED54 GT65 HY67 AQ12 ES23 54CD 87BG 98VC \
4325 BG56"
BLOCK PASSION DATE 01-JAN-2010 6 DEHDUESO203028DJE \
SEQUENCE=ai=0:at=221:ae=3:lu=100:lo=NNU:ei=324356:lr=1: \
USERID=ID=291821 NO_USERS=1 GROUP=ONE id_info=324356 \
CREATION_DATE=27-MAY-2012 SN=324356 KEY ="22WS \
DE34 43RE 342E WSEW T54R HY67 TFRT 4ER4 WE23 XS21 \
CD32 12QW"
BLOCK VICTOR DATE 01-JAN-2010 6 DEHDUESO203028DJE \
SEQUENCE=ai=0:at=221:ae=3:lu=100:lo=NNU:ei=324356:lr=1: \
USERID=ID=291821 NO_USERS=5 GROUP=ONE id_info=324356 \
CREATION_DATE=27-MAY-2012 SN=324356 KEY ="22WS \
DE34 43RE 342E WSEW T54R HY67 TFRT 4ER4 WE23 XS21 \
CD32 12QW"
#--BLOCK--ENDS--HERE#
#--NEW--BLOCKS--CAN--BE--APPENDED--HERE--#
I am only interested in Block Name , NO_USERS, and id_info of each block . 我只对每个块的块名,NO_USERS和id_info感兴趣。 these three data to be saved to a data-structure(lets say dict), which is further stored in a list :
这三个数据将保存到一个数据结构中(更进一步说是dict),然后进一步存储在一个列表中:
[{Name: IMPULSE ,NO_USER=3,id_info=1021055},{Name: PASSION ,NO_USER=1,id_info=324356}. . . ]
any other data structure which can hold the info would also be fine. 可以保存信息的任何其他数据结构也可以。
So far i have tried getting the block names by reading line by line : 到目前为止,我已经尝试通过逐行读取来获取块名称:
fOpen = open('DATA.txt')
unique =[]
for row in fOpen:
if "BLOCK" in row:
unique.append(row.split()[1])
print unique
i am thinking of regular expression approach, but i have no idea where to start with. 我正在考虑使用正则表达式方法,但是我不知道从哪里开始。 Any help would be appreciate.Meanwhile i am also trying , will update if i get something .
任何帮助将不胜感激。同时我也正在尝试,如果我得到什么会更新。 Please help .
请帮忙 。
You could use groupy to find each block, use a regex to extract the info and put the values in dicts: 您可以使用groupy查找每个块,使用正则表达式提取信息并将值放入字典中:
from itertools import groupby
import re
with open("test.txt") as f:
data = []
# find NO_USERS= 1+ digits or id_info= 1_ digits
r = re.compile("NO_USERS=\d+|id_info=\d+")
grps = groupby(f,key=lambda x:x.strip().startswith("BLOCK"))
for k,v in grps:
# if k is True we have a block line
if k:
# get name after BLOCK
name = next(v).split(None,2)[1]
# get lines after BLOCK and get the second of those
t = next(grps)[1]
# we want two lines after BLOCK
_, l = next(t), next(t)
d = dict(s.split("=") for s in r.findall(l))
# add name to dict
d["Name"] = name
# add sict to data list
data.append(d)
print(data)
Output: 输出:
[{'NO_USERS': '3', 'id_info': '1021055', 'Name': 'IMPULSE'},
{'NO_USERS': '1', 'id_info': '324356', 'Name': 'PASSION'},
{'NO_USERS': '5', 'id_info': '324356', 'Name': 'VICTOR'}]
Or without groupby as your file follows a format we just need to extract the second line after the BLOCK line: 或没有groupby,因为您的文件遵循一种格式,我们只需要提取BLOCK行之后的第二行:
with open("test.txt") as f:
data = []
r = re.compile("NO_USERS=\d+|id_info=\d+")
for line in f:
# if True we have a new block
if line.startswith("BLOCK"):
# call next twice to get thw second line after BLOCK
_, l = next(f), next(f)
# get name after BLOCK
name = line.split(None,2)[1]
# find our substrings from l
d = dict(s.split("=") for s in r.findall(l))
d["Name"] = name
data.append(d)
print(data)
Output: 输出:
[{'NO_USERS': '3', 'id_info': '1021055', 'Name': 'IMPULSE'},
{'NO_USERS': '1', 'id_info': '324356', 'Name': 'PASSION'},
{'NO_USERS': '5', 'id_info': '324356', 'Name': 'VICTOR'}]
To extract values you can iterate: 要提取值,您可以迭代:
for dct in data:
print(dct["NO_USERS"])
Output: 输出:
3
1
5
If you want a dict of dicts and to access each section from 1-n you can store as nested dicts using from 1-n as tke key: 如果您想要听写命令并从1-n访问每个节,则可以使用从1-n作为tke键存储为嵌套听写:
from itertools import count
import re
with open("test.txt") as f:
data, cn = {}, count(1)
r = re.compile("NO_USERS=\d+|id_info=\d+")
for line in f:
if line.startswith("BLOCK"):
_, l = next(f), next(f)
name = line.split(None,2)[1]
d = dict(s.split("=") for s in r.findall(l))
d["Name"] = name
data[next(cn)] = d
data["num_blocks"] = next(cn) - 1
Output: 输出:
from pprint import pprint as pp
pp(data)
{1: {'NO_USERS': '3', 'Name': 'IMPULSE', 'id_info': '1021055'},
2: {'NO_USERS': '1', 'Name': 'PASSION', 'id_info': '324356'},
3: {'NO_USERS': '5', 'Name': 'VICTOR', 'id_info': '324356'},
'num_blocks': 3}
'num_blocks'
will tell you exactly how many blocks you extracted. 'num_blocks'
将准确告诉您您提取了多少个块。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.