Processing a big file in Python (>60 GB)
I have a text file (>= 60 GB) whose records look like this:
{"index": {"_type": "_doc", "_id": "bLcy4m8BAObvGO9GALME"}}
{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"flags\":2135,\"id\":816704468,\"access_hash\":\"788468819702098896\",\"first_name\":\"a\",\"last_name\":\"b\",\"phone\":\"123\",\"status\":{\"_\":\"userStatusOffline\",\"was_online\":132}}","phone":"12","@version":"1","typ":"telegram_contacts","access_hash":"123","id":816704468,"@timestamp":"2020-01-26T13:53:29.467Z","path":"/home/user/mirror_01/users_5d6ca02e7e736a7fc700df8c.log","type":"redis","flags":2135,"host":"ubuntu","imported_from":"telegram_contacts"}
{"index": {"_type": "_doc", "_id": "Z7cy4m8BAObvGO9GALME"}}
{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"flags\":2143,\"id\":323586643,\"access_hash\":\"8315858910992970114\",\"first_name\":\"bv\",\"last_name\":\"nj\",\"username\":\"kj\",\"phone\":\"123\",\"status\":{\"_\":\"userStatusRecently\"}}","phone":"123","@version":"1","typ":"telegram_contacts","access_hash":"8315858910992970114","id":323586643,"@timestamp":"2020-01-26T13:53:29.469Z","path":"/home/user/mirror_01/users_5d6ca02e7e736a7fc700df8c.log","username":"mbnab","type":"redis","flags":2143,"host":"ubuntu","imported_from":"telegram_contacts"}
I have a few questions about this:
These are some SO posts I found useful:
But I still need help.
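You can process the file line by line and extract the information you need. Iterating over the file object reads one line at a time, so the whole file never has to fit in memory.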
with open('largefile.txt', 'r') as f:
    for line in f:
        # Extract what you need from that line of text here
        print(line)
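Before running a full pass over 60 GB, it can help to look at just the first few lines and confirm the structure. The following is a minimal sketch of that idea; `peek_lines` is a hypothetical helper name, and the UTF-8 encoding is an assumption about the file.

```python
from itertools import islice


def peek_lines(path, n=4):
    """Return the first n lines of a file without reading the rest.

    peek_lines is a hypothetical helper; UTF-8 is an assumed encoding.
    """
    with open(path, "r", encoding="utf-8") as f:
        # islice stops after n lines, so only the start of the file is read
        return [line.rstrip("\n") for line in islice(f, n)]
```

Calling `peek_lines('largefile.txt', 4)` would return the first two metadata/record pairs of a file laid out like the one in the question.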
For example, to parse the contents of each line as JSON:
import json

with open('largefile.txt', 'r') as f:
    for line in f:
        # For example, to interpret the string as JSON and read
        # it in as a dictionary, do
        if line.strip():  # check there is something on the line
            data = json.loads(line)
            # in your case, to fix the value for "message" do
            if 'message' in data:
                data['message'] = json.loads(data['message'])
            # extract the information you need here
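The sample in the question alternates between a short `{"index": ...}` line and the record it describes. Assuming that pattern holds for the whole file, the loop above could be wrapped in a generator that pairs each record with the `_id` from the line before it. This is only a sketch: `iter_records` is a hypothetical name, and the rule used to recognise a metadata line (an `"index"` key and no `"message"` key) is an assumption based on the two sample records.

```python
import json


def iter_records(path):
    """Yield (doc_id, record) pairs, one at a time.

    Assumes alternating metadata/record lines as in the sample;
    iter_records is a hypothetical helper name.
    """
    with open(path, "r", encoding="utf-8") as f:
        doc_id = None
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            data = json.loads(line)
            # Assumed rule: a metadata line has "index" but no "message"
            if "index" in data and "message" not in data:
                doc_id = data["index"].get("_id")
                continue
            # "message" holds JSON encoded as a string, so decode it again
            if isinstance(data.get("message"), str):
                data["message"] = json.loads(data["message"])
            yield doc_id, data
            doc_id = None  # do not reuse an id for the next record
```

Because this is a generator, `for doc_id, record in iter_records('largefile.txt'):` still holds only one record in memory at a time. A `message` value that is not valid JSON would raise `json.JSONDecodeError`, which you may want to catch and log rather than stop the run.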
I expect there will be more work to do to extract the information you need, but I hope this gets you started. Good luck!