I have a text document, and I want to copy everything in a line after a specific keyword. How would I do this?
More specifically, I downloaded all of my Google Hangouts messages via Google Takeout, but a lot of it is data that is useless to me. The only thing I care about is the actual messages, not even the timestamps. Every message in there is on a separate line in the .json file, and looks like
"text" : "[actual message in here, including the brackets]"
So how would I extract every message, and preferably put them all on separate lines in chronological order? (They're all already in order: the top of the .json file has the newest messages, the bottom has the oldest.) Maybe someone could download their own Google Takeout file for Hangouts to try this.
Any help would be appreciated. Python would probably be best for this task, but any programming language that gets the job done will be sufficient.
One way you could accomplish this with Python is to load the JSON file into a Python data structure and then print the values you want.
You didn't specify the exact structure of the JSON, so if it is an array of objects that each have a 'text' key, then this would do the job (change it to match the actual JSON structure):
import json

# Load and parse the JSON file ('hangout_data' is a placeholder path).
with open('hangout_data') as f:
    hangout_items = json.load(f)

# Assumes the top level is an array of objects that each have a 'text' key.
for item in hangout_items:
    print(item['text'][1:-1])  # [1:-1] strips the surrounding brackets.
Hope this helps. You are more than welcome to post the exact structure, and I will provide a more specific answer.
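Since the exact structure isn't known, one hedged alternative is to walk the parsed JSON recursively and collect every string stored under a 'text' key, wherever it appears. This is only a sketch: the helper name iter_texts and the sample data are made up for illustration, and a real Takeout export may nest its messages differently. Reversing the result gives oldest-first order, assuming the file lists the newest messages first as described in the question.

```python
import json

def iter_texts(node):
    """Yield every string stored under a 'text' key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == 'text' and isinstance(value, str):
                yield value
            else:
                yield from iter_texts(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_texts(item)

# Hypothetical sample; a real export will have a different layout.
sample = json.loads('''
{"events": [
    {"timestamp": "2", "text": "[newest message]"},
    {"timestamp": "1", "segment": {"text": "[oldest message]"}}
]}
''')

messages = list(iter_texts(sample))
messages.reverse()  # the file is newest-first, so reverse for chronological order
for message in messages:
    print(message[1:-1])  # [1:-1] strips the surrounding brackets
```

To run this on a real file, replace the sample with the result of json.load() on the opened export.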
If you want to treat things as just plain text:
with open('filepath', 'r') as file:
    for line in file:
        # strip() removes leading white space and the trailing '\n' (and other white space)
        strippedline = line.strip()
        if strippedline.startswith('"text" :'):
            # Rejoin on ':' so that any colons inside the message are preserved.
            message = ':'.join(strippedline.split(':')[1:]).strip()
            print(message)
It's probably best to just go through the native json module, though.
Here is an input file:
"text" : "[actual message in here, including the brackets]"
"text" : "[actual message in here, including the brackets]"
"text" : "[actual message in here, including : the brackets and some ':' ]"
"texat" : "[This isn't a legal message]"
    "text" : "[actual message in here, including the brackets. Note leading white space ]"
And the output:
"[actual message in here, including the brackets]"
"[actual message in here, including the brackets]"
"[actual message in here, including : the brackets and some ':' ]"
"[actual message in here, including the brackets. Note leading white space ]"
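The plain-text approach can also be wrapped in a small function, which makes it easier to check and to reverse the result into chronological order. This is a minimal sketch under the same assumption that every message sits on its own line starting with "text" :. The name extract_messages and the shortened sample lines are illustrative only. It splits on the first ':' only, a slight variation that likewise keeps colons inside the message.

```python
def extract_messages(lines):
    """Return the message part of every line that starts with '"text" :'."""
    messages = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('"text" :'):
            # Split only on the first ':' so colons inside the message survive.
            messages.append(stripped.split(':', 1)[1].strip())
    return messages

# Shortened, made-up sample lines in the same shape as the input file.
sample_lines = [
    '"text" : "[first]"\n',
    '"text" : "[has : a colon]"\n',
    '"texat" : "[This isn\'t a legal message]"\n',
    '    "text" : "[leading white space]"\n',
]

result = extract_messages(sample_lines)
# Reverse so the oldest message comes first (the file is newest-first).
for message in reversed(result):
    print(message)
```

For a real file, pass the open file object instead of sample_lines, since iterating over it yields one line at a time.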