
Get specific data from a .json file and save them to a 2D matrix/dictionary in Python

I am new to Python and have been trying to do some NLP on various .json files inside a folder. I have managed to get and print each entry separately from the dictionary, using the key article to get the description value. The problem is that every time the loop executes, I save the new value to the same variable, body1. What I find particularly difficult, for some reason, is saving each entry (each article's description) in a two-dimensional matrix or a table of dictionaries, if you wish, so that all the entries are available for future use. Something like:

   body1 = ['file_name', 'description', 'file_name', 'description', 'file_name', 'description']

So, if I need to, I will be able to print the second file's description using body1[name][description]. At the moment, each iteration overwrites the data from the previous one. I think my C-configured way of thinking does not let me see the answer. I would appreciate any ideas.

Thank you in advance, George

   import os
   import glob
   import json
   import nltk
   from nltk.corpus import stopwords
   from nltk import PorterStemmer

   stop = stopwords.words('english')
   stemmer=PorterStemmer()

   for name in glob.glob('/Users/jorjis/Desktop/test/*'):
     jfile = open(name, 'r')
     values = json.load(jfile)
     jfile.close()
     body1 = values['article']['description']
     tokens = nltk.wordpunct_tokenize(body1)
     tokens = [w.lower() for w in tokens]
     vocab = [word for word in tokens if word not in stop]
     print(body1)

You need to create a list outside the loop and append the values to it.

import glob
import json
import nltk
from nltk.corpus import stopwords

stop = stopwords.words('english')

final = []  # created once, outside the loop, so it keeps the values from every iteration
uniq_ident = 1
for name in glob.glob('/Users/jorjis/Desktop/test/*'):
    with open(name, 'r') as jfile:  # the file is closed automatically
        values = json.load(jfile)
    body1 = values['article']['description']
    tokens = nltk.wordpunct_tokenize(body1)
    tokens = [w.lower() for w in tokens]
    vocab = [word for word in tokens if word not in stop]
    final.append([uniq_ident, vocab])  # append vocab or whatever values you want to keep
    uniq_ident += 1
    print(body1)

You can also make final a dict with final = {} and use final[uniq_ident] = vocab
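For the lookup described in the question (getting a description by file name), the dict could be keyed by file name instead of a counter. The following is only a minimal sketch under stated assumptions: the helper name load_descriptions and the '*.json' pattern are made up for illustration, and every file is assumed to have the article / description layout shown in the question.

```python
import glob
import json
import os


def load_descriptions(folder):
    """Map each .json file name in `folder` to its article description.

    Hypothetical helper: assumes every file contains
    {"article": {"description": ...}} as in the question.
    """
    descriptions = {}
    # sorted() only makes the processing order predictable
    for path in sorted(glob.glob(os.path.join(folder, '*.json'))):
        with open(path, 'r') as jfile:
            values = json.load(jfile)
        # key by file name so an entry can be looked up later
        descriptions[os.path.basename(path)] = values['article']['description']
    return descriptions
```

With this, load_descriptions('/Users/jorjis/Desktop/test')['some_file.json'] would give that file's description, where some_file.json is a placeholder name.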

If you want to keep final a list and append a dict each time, use:

 final.append({uniq_ident:vocab})
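The three layouts mentioned above differ mainly in how an entry is read back afterwards. The sketch below compares them using made-up token lists in place of the vocab built in the loop; the names as_pairs, as_dict and as_dicts are chosen here for illustration only.

```python
# Toy token lists standing in for the vocab built in the loop (made-up data)
vocabs = [['first', 'article'], ['second', 'article']]

as_pairs = []   # list of [id, vocab] pairs
as_dict = {}    # dict: id -> vocab
as_dicts = []   # list of one-entry dicts

for uniq_ident, vocab in enumerate(vocabs, start=1):
    as_pairs.append([uniq_ident, vocab])
    as_dict[uniq_ident] = vocab
    as_dicts.append({uniq_ident: vocab})

# Reading back the second file's vocab in each layout;
# each line prints ['second', 'article']
print(as_pairs[1][1])   # by position, then the vocab slot of the pair
print(as_dict[2])       # directly by identifier
print(as_dicts[1][2])   # by position, then by identifier
```

The plain dict gives the most direct lookup when the identifier is known; the list-based layouts keep insertion order explicit and are read by position.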
