How Do I Use a Generator on a Data File to Convert JSON and TSV Rows Into a Dataframe?
I have a ".data" file containing the two sample rows below. The first row is JSON and the second row is TSV. I would like to convert the JSON line to a Python dictionary and the TSV line to a Python dictionary, and then output both into a dataframe using a generator.
###SAMPLE LINES of ".DATA" FILE###
{"Book": "American Horror", "Author": "Me", "date": "12/12/2012", "publisher": "Fox"}
Sports Law Some Body 06/12/1999 Random House 1000
import json
import pandas as pd

def generator(file):
    # file is an already-open file handle; iterate over it line by line
    for row in file:
        print(row)
        if "{" in row:
            yield json.loads(row)
        else:
            ###I don't know where to begin with the tsv data
            ###tsv data must fit under column names of json data
            # placeholder attempt: iterating a string yields single characters
            for tsv in row:
                yield tsv

file = ".data_file"
with open(file, encoding="ISO-8859-1") as some_stuff:
    df = pd.DataFrame(data=generator(some_stuff))
df
By "TSV" I assume that your data is tab-separated, i.e. the fields are delimited by a single tab character. If that is the case, you can use str.split('\t') to break up the fields, like this:
>>> line = 'Sports Law\tSome Body\t06/12/1999\tRandom House 1000\n'
>>> line.rstrip().split('\t')
['Sports Law', 'Some Body', '06/12/1999', 'Random House 1000']
The rstrip() is there to remove the newline at the end of each line that you read from the file.
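One caveat worth illustrating: rstrip() with no argument strips all trailing whitespace, tabs included, so a row whose last field is empty would lose that field. The sketch below uses a hypothetical row (not from the sample file) to compare it with rstrip("\r\n"), which removes only the line ending.

```python
# Hypothetical row with an empty last field (publisher), for illustration only.
line = "Sports Law\tSome Body\t06/12/1999\t\n"

# rstrip() with no argument strips every trailing whitespace character,
# including the tab that marks the empty last field.
fields_default = line.rstrip().split("\t")

# rstrip("\r\n") removes only the line ending, so the empty field is kept.
fields_newline_only = line.rstrip("\r\n").split("\t")

print(fields_default)       # ['Sports Law', 'Some Body', '06/12/1999']
print(fields_newline_only)  # ['Sports Law', 'Some Body', '06/12/1999', '']
```

If every row in your file has all of its fields filled in, the two calls behave the same; the difference only matters when trailing fields can be empty, in which case unpacking into four names would raise a ValueError with the first form.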
Then create a dictionary and yield it:
book, author, date, publisher = line.rstrip().split('\t')
yield dict(Book=book, Author=author, date=date, publisher=publisher)
Or if you already have a list of column names:
columns = ['Book', 'Author', 'date', 'publisher']
yield dict(zip(columns, line.rstrip().split('\t')))
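Putting the pieces together, a minimal sketch of the whole generator might look like the following. The names iter_records, load_dataframe and COLUMNS are made up for this example, the column order is assumed to match the keys of the JSON sample, and the JSON sample line is assumed to be a complete, valid object. Treat it as one possible arrangement, not the only way to do it.

```python
import json

# Assumed column order for TSV rows, taken from the keys of the JSON sample.
COLUMNS = ["Book", "Author", "date", "publisher"]


def iter_records(lines, columns=COLUMNS):
    """Yield one dict per non-empty line, parsing JSON or tab-separated text."""
    for line in lines:
        line = line.rstrip("\r\n")  # drop only the line ending
        if not line.strip():
            continue  # skip blank lines
        if line.lstrip().startswith("{"):
            # JSON row: the object's keys become the column names.
            yield json.loads(line)
        else:
            # TSV row: pair each field with a column name by position.
            yield dict(zip(columns, line.split("\t")))


def load_dataframe(path, encoding="ISO-8859-1"):
    """Build a DataFrame from a mixed JSON/TSV file (requires pandas)."""
    import pandas as pd  # imported lazily so iter_records works without pandas

    with open(path, encoding=encoding) as handle:
        # Materialize the rows inside the with block, while the file is open.
        return pd.DataFrame(list(iter_records(handle)))


# Quick check on in-memory lines that mirror the two sample rows.
sample = [
    '{"Book": "American Horror", "Author": "Me", "date": "12/12/2012", "publisher": "Fox"}\n',
    "Sports Law\tSome Body\t06/12/1999\tRandom House 1000\n",
]
records = list(iter_records(sample))
print(records)
```

With your file you would then call load_dataframe(".data_file"). Note that zip() stops at the shorter of its inputs, so a TSV row with extra fields is silently truncated and a row with missing fields produces a dict with fewer keys, which pandas fills with NaN.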