
Pandas dataframe created from json has unnamed column - can't insert into MySQL due to unnamed column issue

Right now I'm messing with some JSON data and trying to push it into a MySQL database on the fly. The JSON file is enormous, so I have to go through it carefully line by line using a generator function (yield) in Python, convert each JSON line into a small pandas DF and write it into MySQL. The problem is that when I create a DF from JSON it adds an index column. And it seems that when I write stuff to MySQL it ignores the index=False option. Code below:

import ast
import pandas as pd
from sqlalchemy import create_engine

# stuff to parse the json file: yield one record per line
def parseJSON(path):
    with open(path, 'r') as g:
        for l in g:
            # literal_eval parses the same dict literals eval() did, without executing arbitrary code
            yield ast.literal_eval(l)

# MySQL engine
engine = create_engine('mysql://login:password@localhost:1234/MyDB', echo=False)

for l in parseJSON("MyFile.json"):
    # build a small DF from one record and append it to the table
    df = pd.DataFrame.from_dict(l, orient='index')
    df.to_sql(name='MyTable', con=engine, if_exists='append', index=False)

And I get an error:

OperationalError: (_mysql_exceptions.OperationalError) (1054, "Unknown column '0' in 'field list'")

Any ideas what I am missing? Or is there a way to get around this?

UPD. I see that the dataframe has an unnamed column, labeled 0, each time I create the dataframe in the inner loop.

Here is some info about the DF:

df
Out[155]: 
                                                                0
reviewerID                                         A1C2VKKDCP5H97
asin                                                   0007327064
reviewerName                                        Donna Polston
helpful                                                    [0, 0]
unixReviewTime                                         1392768000
reviewText      love Oddie ,One of my favorite books are the O...
overall                                                         5
reviewTime                                            02 19, 2014
summary                                                       Wow

print(df.columns)
RangeIndex(start=0, stop=1, step=1)
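The output above can be reproduced with a small sketch. The record below is a hypothetical, abbreviated stand-in for one line of the file; it shows that orient='index' turns every dict key into a row label, leaving one data column labeled 0, and that transposing the frame is one way to flip it back into a single row.

```python
import pandas as pd

# Hypothetical record shaped like one line of the review file (fields abbreviated)
record = {
    "reviewerID": "A1C2VKKDCP5H97",
    "asin": "0007327064",
    "helpful": [0, 0],
    "overall": 5,
    "summary": "Wow",
}

# orient='index' makes each dict key a row label, so the keys land in the index
df = pd.DataFrame.from_dict(record, orient="index")
print(df.shape)             # 5 rows, 1 column
print(df.columns.tolist())  # the single column is labeled 0
print(df.index.tolist())    # the intended column names

# to_sql(index=False) only drops these index labels; the column named 0 is still
# sent in the INSERT, which matches MySQL's "Unknown column '0'" complaint.

# Transposing gives one row whose columns are the dict keys
one_row = df.T
print(one_row.columns.tolist())
```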

You currently have a frame with one column named 0, with your intended column names as the index of your frame. Perhaps you can try passing the record inside a list, so that pandas builds a single row whose columns are the dict keys:

df = pd.DataFrame([l])
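A minimal runnable sketch of the one-row approach, using the same hypothetical abbreviated record. It also shows a pitfall of plain pd.DataFrame.from_dict(l) with the default orient='columns': the scalar fields are broadcast against the two-element helpful list, so that call yields two rows rather than one. Because MySQL has no list column type, the sketch serialises helpful to a JSON string before writing; that choice, and the in-memory SQLite connection standing in for the question's MySQL engine, are assumptions made only so the example runs anywhere.

```python
import json
import sqlite3

import pandas as pd

# Hypothetical record shaped like one line of the review file (fields abbreviated)
record = {
    "reviewerID": "A1C2VKKDCP5H97",
    "asin": "0007327064",
    "helpful": [0, 0],
    "overall": 5,
    "summary": "Wow",
}

# Default orient='columns' broadcasts the scalars against the 2-element list
two_rows = pd.DataFrame.from_dict(record)
print(len(two_rows))

# A list holding one dict gives exactly one row, with the dict keys as column names
df = pd.DataFrame([record])
print(df.columns.tolist())

# Assumed handling: store the list-valued field as a JSON string
df["helpful"] = df["helpful"].apply(json.dumps)

# In-memory SQLite stands in for the MySQL engine (assumption, for runnability)
con = sqlite3.connect(":memory:")
df.to_sql("MyTable", con, if_exists="append", index=False)
rows = con.execute("SELECT reviewerID, helpful, overall FROM MyTable").fetchall()
print(rows)
```

With a real MySQL table, every dict key must match an existing column name; a key with no matching column triggers the same "Unknown column" error.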

NOTE: I think you would get much better performance if you built up a dict (or some other structure), converted all the rows into one DF and then pushed that to MySQL. Inserting one row at a time is likely to be too slow.
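A middle ground for a file too large to hold in memory is to batch: read a fixed number of records, build one DataFrame from them and append it with a single to_sql call. The sketch below assumes strict JSON with one object per line (if the lines are Python-style dicts, as the question's use of eval suggests, swap json.loads for ast.literal_eval). The helper names parse_lines and load_in_chunks are hypothetical, and an in-memory SQLite connection stands in for the MySQL engine; with MySQL you would pass the SQLAlchemy engine from the question instead.

```python
import io
import itertools
import json
import sqlite3

import pandas as pd


def parse_lines(fh):
    """Yield one dict per non-empty line (assumes strict JSON, one object per line)."""
    for line in fh:
        line = line.strip()
        if line:
            yield json.loads(line)


def load_in_chunks(fh, con, table, chunk_size=1000):
    """Append records to `table` in batches of `chunk_size`; return the row count."""
    records = parse_lines(fh)
    total = 0
    while True:
        # Pull the next batch from the generator; an empty list means we're done
        chunk = list(itertools.islice(records, chunk_size))
        if not chunk:
            break
        df = pd.DataFrame(chunk)  # list of dicts -> one row per record
        df.to_sql(table, con, if_exists="append", index=False)
        total += len(df)
    return total


# Usage with a tiny in-memory "file" and an in-memory SQLite database
sample = io.StringIO(
    '{"reviewerID": "A", "overall": 5}\n'
    '{"reviewerID": "B", "overall": 4}\n'
    '{"reviewerID": "C", "overall": 3}\n'
)
con = sqlite3.connect(":memory:")
n = load_in_chunks(sample, con, "MyTable", chunk_size=2)
count = con.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]
print(n, count)
```

The batch size trades memory for fewer round trips. Separately, to_sql's own chunksize parameter controls how many rows each INSERT batch carries when a single large frame is written.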

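Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.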

 