简体   繁体   中英

Pandas dataframe created from json has unnamed column - can't insert into MySQL due to unnamed column issue

Right now I messing with some JSON data and I am trying to push it into the MySQL database on the fly. The JSON file is enormous so I have to carefully go through it line by line using yield function in Python, convert each JSON line into small pandas DF and write it into MySQL. The problem is that when I create DF from JSON it adds the index column. And it seems that when I write stuff to MySQL it ignores index=False option. Code below

import gzip
import pandas as pd
from sqlalchemy import create_engine

#stuff to parse json file
def parseJSON(path):
  g = open(path, 'r')
  for l in g:
      yield eval(l)
#MySQL engine
engine = create_engine('mysql://login:password@localhost:1234/MyDB', echo=False)
#empty df just to have it
df = {}

for l in parseJSON("MyFile.json"):
    df = pd.DataFrame.from_dict(l, orient='index')
    df.to_sql(name='MyTable', con=engine, if_exists = 'append', index=False)

And I get a error:

OperationalError: (_mysql_exceptions.OperationalError) (1054, "Unknown column '0' in 'field list'")

Any ideas what I am missing? Or is there a way to get around this stuff?

UPD. I see that dataframe has an unnamed column with value 0 each time I create the dataframe in inner loop.

Here is some info about DF:

df
Out[155]: 
                                                                0
reviewerID                                         A1C2VKKDCP5H97
asin                                                   0007327064
reviewerName                                        Donna Polston
helpful                                                    [0, 0]
unixReviewTime                                         1392768000
reviewText      love Oddie ,One of my favorite books are the O...
overall                                                         5
reviewTime                                            02 19, 2014
summary                                                       Wow

print(df.columns)
RangeIndex(start=0, stop=1, step=1)

You currently have a frame with one column named 0 with your intended column names as the index of your frame. Perhaps you can try

df = pd.DataFrame.from_dict(l)

NOTE: I think you would have much better performance if you could build up a dict (or some other structure), convert all rows to a df then push to mysql. This one row at a time might be too slow

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM