
pandas sqlite3 OperationalError: table has no column named

I am trying to collect data using chromedriver.

I use the URL http://web.mta.info/developers/turnstile.html to get my data: I extract the file links from that page and then load the files into two tables based on the date of the data. This is the code I am trying to execute:

record_cnt = 0  
for link in data_list_post:
    data = pd.read_table(link, sep=',')
    print('%s:%s rows %s columns' % (link[-10:-4],data.shape[0], data.shape[1])) 
    record_cnt += data.shape[0]
    data.to_sql(name='post', con=conPost, flavor='sqlite', if_exists='append')

Traceback:

---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
<ipython-input-9-6f5adea38bf9> in <module>()
      3     data = pd.read_table(link, sep=',')
      4     record_cnt += data.shape[0]
----> 5     data.to_sql(name='post', con=conPost, flavor='sqlite', if_exists='append')

/Users/xx/anaconda/lib/python3.4/site-packages/pandas/core/generic.py in to_sql(self, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
   1199         sql.to_sql(self, name, con, flavor=flavor, schema=schema,
   1200                    if_exists=if_exists, index=index, index_label=index_label,
-> 1201                    chunksize=chunksize, dtype=dtype)
   1202 
   1203     def to_pickle(self, path):

/Users/xx/anaconda/lib/python3.4/site-packages/pandas/io/sql.py in to_sql(frame, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
    468     pandas_sql.to_sql(frame, name, if_exists=if_exists, index=index,
    469                       index_label=index_label, schema=schema,
--> 470                       chunksize=chunksize, dtype=dtype)
    471 
    472 

/Users/xx/anaconda/lib/python3.4/site-packages/pandas/io/sql.py in to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype)
   1501                             dtype=dtype)
   1502         table.create()
-> 1503         table.insert(chunksize)
   1504 
   1505     def has_table(self, name, schema=None):

/Users/xx/anaconda/lib/python3.4/site-packages/pandas/io/sql.py in insert(self, chunksize)
    662 
    663                 chunk_iter = zip(*[arr[start_i:end_i] for arr in data_list])
--> 664                 self._execute_insert(conn, keys, chunk_iter)
    665 
    666     def _query_iterator(self, result, chunksize, columns, coerce_float=True,

/Users/xx/anaconda/lib/python3.4/site-packages/pandas/io/sql.py in _execute_insert(self, conn, keys, data_iter)
   1289     def _execute_insert(self, conn, keys, data_iter):
   1290         data_list = list(data_iter)
-> 1291         conn.executemany(self.insert_statement(), data_list)
   1292 
   1293     def _create_table_setup(self):

OperationalError: table post has no column named A002

Your problem is that you want to pull the table from each link on that page and compile them all into a single database table, but the tables behind your links are not all the same. Links towards the top of the list, like

http://web.mta.info/developers/data/nyct/turnstile/turnstile_160312.txt

have as their first/header row:

C/A,UNIT,SCP,STATION,LINENAME,DIVISION,DATE,TIME,DESC,ENTRIES,EXITS

whereas links towards the bottom of the page, like

http://web.mta.info/developers/data/nyct/turnstile/turnstile_121222.txt

have very different looking first rows, like:

A002,R051,02-00-00,12-15-12,03:00:00,REGULAR,003911852,001349428,12-15-12,07:00:00,REGULAR,003911868,001349432,12-15-12,11:00:00,REGULAR,003911930,001349538,12-15-12,15:00:00,REGULAR,003912146,001349600,12-15-

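At first it looked like the second file above is just missing a header row, but its top row (and all of its other rows) don't look like the data rows from the first group either. Because pandas treats the first row as the column names by default, the value A002 ends up as a column name, which is why the error says the table has no column named A002. Can you decipher what the fields should be called for the rows in the second group?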
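To see this effect in isolation, here is a minimal sketch that parses the start of the old-style row quoted above, first with pandas' default header handling and then with header=None. The sample string is only the first eight fields of that row, and the variable names are made up for illustration.

```python
import io

import pandas as pd

# First eight fields of the old-style row quoted above (truncated on purpose).
old_style = "A002,R051,02-00-00,12-15-12,03:00:00,REGULAR,003911852,001349428\n"

# Default behaviour: the first row is consumed as column names.
as_header = pd.read_csv(io.StringIO(old_style))
print(list(as_header.columns)[:3])

# header=None keeps the row as data; dtype=str preserves the leading zeros.
no_header = pd.read_csv(io.StringIO(old_style), header=None, dtype=str)
print(no_header.shape)
```

With header=None the columns are simply numbered, so you would still need to work out and assign real names before the data could share a table with the newer files.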

Basically, there is a set of links (generally lower down the list) that you are going to have to treat differently from the top ones, because their tables are different.
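One way to treat the two groups differently is to look at the first line of each file before appending it. The sketch below assumes that a file belongs to the newer group exactly when its first field is C/A, as in the header row quoted above. The helper names has_header and load_if_headered are hypothetical, and downloading the text from each link is left out so the example stays self-contained.

```python
import io
import sqlite3

import pandas as pd

# Assumption: newer files start with the header row quoted above.
EXPECTED_FIRST_COLUMN = "C/A"


def has_header(text):
    """Return True if the first line of text starts with the known header field."""
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    return first_line.split(",")[0].strip() == EXPECTED_FIRST_COLUMN


def load_if_headered(text, con, table="post"):
    """Append text to table only when it has the expected header row.

    Returns True if the rows were appended and False if the file was skipped.
    """
    if not has_header(text):
        return False
    data = pd.read_csv(io.StringIO(text))
    # Strip stray whitespace so column names match across files.
    data.columns = [name.strip() for name in data.columns]
    data.to_sql(name=table, con=con, if_exists="append", index=False)
    return True
```

Files for which load_if_headered returns False can be collected in a separate list and parsed later, once you know how their fields should be named.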


 