
Ordering of rows in Pandas to_sql

I have a pandas DataFrame whose rows are in a specific order:

             a0               b0  c0     d0 370025442 370020440 370020436  \
1    31/08/2014  First Yorkshire  53  05:10         0    0.8333    1.2167   
2    31/08/2014  First Yorkshire  53  07:10         0      0.85      1.15   
3    31/08/2014  First Yorkshire  53  07:40         0    0.5167    0.7833   
4    31/08/2014  First Yorkshire  53  08:10         0       0.7         1   
5    31/08/2014  First Yorkshire  53  08:40       NaN       NaN       NaN   
6    31/08/2014  First Yorkshire  53  09:00         0       0.5    0.7667   
7    31/08/2014  First Yorkshire  53  09:20         0    0.5833         1   
8    31/08/2014  First Yorkshire  53  09:40         0       0.4       0.7   
9    31/08/2014  First Yorkshire  53  10:20         0    0.5333    1.0333   
10   31/08/2014  First Yorkshire  53  10:40         0    0.4833         1   
11   31/08/2014  First Yorkshire  53  11:00         0    0.3667       0.7   
12   31/08/2014  First Yorkshire  53  11:20         0    0.5333      1.15   
13   31/08/2014  First Yorkshire  53  11:40         0    0.3333    0.7667   
14   31/08/2014  First Yorkshire  53  12:00         0    1.0167       1.5   
15   31/08/2014  First Yorkshire  53  12:40         0      0.75    1.0333   
..          ...              ...  ..    ...       ...       ...       ...   
737  25/10/2014  First Yorkshire  53  21:40         0    1.0167       1.3   
738  25/10/2014  First Yorkshire  53  22:40         0    0.5667         1

However, when I convert this to SQL, the ordering is altered (row 13 onwards) and becomes:

             a0               b0  c0     d0 370025442 370020440 370020436  \
0    31/08/2014  First Yorkshire  53  05:10         0    0.8333    1.2167   
1    31/08/2014  First Yorkshire  53  07:10         0      0.85      1.15   
2    31/08/2014  First Yorkshire  53  07:40         0    0.5167    0.7833   
3    31/08/2014  First Yorkshire  53  08:10         0       0.7         1   
4    31/08/2014  First Yorkshire  53  08:40      None      None      None   
5    31/08/2014  First Yorkshire  53  09:00         0       0.5    0.7667   
6    31/08/2014  First Yorkshire  53  09:20         0    0.5833         1   
7    31/08/2014  First Yorkshire  53  09:40         0       0.4       0.7   
8    31/08/2014  First Yorkshire  53  10:20         0    0.5333    1.0333   
9    31/08/2014  First Yorkshire  53  10:40         0    0.4833         1   
10   31/08/2014  First Yorkshire  53  11:00         0    0.3667       0.7   
11   31/08/2014  First Yorkshire  53  11:20         0    0.5333      1.15   
12   31/08/2014  First Yorkshire  53  14:00         0    0.4833    1.0167   
13   31/08/2014  First Yorkshire  53  16:20         0    0.6833      1.15   
14   31/08/2014  First Yorkshire  53  23:10      None      None      None    
..          ...              ...  ..    ...       ...       ...       ...    
736  25/10/2014  First Yorkshire  53  21:40         0    1.0167       1.3   
737  25/10/2014  First Yorkshire  53  22:40         0    0.5667         1

The data is correct; only the ordering of the rows has changed (confirmed by looking at the SQL table in SQL Server Management Studio). I checked the input DataFrame both before and after the operation and it is unaltered, so the reordering must happen when it is converted to SQL.

The code used to create the SQL table is:

import sqlalchemy

# Query parameters after the first "?" are separated by "&"
engine = sqlalchemy.create_engine("mssql+pyodbc://*server*?driver=SQL+Server+Native+Client+10.0&trusted_connection=yes")
conn = engine.connect()
art_array.to_sql(theartsql, engine, if_exists="replace", index=False)

(where the server is actually specified)

What might be causing this and how might I resolve it? Any help would be really appreciated...

edit: I should mention that the versions I am using are:

Python version: 2.7.8

Pandas version: 0.15.1

SQLAlchemy version: 1.0.12

These versions must stay as they are for compatibility with other software.

That is normal. SQL tables do not maintain row order, so you need an "order by" clause to get the rows back in the correct order. You could add a row id (or the index) as a column before moving the data to SQL, and then "order by" that column in SQL.

Try something like this:

df
      a
0  1.00
1  2.00
2  0.67
3  1.34

df = df.reset_index()  # the old index becomes a regular column named "index"
df
   index     a
0      0  1.00
1      1  2.00
2      2  0.67
3      3  1.34

df.to_sql(xxxx)

Then, in SQL, you can "order by" the index column. The exact syntax varies by database; in SQL Server, for example, index is a reserved word, so the column has to be written as [index].
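
The approach above can be sketched end to end as follows. This is only an illustration, not the asker's setup: it assumes a current pandas on Python 3 and uses an in-memory SQLite database (standard-library sqlite3) in place of SQL Server. The table name demo and the column name row_id are hypothetical; row_id is used instead of index so the column does not collide with a reserved word.

```python
import sqlite3

import pandas as pd

# Small frame standing in for the real data.
df = pd.DataFrame({"a": [1.00, 2.00, 0.67, 1.34]})

# Store the original position as an ordinary column.
# "row_id" is a hypothetical name chosen for this sketch.
out = df.reset_index(drop=True)
out.insert(0, "row_id", range(len(out)))

# Assumption: SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
out.to_sql("demo", conn, if_exists="replace", index=False)

# Row order is only defined when the query asks for it.
back = pd.read_sql_query("SELECT * FROM demo ORDER BY row_id", conn)
conn.close()

# Drop the helper column once the rows are back in order.
restored = back.drop(columns="row_id")
```

Against SQL Server the same idea should apply with the SQLAlchemy engine from the question passed in place of conn.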

For anyone still looking into this: I found that passing the option method="multi" preserved the order for me. By default, method is None, which "uses standard SQL INSERT clause (one per row)". Specifying the multi method instead "passes multiple values in a single INSERT clause". Note that the method parameter was added in pandas 0.24, so it is not available in 0.15.1.

df.to_sql(name, con, method="multi")  # name and con as in your existing to_sql call
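
A minimal sketch of a batched insert, assuming pandas 0.24 or later and, purely for illustration, an in-memory SQLite database rather than SQL Server. The table name demo_multi, the column name row_id, and the chunksize value are hypothetical. Since method="multi" changes how rows are sent rather than how the database returns them, the sketch also writes an explicit ordering column and reads back with ORDER BY.

```python
import sqlite3

import pandas as pd

# A few rows shaped like the data in the question.
df = pd.DataFrame({
    "d0": ["05:10", "07:10", "07:40", "08:10"],
    "value": [0.8333, 0.85, 0.5167, 0.7],
})

# Hypothetical helper column recording the original row position.
df.insert(0, "row_id", range(len(df)))

# Assumption: SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")

# method="multi" sends several rows per INSERT statement;
# chunksize caps how many rows go into each statement.
df.to_sql("demo_multi", conn, if_exists="replace", index=False,
          method="multi", chunksize=2)

# Ask for the order explicitly when reading back.
result = pd.read_sql_query(
    "SELECT d0, value FROM demo_multi ORDER BY row_id", conn
)
conn.close()
```

Databases limit the number of parameters per statement, so a modest chunksize is usually needed for wide tables.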
