Turn .sql database dump into pandas dataframe

I have a .sql file that contains a database dump. I would like to get this file into a pandas DataFrame so that I can view and manipulate the data. I'm willing to take any solution, but I need explicit instructions: I've never worked with a .sql file before.

The file's structure is as follows:

-- MySQL dump 10.13  Distrib 8.0.11, for Win64 (x86_64)
--
-- Host: localhost    Database: somedatabase
-- ------------------------------------------------------
-- Server version   8.0.11

DROP TABLE IF EXISTS `selected`;
CREATE TABLE `selected` (
  `date` date DEFAULT NULL,
  `weekday` int(1) DEFAULT NULL,
  `monthday` int(4) DEFAULT NULL,
... [more variables]) ENGINE=somengine DEFAULT CHARSET=something COLLATE=something;

LOCK TABLES `selected` WRITE;
INSERT INTO `selected` VALUES (dateval, weekdayval, monthdayval), (dateval, weekdayval, monthdayval), ... (dateval, weekdayval, monthdayval);
INSERT INTO `selected` VALUES (...), (...), ..., (...);
... (more insert statements) ...
-- Dump completed on timestamp
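Because each column of `selected` is declared on its own line inside the CREATE TABLE block, the DataFrame headers can be read straight from the dump. A minimal sketch, assuming mysqldump's usual layout (one backtick-quoted column per line, with the closing parenthesis starting its own line). create_table_columns is a hypothetical helper, not a library function:

```python
def create_table_columns(dump_text: str, table: str) -> list[str]:
    """Return the column names declared in CREATE TABLE `table`, in order."""
    columns = []
    inside = False
    for line in dump_text.splitlines():
        if not inside:
            # Start collecting at this table's CREATE TABLE header.
            inside = line.startswith(f"CREATE TABLE `{table}`")
            continue
        stripped = line.strip()
        if stripped.startswith(")"):
            break  # ") ENGINE=..." closes the column list
        if stripped.startswith("`"):
            # "`date` date DEFAULT NULL," -> "date"
            columns.append(stripped.split("`")[1])
        # Lines such as "PRIMARY KEY (...)" or "KEY ..." are skipped.
    return columns
```

For the dump shown above, calling create_table_columns on the file's text with "selected" should return a list starting with "date", "weekday", "monthday".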

You should use the sqlalchemy library for this: https://docs.sqlalchemy.org/en/13/dialects/mysql.html

Alternatively, you could use MySQL Connector/Python, as described here: https://pynative.com/python-mysql-database-connection/

The second option may be the easier way to load your data into MySQL, because you can read the text of your .sql file and pass it to the connection as the query.

Something like this:

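import mysql.connector
# Connect to an existing MySQL database (placeholder credentials).
connection = mysql.connector.connect(host='localhost',
                                     database='database',
                                     user='user',
                                     password='pw')
# Read the whole dump file into one string ('dump.sql' is a placeholder path).
with open('dump.sql', encoding='utf-8') as f:
    query = f.read()
cursor = connection.cursor()
# multi=True (mysql-connector-python 8.x) runs each statement as its result is consumed.
for result in cursor.execute(query, multi=True):
    pass
connection.commit()
connection.close()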

Once the table is loaded, create an SQLAlchemy engine to connect pandas to the database, then use pandas' read_sql() to load the table into a DataFrame.
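A minimal sketch of that step, assuming the dump has already been loaded into a MySQL database named database and that the mysql-connector-python driver is installed (SQLAlchemy's mysql+mysqlconnector dialect). The helper names mysql_url and load_table are hypothetical, and the credentials are the same placeholders as in the connector snippet above:

```python
from urllib.parse import quote_plus


def mysql_url(user: str, password: str, host: str, database: str) -> str:
    """Build a SQLAlchemy URL for the mysql-connector-python driver."""
    # quote_plus() escapes characters such as '@' or '/' that would
    # otherwise be misread as part of the URL syntax.
    return (f"mysql+mysqlconnector://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}/{database}")


def load_table(url: str, table: str):
    """Read a whole table into a pandas DataFrame."""
    # Imported here so mysql_url() can be used without pandas/SQLAlchemy installed.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(url)
    try:
        # Backticks quote the table name the same way the dump does.
        return pd.read_sql(f"SELECT * FROM `{table}`", engine)
    finally:
        engine.dispose()
```

Usage: df = load_table(mysql_url("user", "pw", "localhost", "database"), "selected"). Importing pandas and SQLAlchemy at the top of the file works just as well; they sit inside load_table here only to keep the URL helper dependency-free.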

Also note that if you only want to manipulate the data, you could take the VALUES statement from the .sql file and build a DataFrame from it by hand: change "VALUES (....),(....),(....)" into a Python list of lists such as rows = [[....],[....],[....]] and pass it to pd.DataFrame(rows, columns=[...]).

Or you could paste the VALUES statement into Excel, delete the parentheses, use Text to Columns, add headers and save the sheet, then load it into a DataFrame with pandas' read_excel(). Or just work on the data in Excel (you could even use a CONCAT formula to rebuild the SQL VALUES syntax and replace the data in the .sql file). It really depends on what your end goal is.
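The by-hand conversion of the VALUES statement described above can also be automated. This sketch swaps in a small hand-written tokenizer (not a library function), assuming mysqldump's default output: one extended INSERT per line, strings in single quotes with backslash escapes, and NULL for missing values. parse_insert and dump_to_dataframe are hypothetical helpers:

```python
import re

# Common MySQL backslash escapes; any other escaped character maps to itself.
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0", "Z": "\x1a"}


def _convert(raw: str, quoted: bool):
    """Turn one literal from the VALUES list into a Python value."""
    if quoted:
        return raw
    if raw.upper() == "NULL":
        return None
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # anything else (e.g. hex literals) is kept as text


def parse_insert(statement: str) -> list[tuple]:
    """Split the VALUES part of one INSERT statement into row tuples."""
    match = re.search(r"\bVALUES\b", statement, re.IGNORECASE)
    if match is None:
        raise ValueError("no VALUES clause found")
    rows, row, buf = [], [], []
    in_row = in_string = quoted = False
    i = match.end()
    while i < len(statement):
        ch = statement[i]
        if in_string:
            if ch == "\\" and i + 1 < len(statement):
                nxt = statement[i + 1]
                buf.append(_ESCAPES.get(nxt, nxt))
                i += 2
                continue
            if ch == "'" and statement[i + 1:i + 2] == "'":
                buf.append("'")  # '' is an escaped quote
                i += 2
                continue
            if ch == "'":
                in_string = False  # closing quote
            else:
                buf.append(ch)
        elif ch == "'" and in_row:
            in_string = quoted = True
        elif ch == "(" and not in_row:
            in_row = True
        elif ch in ",)" and in_row:
            row.append(_convert("".join(buf), quoted))
            buf, quoted = [], False
            if ch == ")":
                rows.append(tuple(row))
                row, in_row = [], False
        elif in_row and not ch.isspace():
            buf.append(ch)  # part of an unquoted literal such as 42 or NULL
        i += 1
    return rows


def dump_to_dataframe(path: str, table: str, columns: list[str]):
    """Collect every INSERT for `table` in a mysqldump file into a DataFrame."""
    import pandas as pd  # imported here so parse_insert() works without pandas

    prefix = f"INSERT INTO `{table}` VALUES"
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:  # mysqldump writes each INSERT on a single line
            if line.startswith(prefix):
                rows.extend(parse_insert(line))
    return pd.DataFrame(rows, columns=columns)
```

Usage: df = dump_to_dataframe("dump.sql", "selected", ["date", "weekday", "monthday", ...]), listing every column from the CREATE TABLE block in order. mysqldump quotes dates, so they arrive as strings; pd.to_datetime(df["date"], errors="coerce") converts them and turns MySQL zero dates such as 0000-00-00 into NaT instead of raising.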

Sorry you did not receive a timely answer here.
