Pandas DataFrame to Dockerized Postgres using SQLAlchemy

Question

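One-line summary: I want to 1) spin up a Postgres database running in Docker and 2) populate that database with a Pandas DataFrame from outside the container, using SQLAlchemy.

Docker is running fine: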

CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                    NAMES
27add831cce5        postgres:10.1-alpine     "docker-entrypoint.s…"   2 weeks ago         Up 2 weeks          5432/tcp                 django-postgres_db_1

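I have found posts on getting a Pandas DataFrame into Postgres and on creating tables in a Dockerized Postgres with SQLAlchemy. Stitching them together, I get the following, which (sort of) works: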

import numpy as np
import pandas as pd

from sqlalchemy import create_engine
from sklearn.datasets import load_iris


def get_iris():

    iris = load_iris()

    return pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                        columns=iris['feature_names'] + ['target'])

df = get_iris()

print(df.head(n=5))

engine = create_engine(
    'postgresql://postgres:mysecretpassword@localhost:5432/postgres'.format(
    'django-postgres_db_1'))

df.to_sql('iris', engine)

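Questions:

q.1) Is the above close to the preferred way of doing this?

q.2) Is it possible to create a database in Postgres using SQLAlchemy? For example, so that I don't have to manually add a new database or populate the default postgres database.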

Problems:

p.1) When I run the code above, in which the create_engine call itself succeeds, I get the following error:

  File "/home/tmo/projects/toy-pipeline/venv/lib/python3.5/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 683, in do_executemany
    cursor.executemany(statement, parameters)
KeyError: 'sepal length (cm'

However, if I run the code again, it reports that the iris table already exists. If I access the Postgres database manually and run postgres=# TABLE iris, it returns nothing.
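
A likely cause of the KeyError, offered as an assumption rather than a confirmed diagnosis: psycopg2 uses named placeholders of the form %(name)s, and a column name containing a closing parenthesis, such as 'sepal length (cm)', ends the placeholder name early. That matches the truncated key 'sepal length (cm' in the traceback. If the CREATE TABLE statement had already succeeded before the insert failed, that would also explain why the table exists but is empty. A minimal sketch of a workaround is to rename the columns to plain identifiers before calling to_sql; the helper name sanitize_column is hypothetical:

```python
import re


def sanitize_column(name):
    """Turn an arbitrary column label into a lowercase snake_case identifier."""
    # Replace every run of characters other than letters, digits and "_" with "_".
    cleaned = re.sub(r"\W+", "_", str(name))
    # Drop underscores left over at either end, e.g. from a trailing ")".
    return cleaned.strip("_").lower()


# Column labels in the style of the iris DataFrame from the question.
columns = ["sepal length (cm)", "petal width (cm)", "target"]
safe_columns = [sanitize_column(c) for c in columns]
print(safe_columns)
```

With the DataFrame from the question, this could be applied as df.columns = [sanitize_column(c) for c in df.columns], followed by df.to_sql('iris', engine, if_exists='replace'). The if_exists='replace' argument drops the empty table left behind by the failed run.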

p.2) I have a database called testdb in the Postgres instance running in Docker:

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 testdb    | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
(4 rows)

However, if I point create_engine at that database and try to insert into it, I get this error:

conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL:  database "testdb" does not exist

(note how postgres has been replaced with testdb):

engine = create_engine(
    'postgresql://postgres:mysecretpassword@localhost:5432/testdb'.format(
    'django-postgres_db_1'))

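Update:

I think I have figured out what the problem is: incorrect use of hostname and address. I should mention that I am running on an Azure instance with Ubuntu 16.04.

Here is some useful information about the container running Postgres: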

HOSTNAME=96402054abb3
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/10/bin
PGDATA=/var/lib/postgresql/data
PG_MAJOR=10
PG_VERSION=10.5-1.pgdg90+1

And in /etc/hosts:

127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2  96402054abb3

How do I construct my connection string correctly? I have tried:

The container name, as suggested elsewhere:

engine = create_engine(
    'postgresql://postgres:saibot@{}:5432/testdb'.format(
    'c101519547f8e89c3422ca9e1dc68d85ad9f24bd8e049efb37273782540646f0'))

OperationalError: (psycopg2.OperationalError) could not translate host name "96402054abb3" to address: Name or service not known

I have also tried putting in the IP, localhost, HOSTNAME and so on, with no luck.

I am using this snippet to test whether the database connection works:

from sqlalchemy import create_engine
from sqlalchemy_utils import database_exists

engine = create_engine(
    'postgresql://postgres:saibot@172.17.0.2/testdb')

database_exists(engine.url)
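
database_exists only gives a useful answer once the server is reachable at all, so it may help to first check the host and port at the TCP level. Note that the PORTS column in the docker ps output at the top of the question shows 5432/tcp without a host mapping such as 0.0.0.0:5432->5432/tcp, which means that port is not published on the host's localhost. A minimal sketch using only the standard library; the function name port_is_open is hypothetical:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and host names that do not resolve.
        return False


if __name__ == "__main__":
    # Candidate hosts taken from the question; 5432 is the default Postgres port.
    for host in ("localhost", "172.17.0.2"):
        print(host, port_is_open(host, 5432))
```

If the port is closed for every candidate host, the problem is in the networking setup rather than in the database name or credentials.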

Answer 1

I solved this by inserting the container's IP address, 172.17.0.2, into the connection string:

'postgresql://postgres:mysecretpasswd@172.17.0.2/raw_data'

Combined with the following function, this solved my problem:

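from sqlalchemy import create_engine
from sqlalchemy_utils import create_database, database_exists


def db_create(engine_url, dataframe):
    """
    Check if the Postgres database exists; if not, create it.
    Then write the dataframe to a table named after the database.
    """

    engine = create_engine(engine_url)

    if not database_exists(engine.url):
        print("Database does not exist, creating...")
        create_database(engine.url)

    print("Does it exist now?", database_exists(engine.url))

    if database_exists(engine.url):
        # The table name is the last path segment of the URL, i.e. the database name.
        data_type = str(engine.url).rsplit('/', 1)[1]
        print('Populating database with', data_type)
        dataframe.to_sql(data_type, engine)

# df is the DataFrame to upload, e.g. the one returned by get_iris() in the question.
db_create('postgresql://postgres:mysecretpasswd@172.17.0.2/raw_data', df)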

This creates a database named raw_data containing a table named raw_data, and populates it with the target Pandas DataFrame.
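
Because the function derives the table name by splitting the URL string on '/', and the password is embedded in that same string, it can be safer to build and parse the URL with the standard library. The sketch below is an illustration and not part of the original answer; build_pg_url and database_name are hypothetical helper names, and the password shown is made up:

```python
from urllib.parse import quote, urlsplit


def build_pg_url(user, password, host, database, port=5432):
    """Assemble a postgresql:// URL, percent-encoding the user name and password."""
    # Characters such as "@" or "/" in a password would otherwise break URL parsing.
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, database)


def database_name(url):
    """Return the database part of a connection URL, ignoring any ?query options."""
    return urlsplit(url).path.lstrip("/")


url = build_pg_url("postgres", "p@ss/word", "172.17.0.2", "raw_data")
print(url)
print(database_name(url + "?sslmode=disable"))
```

The resulting string can be passed to create_engine or db_create in place of a hand-written URL.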
