Problem when I try to import a big database from SQL Azure with Python
I have a pretty weird problem: I am trying to extract a SQL database hosted in Azure with Python. Within this database there are several tables (I mention this because you will see a "for" loop in the code).
I can import some tables without problem, but others (the ones that take the longest, I suppose because of their size) fail.
Not only does it throw an error ([1] 25847 killed /usr/bin/python3), but it kicks me straight out of the console.
Does anyone know why? Is there an easier way to calculate the size of the database without importing the entire thing with pd.read_sql()?
Code:
import pandas as pd
import pyodbc

cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
cursor = cnxn.cursor()
query = "SELECT * FROM INFORMATION_SCHEMA.TABLES"
df = pd.read_sql(query, cnxn)

DataConContenido = pd.DataFrame({'Nombre': [], 'TieneCon?': [], 'Size': []})

for tablas in df['TABLE_NAME']:
    cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
    cursor = cnxn.cursor()
    query = "SELECT * FROM " + tablas
    print("vamos con " + str(tablas))
    try:
        df = pd.read_sql(query, cnxn)
        size = df.shape
        if size[0] > 0:
            DataConContenido = DataConContenido.append(dict(zip(['Nombre', 'TieneCon?', 'Size'], [tablas, True, size])), ignore_index=True)
        else:
            DataConContenido = DataConContenido.append(dict(zip(['Nombre', 'TieneCon?', 'Size'], [tablas, False, size])), ignore_index=True)
    except Exception:
        pass
Could it be that the connection drops because the query takes so long, and that is why I get the error above?
I think the process is getting killed on the line below:
DataConContenido= DataConContenido.append(dict(zip(['Nombre','TieneCon?','Size'],[tablas,True,size])),ignore_index=True)
You can double-check by adding a print statement just above it:
print("Querying Completed...")
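As for the size question from the post: you can ask SQL Server for a table's row count from metadata instead of pulling every row with pd.read_sql. A minimal sketch (the helper name is mine; sys.dm_db_partition_stats is the DMV Azure SQL exposes for this):

```python
def rowcount_query(table_name: str, schema: str = "dbo") -> str:
    """Build a T-SQL query that reads a table's row count from
    sys.dm_db_partition_stats instead of scanning the table."""
    # Bracket-quote the identifiers so unusual table names survive.
    return (
        "SELECT SUM(p.row_count) AS row_count "
        "FROM sys.dm_db_partition_stats AS p "
        "WHERE p.object_id = OBJECT_ID('[{}].[{}]') "
        "AND p.index_id IN (0, 1)".format(schema, table_name)
    )
```

With the cursor from the question it would be used as `cursor.execute(rowcount_query(tablas))` followed by `cursor.fetchone()[0]`, which transfers a single number per table instead of the whole table.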
You are getting KILLED most likely because your process crossed a limit on the amount of system resources you are allowed to use; typically this means it ran out of memory and the kernel's OOM killer terminated it. Loading entire tables into DataFrames looks like exactly that case.
If possible, query and append in batches rather than doing it in one shot.