
Snowflake python connector not working on larger data set in AWS Lambda

I'm using Snowflake's Python connector to retrieve a set of data from our data warehouse for processing. This job executes within an AWS Lambda function and runs into trouble once the number of rows returned exceeds ~20 or so. When I set a limit 10 or limit 20 on the query, I'm able to get the data set back. If I leave the limit off, it struggles to retrieve a result set of only 65 rows.

The memory and timeout values in my Lambda are already at the max, and the data set exported to CSV is only 300KB. Running locally, this query returns just fine, so it may have something to do with memory size, but the data being returned really isn't that large.

import os

import snowflake.connector
from snowflake.connector import DictCursor

connector = snowflake.connector.connect(
    account=os.environ['SNOWFLAKE_ACCOUNT'],
    user=os.environ['SNOWFLAKE_USER'],
    password=os.environ['SNOWFLAKE_PASSWORD'],
    role="MY_ROLE",
    ocsp_response_cache_filename="/tmp/.cache/snowflake/"
                                 "ocsp_response_cache",
)
print("Connected to snowflake")
cursor = connector.cursor(DictCursor)
cursor.execute('USE DATA.INFORMATION_SCHEMA')

query = "SELECT * FROM TABLE WHERE X=Y"  # FAKE QUERY

print("Execute query: \n\t{0}".format(query))
cursor.execute(query)
print("Execute query done!")
posts = []
processed = 0
for rec in cursor:
    processed += 1
    print("Processed count: {}".format(processed))
    posts.append(rec)

# These attempts also didn't work. 
# posts = cursor.fetchmany(size=cursor.rowcount)
# posts = cursor.fetchall()

cursor.close()

The processed counter gets up to 17 records but then halts. My logs output a lot of messages about chunks not being ready to consume, and eventually the Lambda just times out:

[1531919679073] [DEBUG] 2018-07-18T13:14:39.72Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Chunk Downloader in memory
[1531919679073] Execute query done!
[1531919679073] [DEBUG] 2018-07-18T13:14:39.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 chunk index: 0, chunk_count: 2
[1531919679073] [DEBUG] 2018-07-18T13:14:39.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 next_chunk_to_consume=1, next_chunk_to_download=3, total_chunks=2
[1531919679073] [DEBUG] 2018-07-18T13:14:39.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 waiting for chunk 1/2 in 1/10 download attempt
[1531919679073] [DEBUG] 2018-07-18T13:14:39.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 chunk 1/2 is NOT ready to consume in 10/3600(s)
[1531919679073] [DEBUG] 2018-07-18T13:14:39.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 downloading chunk 1/2
[1531919679074] [DEBUG] 2018-07-18T13:14:39.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 use chunk headers from result
[1531919679074] [DEBUG] 2018-07-18T13:14:39.74Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 started getting the result set 1: https://sfc-va-ds1-customer-stage.s3.amazonaws.com/fwoi-s-vass0007/results/7b9cf772-a061-47ab-8e9f-43dbfcd923c9_0/main/data_0_0_0?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=AKIAJKHCJ73YL7MD6ZRA&Expires=1531941279&Signature=VvGOkLNvE%2FHVMaUXoeQMn6cFUOY%3D
[1531919679074] [DEBUG] 2018-07-18T13:14:39.74Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Active requests sessions: 1, idle: 0
[1531919679074] [DEBUG] 2018-07-18T13:14:39.74Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 remaining request timeout: 3600, retry cnt: 1
[1531919679074] [DEBUG] 2018-07-18T13:14:39.74Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 socket timeout: 60
[1531919679075] [INFO] 2018-07-18T13:14:39.75Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Starting new HTTPS connection (1): sfc-va-ds1-customer-stage.s3.amazonaws.com
[1531919679078] [DEBUG] 2018-07-18T13:14:39.75Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 downloading chunk 2/2
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 use chunk headers from result
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 started getting the result set 2: https://sfc-va-ds1-customer-stage.s3.amazonaws.com/fwoi-s-vass0007/results/7b9cf772-a061-47ab-8e9f-43dbfcd923c9_0/main/data_0_0_1?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=AKIAJKHCJ73YL7MD6ZRA&Expires=1531941279&Signature=F5ix8FcsLO1dM8sWsZXZYx4uHM8%3D
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Converted retries value: 1 -> Retry(total=1, connect=None, read=None, redirect=None)
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Converted retries value: 1 -> Retry(total=1, connect=None, read=None, redirect=None)
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Active requests sessions: 2, idle: 0
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 remaining request timeout: 3600, retry cnt: 1
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 socket timeout: 60
[1531919679078] [INFO] 2018-07-18T13:14:39.77Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Starting new HTTPS connection (1): sfc-va-ds1-customer-stage.s3.amazonaws.com
[1531919681581] [DEBUG] 2018-07-18T13:14:41.580Z 26284dc8-8a8c-11e8-95ac-3ff42bd28642 chunk 1/2 is NOT ready to consume in 160/3600(s)
[1531919689074] [DEBUG] 2018-07-18T13:14:49.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 chunk 1/2 is NOT ready to consume in 20/3600(s)
[1531919691581] [DEBUG] 2018-07-18T13:14:51.581Z 26284dc8-8a8c-11e8-95ac-3ff42bd28642 chunk 1/2 is NOT ready to consume in 170/3600(s)
[1531919699074] [DEBUG] 2018-07-18T13:14:59.74Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 chunk 1/2 is NOT ready to consume in 30/3600(s)
[1531919701581] [DEBUG] 2018-07-18T13:15:01.581Z 26284dc8-8a8c-11e8-95ac-3ff42bd28642 chunk 1/2 is NOT ready to consume in 180/3600(s)
[1531919709074] [DEBUG] 2018-07-18T13:15:09.74Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 chunk 1/2 is NOT ready to consume in 40/3600(s)
[1531919711582] [DEBUG] 2018-07-18T13:15:11.581Z 26284dc8-8a8c-11e8-95ac-3ff42bd28642 chunk 1/2 is NOT ready to consume in 190/3600(s)
[1531919712739] [DEBUG] 2018-07-18T13:15:12.738Z 26284dc8-8a8c-11e8-95ac-3ff42bd28642 Incremented Retry for (url='/fwoi-s-vass0007/results/7b9cf772-a061-47ab-8e9f-43dbfcd923c9_0/main/data_0_0_0?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=AKIAJKHCJ73YL7MD6ZRA&Expires=1531941131&Signature=mW6nXerwYHhnfwfPdRF0So1tpIQ%3D'): Retry(total=0, connect=None, read=None, redirect=None)
[1531919719075] [DEBUG] 2018-07-18T13:15:19.75Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 chunk 1/2 is NOT ready to consume in 50/3600(s)

From the log, it seems that the Python connector keeps retrying to download the result from S3. That is expected behavior if your query generates a large amount of data. I would suggest making sure that your Lambda environment has access to the S3 bucket. A simple curl command should verify it:

curl -v https://sfc-va-ds1-customer-stage.s3.amazonaws.com

If you get some HTTP code back (like 403), then it means you have the connection. Otherwise, if it hangs, then something is not configured properly in your environment.
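If curl isn't available inside the Lambda runtime, a minimal standard-library sketch of the same check, assuming the stage hostname from the log above, can run from the handler:

import urllib.error
import urllib.request

# Hostname taken from the debug log above; getting any HTTP status back
# (even 403) proves the Lambda has network access to the S3 stage.
URL = "https://sfc-va-ds1-customer-stage.s3.amazonaws.com"

try:
    urllib.request.urlopen(URL, timeout=10)
    print("Reached S3 stage endpoint")
except urllib.error.HTTPError as exc:
    print("Reached S3 stage endpoint, got HTTP {}".format(exc.code))
except urllib.error.URLError as exc:
    print("Could not reach S3 stage endpoint: {}".format(exc.reason))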

I encountered a similar issue, but with the Snowflake JDBC connector.

Select * from table : fetches the first chunk of data (600 records), and then I get a "Connection timeout" while fetching the next chunk of data.

If I do Select * from table limit 1200 , it works fine without any timeouts.

So, I broke the whole thing down into 2 steps (a Python sketch of the same idea follows the list):

  1. rowcount = select count(*) from table
  2. Select * from table limit rowcount
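
For the Python connector, the same two-step workaround might look like this minimal sketch (MY_TABLE and the environment variable names are placeholder assumptions):

import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ['SNOWFLAKE_ACCOUNT'],
    user=os.environ['SNOWFLAKE_USER'],
    password=os.environ['SNOWFLAKE_PASSWORD'],
)
cursor = conn.cursor()

# Step 1: count how many rows the real query would return.
cursor.execute("SELECT COUNT(*) FROM MY_TABLE")  # MY_TABLE is a placeholder
rowcount = cursor.fetchone()[0]

# Step 2: re-run the query with an explicit LIMIT equal to that count,
# which avoided the chunked-fetch timeouts in the JDBC case above.
cursor.execute("SELECT * FROM MY_TABLE LIMIT {}".format(rowcount))
rows = cursor.fetchall()

cursor.close()
conn.close()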
