How to download a CSV from an HTTPS URL to a file using Pentaho Data Integration - Spoon (Kettle)?
When googling this question, it seems to have been asked, and partially (and poorly) answered, a number of times, mostly for older versions.
Question: How can I download a CSV to a local file, with the constraints below? I'm designing in Spoon.
URL: Will always be the same, https://example.com/data/my.csv . The website prepares the CSV and provides it back to the web client as a file download after about 4-5 seconds. In a browser this means it is downloaded as a .csv, and not displayed.
Authentication: The website does not require authentication for access. The data isn't sensitive.
Local file path: The downloaded CSV will overwrite the existing CSV, e.g. d:\data\my.csv . That is, I can set this on a timer and have it download the newest CSV every hour or so.
Proxy: It is quite likely I will need to traverse a network proxy, e.g. badproxy.mynetwork.internal:8080, and that proxy requires a username and password. It would be far better if I could set this password in a single location so that anything created in future can reference it. I'm not really sure how to approach this either.
The rest of my process focuses on handling the content of the CSV, and already works fine.
The approaches I've found on Google use the HTTP Client step, though it's not particularly clear how this translates into a file being saved locally to a known location.
Thanks for any pointers.
PDI v9.0.0.0-423
The HTTP Client step needs to be triggered. Use a Generate Rows step that generates e.g. 1 empty row, and link it with a hop to the HTTP Client step.
For your solution, try this: Data Grid --> HTTP Client --> CSV File Input --> Text File Output (with a .csv extension)
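For comparison only, here is a minimal Python sketch of what the transformation above is meant to achieve: fetch the response body, optionally through a proxy, and overwrite a known local file. This is not PDI code. The function names `build_opener` and `download_to_file`, the `.part` temporary suffix, and the idea of passing the proxy as a single `http://user:password@host:port` string are all assumptions made for illustration, and the sketch assumes the proxy accepts Basic authentication.

```python
import os
import shutil
import urllib.request


def build_opener(proxy_url=None):
    """Build a urllib opener, optionally routing http/https via a proxy.

    proxy_url is a hypothetical single setting, for example
    "http://user:password@badproxy.mynetwork.internal:8080".
    """
    if proxy_url:
        handler = urllib.request.ProxyHandler(
            {"http": proxy_url, "https": proxy_url}
        )
    else:
        # An empty mapping disables any auto-detected proxy.
        handler = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(handler)


def download_to_file(url, dest_path, opener, timeout=60):
    """Stream the response body to dest_path, replacing any existing file."""
    tmp_path = dest_path + ".part"  # hypothetical temporary name
    with opener.open(url, timeout=timeout) as response, \
            open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    # Swap the finished download into place so a failed download
    # never leaves a half-written CSV at the final path.
    os.replace(tmp_path, dest_path)
    return dest_path


# Example call (not executed here; needs network access):
# opener = build_opener(os.environ.get("CSV_PROXY_URL"))
# download_to_file("https://example.com/data/my.csv", r"d:\data\my.csv", opener)
```

Reading the proxy string from one place (here a hypothetical `CSV_PROXY_URL` environment variable) mirrors the wish in the question to define the proxy password once and reference it everywhere. The `timeout` of 60 seconds is an arbitrary choice that leaves room for the 4-5 seconds the site takes to prepare the file.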