How to download a CSV from an HTTPS URL to a file using Pentaho Data Integration - Spoon (Kettle)?

When googling this question, it seems to have been asked and partially (and poorly) answered a number of times, mostly for older versions.

Question: How can I download a CSV to a local file, given the constraints below? I'm designing in Spoon.

URL: Always the same, https://example.com/data/my.csv. The website prepares the CSV and returns it to the web client as a file download after about 4-5 seconds. In a browser this means it is downloaded as a .csv file rather than displayed.

Authentication: The website does not require authentication for access. The data isn't sensitive.

Local file path: The downloaded CSV will overwrite the existing CSV, e.g. d:\\data\\my.csv. That is, I can set this on a timer and have it download the newest CSV every hour or so.

Proxy: It is quite likely I will need to traverse a network proxy, e.g. badproxy.mynetwork.internal:8080, and that proxy requires a username and password. It would be far better if I could set this password in a single location so that anything created in future can reference it. I'm not really sure how to approach this either.

The rest of my process deals with the content of the CSV and already works fine.

The approaches I've found on Google use the HTTP Client component, though it's not particularly clear how this translates into a file being saved locally to a known location.

Thanks for any pointers.

PDI v9.0.0.0-423

The HTTP client step needs to be triggered. Use a Row generator step that generates, e.g., 1 empty row, and link it with a hop to the HTTP client step. For your solution, try this:

Data Grid --> HTTP Client --> CSV File Input --> Text file output (with a .csv extension)
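For comparison outside Spoon, here is a minimal Python sketch of what the whole task amounts to: fetch the URL, optionally through an authenticating proxy, and overwrite the local file only once the full body has arrived. It is not PDI code and not part of the answer above. The function names and the environment variable names (PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASSWORD) are hypothetical, and the proxy is assumed to accept HTTP Basic authentication.

```python
import os
import shutil
import tempfile
import urllib.request
from urllib.parse import quote


def proxy_url_from_env(env=os.environ):
    """Build a proxy URL from one shared set of variables (names are hypothetical)."""
    host = env.get("PROXY_HOST")
    if not host:
        return None  # no proxy configured
    port = env.get("PROXY_PORT", "8080")
    user = env.get("PROXY_USER")
    password = env.get("PROXY_PASSWORD")
    auth = ""
    if user and password:
        # Percent-encode so characters such as '@' or ':' do not break the URL.
        auth = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    return f"http://{auth}{host}:{port}"


def make_opener(proxy_url=None):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url, if given."""
    handlers = []
    if proxy_url:
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    return urllib.request.build_opener(*handlers)


def download_to_file(url, target_path, opener, timeout=60):
    """Download url and replace target_path only after the whole body is written."""
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Write to a temporary file in the same directory, then swap it into place,
    # so a failed download never leaves a half-written CSV behind.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, opener.open(url, timeout=timeout) as response:
            shutil.copyfileobj(response, tmp)
        os.replace(tmp_path, target_path)  # overwrites an existing file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A call such as `download_to_file("https://example.com/data/my.csv", r"d:\data\my.csv", make_opener(proxy_url_from_env()))` could then be run from a scheduler every hour. Keeping the proxy settings in one place is the point of `proxy_url_from_env`; in PDI the comparable single location would likely be variables defined in kettle.properties and referenced as `${...}` in step fields, though whether a given step supports proxy authentication should be checked against its documentation.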
