Azure Data Factory HTTP connector to parse a webpage

I am fairly new to using Azure for ETL and machine learning.

I want to parse a webpage such as here and convert it into a labeled catalog of structured data, to which I can apply ML transforms.

I was reading up on the Azure documentation for the HTTP connector, but I am unclear on a stepwise process for doing this with either the Azure Data Factory UI or scripts.

Can Azure Data Factory be used for such a parsing task, and if so, is there clear documentation on how to do it with the Azure Data Factory UI?

I think at this point you should be looking at v2 of ADF.

Regarding your use case, I don't see how the HTTP connector would handle the "parsing" of the webpage. That connector can fetch the content of the page (by doing a GET request) and move it somewhere for storage, for example a blob. You can then trigger some sort of custom activity whose code has the logic to convert the HTML of the page into the catalog of structured data that you'd like. After that, you can feed the result to another pipeline that applies the ML transforms you require.

Basically, you will have to implement the parsing logic yourself. IMHO, ADF can help you with orchestration and data movement, but not with the "parsing" side of things.
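To give an idea of what that self-written parsing logic could look like, here is a minimal Python sketch of the kind of code a custom activity might run. It is only an illustration under assumptions: it supposes the page contains a simple HTML table whose first row holds the column names, and the names `TableParser`, `html_to_records` and `sample_html` are made up for this example. It uses only the standard library `html.parser` module, and a real page would almost certainly need different extraction rules.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


def html_to_records(html):
    """Use the first row as field names and return the other rows as dicts."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page content (assumption): in the pipeline this would be the
# HTML that the copy step stored in the blob.
sample_html = """
<table>
  <tr><th>name</th><th>price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>19.50</td></tr>
</table>
"""

records = html_to_records(sample_html)
print(records)
```

The resulting list of dictionaries is the sort of labeled, structured output that could be written back to storage (as JSON or CSV, for instance) for the downstream ML pipeline to pick up.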


 