
Azure Form Recognizer mainline support for Office documents

I have been using the 2022-06-30-preview version of the API to OCR docx and PowerPoint documents. Now that the API has been stabilized and has moved to 2022-08-31, I have updated my code to use this stable version (just a version update of the SDK client), but the same documents are now rejected with an InvalidContent error: "The file is corrupted or format is unsupported. Refer to documentation for the list of supported formats."

Has support for Office documents been dropped, or is there some setting I need to add? I don't see any mention in the changelog that support was dropped between the last preview version and the stable one.

I'm using the Node.js SDK. I have checked that the same docx document, with exactly the same code, is accepted by the @azure/ai-form-recognizer@4.0.0-beta.5 SDK client, but not by the latest stable version, @azure/ai-form-recognizer@4.0.0. The code I'm using is almost exactly the example code in the quickstart; only the URLs change.

  • According to this MS doc, support for Microsoft Office files has been dropped from all the SDKs.

  • So you have two options. Form Recognizer still supports Microsoft Office files, but only through the REST API. You can either make the HTTP calls yourself, or convert the files to PDF and then use the regular SDK for further processing.

  • The conversion is done with the docx-pdf npm package. Here I have a file hjh.docx, which I convert to pdfuploader.pdf and then process.

const fs = require("fs");
const { promisify } = require("util");
const docxConverter = require("docx-pdf");
const { AzureKeyCredential, DocumentAnalysisClient } = require("@azure/ai-form-recognizer");

const key = "";
const endpoint = "";

async function main() {
    // Conversion logic: docx-pdf exposes a callback API, wrapped here in a promise.
    await promisify(docxConverter)("./hjh.docx", "./pdfuploader.pdf");

    // Form Recognizer logic: analyze the converted PDF.
    const client = new DocumentAnalysisClient(endpoint, new AzureKeyCredential(key));
    const readStream = fs.createReadStream("<Path>"); // path to the converted PDF
    const poller = await client.beginAnalyzeDocument("prebuilt-document", readStream, {
        onProgress: ({ status }) => {
            console.log(`status: ${status}`);
        },
    });
    const result = await poller.pollUntilDone();
    console.log(result.content);
}

main().catch((error) => {
    console.error("An error occurred:", error);
    process.exit(1);
});

@azure/ai-form-recognizer output: (image)

@azure/ai-form-recognizer@4.0.0-beta.5 output: (image)
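
For the other option (making the HTTP calls yourself instead of converting to PDF), here is a minimal sketch of what the REST call could look like. The helper names buildAnalyzeRequest and analyzeViaRest are made up for illustration, the document is assumed to be reachable through a public URL, and the api-version is left as a parameter: check the documentation for which api-version and model actually accept Office files before relying on this.

```javascript
// Minimal sketch: call Form Recognizer's REST "analyze" operation directly.
// Assumptions (not from the original post): the helper names, a publicly
// reachable document URL, and the api-version / model id chosen by the caller.

function buildAnalyzeRequest({ endpoint, key, modelId, apiVersion, documentUrl }) {
    // Strip trailing slashes so the path is not doubled up.
    const base = endpoint.replace(/\/+$/, "");
    const url =
        `${base}/formrecognizer/documentModels/${encodeURIComponent(modelId)}:analyze` +
        `?api-version=${encodeURIComponent(apiVersion)}`;
    return {
        url,
        options: {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                "Ocp-Apim-Subscription-Key": key,
            },
            body: JSON.stringify({ urlSource: documentUrl }),
        },
    };
}

// Submits the document, then polls the Operation-Location URL until the
// analysis succeeds or fails. Needs Node.js 18+ for the global fetch().
async function analyzeViaRest(params, pollIntervalMs = 2000) {
    const { url, options } = buildAnalyzeRequest(params);
    const submit = await fetch(url, options);
    if (submit.status !== 202) {
        throw new Error(`Submit failed: ${submit.status} ${await submit.text()}`);
    }
    const operationUrl = submit.headers.get("operation-location");
    for (;;) {
        const poll = await fetch(operationUrl, {
            headers: { "Ocp-Apim-Subscription-Key": params.key },
        });
        const body = await poll.json();
        if (body.status === "succeeded") return body.analyzeResult;
        if (body.status === "failed") throw new Error(JSON.stringify(body.error));
        await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
    }
}
```

You would call it as analyzeViaRest({ endpoint, key, modelId: "prebuilt-document", apiVersion: "2022-06-30-preview", documentUrl }) and read the content field of the returned object. The request building is kept in its own function so you can inspect the URL and headers without hitting the service.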

