How to automate Google Drive Docs OCR facility?

I use Google Drive and its "Open with Google Docs" feature to convert images into OCR'd Word files (.docx), because the Word file also preserves the formatting. I have many images, so I upload them to Drive and convert them to editable documents one by one, because PDF conversion does not work.

Each time, I have to wait for one conversion to finish before I can start the next, which is time-consuming.

I tried the Google OCR API, but it does not preserve formatting such as bold text, alignment, etc.

So, is there any way to automate this process using the REST API?

UPDATE

  1. Images uploaded to Google Drive

  2. The right-click context menu of an image in Google Drive

  3. Google Docs in the "Open with" context menu

  4. The OCR result after the conversion process (language detected automatically)

  5. Finally, the Google document and the image

I tried googleapis on GitHub and started from the Drive sample list.js.

My Code

'use strict';

const {google} = require('googleapis');
const sampleClient = require('../sampleclient');

const drive = google.drive({
  version: 'v3',
  auth: sampleClient.oAuth2Client,
});

async function runSample(query) {
  const params = {pageSize: 3};
  params.q = query;
  const res = await drive.files.list(params);
  console.log(res.data);
  return res.data;
}

if (module === require.main) {
  const scopes = ['https://www.googleapis.com/auth/drive.metadata.readonly'];
  sampleClient
    .authenticate(scopes)
    .then(runSample)
    .catch(console.error);
}

module.exports = {
  runSample,
  client: sampleClient.oAuth2Client,
};

How about this modification?

Your sample script uses googleapis, so this modification uses googleapis as well. The image files in Drive are converted to Google Documents with OCR by the files.copy method of the Drive API. The following modification assumes the following points.

  1. You are using googleapis in Node.js.
  2. When you run your script, you can already retrieve the file list with the Drive API.
    • This means that drive in your script can also be used for the files.copy method.

Notes:

  • If you have not used the Drive API yet, please check the quickstart (version 3).

Confirmation point:

Before you run the script, please confirm the following point.

  • In order to use the files.copy method, please add https://www.googleapis.com/auth/drive to the scopes in the if statement of list.js.
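As a small illustration of the confirmation point above, the scopes array in the if statement of list.js could be changed as follows. This is only a sketch of that one line; the rest of the if block is assumed to stay as in the question.

```javascript
// files.copy creates new files, so the read-only metadata scope
// used in the original list.js is not sufficient. The full Drive
// scope below replaces it.
const scopes = ['https://www.googleapis.com/auth/drive'];

// In list.js this array is then passed on unchanged:
//   sampleClient.authenticate(scopes).then(runSample).catch(console.error);
```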

Modified script 1 (convert images to Google Docs with OCR by giving the file IDs):

In this modification, runSample() was modified.

function runSample()
{
    // Set the file IDs of the images in Google Drive.
    const files = [
        "### fileId1 ###",
        "### fileId2 ###",
        "### fileId3 ###",
    ];

    // Convert each image to a Google Document.
    files.forEach((id) =>
    {
        const params = {
            fileId: id,
            resource:
            {
                mimeType: 'application/vnd.google-apps.document',
                parents: ['### folderId ###'], // If you want to put the converted files in a specific folder, please use this.
            },
            fields: 'id',
        };

        // files.copy with a Google Docs mimeType converts the image with OCR.
        // The ID of the new document is logged.
        drive.files.copy(params, (err, res) =>
        {
            if (err)
            {
                console.error(err);
                return;
            }
            console.log(res.data.id);
        });
    });
}
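The callback version above starts every copy request at once and only logs the results. If you would rather convert the images one after another and collect the new document IDs, a promise-based variant might look like the sketch below. It assumes that drive.files.copy returns a promise when no callback is passed (as drive.files.list does in the question's code) and uses requestBody, the current name for the resource parameter. convertImagesToDocs is a hypothetical helper name, not part of googleapis.

```javascript
'use strict';

const DOC_MIME_TYPE = 'application/vnd.google-apps.document';

/**
 * Copies each image to a Google Document, one request at a time.
 * `drive` is a Drive v3 client such as the one created in list.js.
 * `folderId` is optional; when given, the new documents go there.
 * Returns one entry per input ID, with either `docId` or `error`.
 */
async function convertImagesToDocs(drive, fileIds, folderId) {
  const results = [];
  for (const fileId of fileIds) {
    const requestBody = {mimeType: DOC_MIME_TYPE};
    if (folderId) {
      requestBody.parents = [folderId];
    }
    try {
      const res = await drive.files.copy({fileId, requestBody, fields: 'id'});
      results.push({fileId, docId: res.data.id});
    } catch (err) {
      // Keep going so that one failed image does not stop the batch.
      results.push({fileId, error: err.message});
    }
  }
  return results;
}
```

Inside runSample() this could be called as `const results = await convertImagesToDocs(drive, files, '### folderId ###');` after making runSample an async function.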

Note:

  • The script above converts your files (images) to Google Documents, and the result seems to be the same as the sample in your question. However, I'm not sure whether this is the quality you want; I apologize if it is not.
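The question asks for Word files (.docx), while the script above stops at Google Documents. One possible follow-up step is to export each converted document with the files.export method of the Drive API. The sketch below is an assumption about how that could be wired up, not part of the original answer: exportDocAsDocx and destPath are hypothetical names, and stream/promises requires a reasonably recent Node.js version. Note that the Drive API limits exported content to 10 MB.

```javascript
'use strict';

const fs = require('fs');
const {pipeline} = require('stream/promises');

const DOCX_MIME_TYPE =
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document';

/**
 * Exports a Google Document as .docx and writes it to destPath.
 * `drive` is a Drive v3 client; `docId` is an ID returned by files.copy.
 */
async function exportDocAsDocx(drive, docId, destPath) {
  const res = await drive.files.export(
    {fileId: docId, mimeType: DOCX_MIME_TYPE},
    {responseType: 'stream'} // ask for a stream instead of a buffered body
  );
  // Stream the response body to disk and wait until writing has finished.
  await pipeline(res.data, fs.createWriteStream(destPath));
  return destPath;
}
```

For example, `await exportDocAsDocx(drive, res.data.id, 'page1.docx');` could be called once files.copy has returned the new document ID.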

Modified script 2 (convert images to Google Docs with OCR from a single folder, selecting only images):

  • You want to convert the files (images) to Google Documents by retrieving them from a specific folder.
  • You want to retrieve files of image/png, image/jpeg and image/tiff.

Sample script:

const folderId = "### folderId ###"; // Please set the folder ID including the images.
drive.files.list(
{
    pageSize: 1000,
    q: `'${folderId}' in parents and (mimeType='image/png' or mimeType='image/jpeg' or mimeType='image/tiff')`,
    fields: 'files(id)',
}, (err, res) =>
{
    if (err)
    {
        console.error(err);
        return;
    }
    const files = res.data.files;
    files.forEach((file) =>
    {
        console.log(file.id);

        // Put the body of the files.forEach callback from modified script 1 here, changing `id` to `file.id`.

    });
});

In the next modification, the entire runSample() was modified.

function runSample()
{
    // Put the folder ID including files you want to convert.
    const folderId = "### folderId ###";

    // Retrieve file list.
    drive.files.list(
    {
        pageSize: 1000,
        q: `'${folderId}' in parents and (mimeType='image/png' or mimeType='image/jpeg' or mimeType='image/tiff')`,
        fields: 'files(id)',
    }, (err, res) =>
    {
        if (err)
        {
            console.error(err);
            return;
        }
        const files = res.data.files;

        // Retrieve each file from the retrieved file list.
        files.forEach((file) =>
        {
            const params = {
                fileId: file.id,
                resource:
                {
                    mimeType: 'application/vnd.google-apps.document',
                    parents: ['### folderId ###'],
                },
                fields: 'id',
            };

            // Convert a file
            drive.files.copy(params, (err, res) =>
            {
                if (err)
                {
                    console.error(err);
                    return;
                }
                console.log(res.data.id);
            });
        });
    });
}
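The script above requests a single page of up to 1000 files. If the folder may contain more images than that, the list call has to be repeated with the nextPageToken returned by files.list. The following is a minimal sketch of that loop under the same assumptions as above (a Drive v3 client named drive, promise-style calls). buildImageQuery and listImageIds are hypothetical helper names.

```javascript
'use strict';

const IMAGE_MIME_TYPES = ['image/png', 'image/jpeg', 'image/tiff'];

// Builds the same search query as the script above for a given folder.
function buildImageQuery(folderId) {
  const mimeClause = IMAGE_MIME_TYPES
    .map((type) => `mimeType='${type}'`)
    .join(' or ');
  return `'${folderId}' in parents and (${mimeClause})`;
}

// Collects the IDs of all matching images, following nextPageToken
// until files.list stops returning one.
async function listImageIds(drive, folderId) {
  const ids = [];
  let pageToken;
  do {
    const params = {
      q: buildImageQuery(folderId),
      pageSize: 1000,
      fields: 'nextPageToken, files(id)',
    };
    if (pageToken) {
      params.pageToken = pageToken;
    }
    const res = await drive.files.list(params);
    for (const file of res.data.files || []) {
      ids.push(file.id);
    }
    pageToken = res.data.nextPageToken;
  } while (pageToken);
  return ids;
}
```

The returned IDs can then be passed to files.copy one by one, exactly as in the loop of the script above.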
