Extract text from a Google Doc using Google Apps Script

I have a Google Doc: https://docs.google.com/document/d/1ryvCCj1NCF12RnQx0IyluJmRpW740icoBLIFOJj2juE/edit?usp=sharing

I want to extract the part shown in red text from the doc (the red colour is only for reference). The red part is the table and the lists on the 2nd and 3rd pages of the doc file.

I wrote code for the table, and I can easily extract it and paste it into a new doc file. But I am unable to extract the list items from the 3rd page.

function test(){
  var sourcedoc = DocumentApp.openById('id'); // ID of the source document
  var sourcebody = sourcedoc.getBody();
  var tables = sourcebody.getTables();
  var table = tables[0].copy(); // detached copy of the 1st table
  var destdoc = DocumentApp.openById('id'); // ID of the destination document
  var destbody = destdoc.getBody();
  var x = destbody.appendTable(table);
}

  • You want to copy the 1st table and the 2nd and 3rd lists in the source Google Document to another Google Document.
    • In your situation, you want to retrieve 2 lists after the 1st table. And you want to retrieve the paragraphs between the table and list.
    • This is from your shared Document.
  • You want to achieve this using Google Apps Script.

If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.

Issue and workaround:

Unfortunately, at the current stage, the Document service cannot use the glyph symbol that the lists in your Document use. Because of this, when such a list is copied and its original glyph symbol is set, the default symbol is used instead. To work around this issue, I used the following flow.

  1. Copy the Google Document.
  2. Scan the range you want to copy from the copied Google Document.
  3. Delete the range except for the range you want to copy from the copied Google Document.
  4. Delete the inline objects from the copied Google Document.

With the above flow, the 1st table and the 2nd and 3rd lists can be copied. To achieve this, I used the Google Docs API, because there are several inline objects in your Document and, unfortunately, I couldn't find a method for deleting them with the Document service. The inline objects can be deleted with the Docs API, and all of the ranges can be deleted in one API call.
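The requests in a single batchUpdate call are applied in order, so deleting an earlier range shifts the indexes of everything after it. The following is a minimal sketch of one way to deal with that: build the requests so that ranges nearer the end of the document come first. The helper name `toDeleteRequests` and the shape of its input are assumptions for illustration, not part of the Docs API.

```javascript
// Hypothetical helper: convert {startIndex, endIndex} ranges into
// deleteContentRange requests, ordered from the end of the document
// towards the start so earlier deletions do not shift later indexes.
function toDeleteRequests(ranges) {
  return ranges
    // Skip ranges that would be empty.
    .filter(function (r) { return r.endIndex > r.startIndex; })
    // Largest startIndex first.
    .sort(function (a, b) { return b.startIndex - a.startIndex; })
    .map(function (r) {
      return {
        deleteContentRange: {
          range: {startIndex: r.startIndex, endIndex: r.endIndex}
        }
      };
    });
}
```

In Apps Script, the resulting array could be passed as `requests` to `Docs.Documents.batchUpdate`, in the same way as in the sample script.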

Sample script:

Before you run the script, please enable Google Docs API at Advanced Google services.

function myFunction() {
  var sourcedocId = '###'; // Please set the source Google Document ID.
  var filenameOfDestDocument = "sampleDestDocument"; // Please set the destination filename of Google Document.

  // Copy the source Document, then read the structure of the copy with the Docs API.
  var fileId = DriveApp.getFileById(sourcedocId).makeCopy(filenameOfDestDocument).getId();
  var doc = Docs.Documents.get(fileId);
  var content = doc.body.content;
  var obj = {table: 1, list: 2}; // Number of tables and lists still to be kept.
  var reqs = [];
  for (var i = 0; i < content.length; i++) {
    if ("table" in content[i]) {
      // Delete the content before the table.
      reqs.push({deleteContentRange: {range: {startIndex: 1, endIndex: content[i].startIndex - 1}}});
      obj.table--;
    } else if (obj.table == 0 && obj.list > 0 && "paragraph" in content[i] && "bullet" in content[i].paragraph) {
      // Skip over the consecutive list items of one list (the bounds check avoids reading past the end).
      while (i < content.length && "paragraph" in content[i] && "bullet" in content[i].paragraph) ++i;
      obj.list--;
    } else if (obj.table == 0 && obj.list == 0) {
      // Delete the content after the lists that are kept.
      reqs.push({deleteContentRange: {range: {startIndex: content[i].endIndex, endIndex: content[content.length - 1].endIndex - 1}}});
      break;
    } else if ("paragraph" in content[i] && "positionedObjectIds" in content[i].paragraph) {
      // Delete the positioned objects anchored to this paragraph.
      Array.prototype.push.apply(reqs, content[i].paragraph.positionedObjectIds.map(function(e) {return {deletePositionedObject: {objectId: e}}}));
    }
  }
  // Reverse the requests so that ranges nearer the end of the document are deleted first.
  Docs.Documents.batchUpdate({requests: reqs.reverse()}, fileId);
}
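
The scanning step can also be written as a separate function with the table number and the number of lists as parameters. This is only a sketch under some assumptions: `content` has the shape of `body.content` returned by `Docs.Documents.get`, a "list" means a run of consecutive paragraphs that have a `bullet` property, and the lists are separated from each other by at least one non-list element. The name `findKeepRange` is hypothetical.

```javascript
// Hypothetical helper: return the index range that starts at the
// tableNo-th table and ends after the listCount-th list that follows it.
// Returns null when the table or the lists are not found.
function findKeepRange(content, tableNo, listCount) {
  var tablesSeen = 0;
  var listsSeen = 0;
  var start = null;   // startIndex of the wanted table
  var inList = false; // true while the previous element was a list item
  for (var i = 0; i < content.length; i++) {
    var el = content[i];
    var isBullet = "paragraph" in el && "bullet" in el.paragraph;
    if (start === null) {
      // Still looking for the wanted table.
      if ("table" in el && ++tablesSeen === tableNo) start = el.startIndex;
      continue;
    }
    if (isBullet && !inList) listsSeen++; // a new run of list items begins
    if (!isBullet && inList && listsSeen === listCount) {
      // The last wanted list has just ended.
      return {startIndex: start, endIndex: content[i - 1].endIndex};
    }
    inList = isBullet;
  }
  // The document may end inside the last wanted list.
  if (start !== null && inList && listsSeen === listCount) {
    return {startIndex: start, endIndex: content[content.length - 1].endIndex};
  }
  return null;
}
```

For the shared Document, `findKeepRange(content, 1, 2)` would correspond to the `{table: 1, list: 2}` setting in the sample script. Everything outside the returned range is what the delete requests have to cover.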

Note:

  • In this script, the destination Google Document is created in the same folder as the source Google Document.
  • In this case, the script can be used for your shared Google Document. If you change the document, please modify the script to match it.
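
When the document changes, it may help to look at the order of the tables, paragraphs and list items before adjusting the script. The following is a small sketch that summarizes the `body.content` array returned by `Docs.Documents.get`. The function name `summarizeContent` and the type labels it returns are assumptions made for illustration.

```javascript
// Hypothetical helper: summarize the structural elements of body.content
// so the position of each table and list item is easy to see.
function summarizeContent(content) {
  return content.map(function (el, i) {
    var isList = "paragraph" in el && "bullet" in el.paragraph;
    var type = "other";
    if ("table" in el) type = "table";
    else if (isList) type = "listItem";
    else if ("paragraph" in el) type = "paragraph";
    else if ("sectionBreak" in el) type = "sectionBreak";
    return {
      position: i,
      type: type,
      // listId tells which list a list item belongs to.
      listId: isList ? el.paragraph.bullet.listId : null,
      startIndex: el.startIndex,
      endIndex: el.endIndex
    };
  });
}
```

In Apps Script, the result could be inspected with `Logger.log(JSON.stringify(summarizeContent(doc.body.content)))`, where `doc` is the value returned by `Docs.Documents.get(fileId)`.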

If I misunderstood your question and this was not the direction you want, I apologize.
