简体   繁体   中英

How to import specific text/line under headings in Google doc to Google sheet using google apps script?

I am trying to get specific text/line that is under the heading or subheading using the google apps script. Here is the screenshot of the Google doc:

谷歌表格

I am looking to export the Question text/line from every heading to google sheets like this:

谷歌表

So far I have been able to get all the headings from the google doc by this code snippet:

function getHeadings() {
  var pars = DocumentApp.getActiveDocument().getBody().getParagraphs();
  var hdg = [];
  for(var i=0; i<pars.length; i++) {
    var par = pars[i];
    hdg.push(par.getHeading());

  }
     Logger.log(hdg)
}

But I am unable to get the Question text under all these headings, kindly can you guide me in the right direction? (I apologize for my question formatting, I am a newbie on this website). Thank you

Use getChild() and and push() functions

Setup

I used the following as setup:

测试

Script

function getHeadings() {
  var doc = DocumentApp.getActiveDocument();
  var childNum = doc.getBody().getNumChildren();
  var ss = SpreadsheetApp.openById("<spreadsheetID>").getSheetByName("Sheet1"); //Change ID of Spreadsheet and Sheet name
  var out = [];
  for (var i = 0; i < childNum; i += 2) {
    out.push([doc.getBody().getChild(i).asText().getText()]);
  }
  for (var i = 1, j = 0; i < childNum; i += 2, j++) {
    out[j].push(doc.getBody().getChild(i).asText().getText().replace(/Question(.*)- /gm, ""));
  }
  ss.getRange(ss.getLastRow() + 1, 1, childNum / 2, 2).setValues(out); //edit range to proper coordinates
}

The script uses the getChild() function to get the texts in the document. Afterwards, the getNumChildren() will get the number of children to be used in the two separate for loops which will segregate the heading and the question texts. After the texts are identified, they are separated, paired, and then pushed (using the push() function) into arrays. Once the arrays are formed, the data will be appended to the spreadsheet using the setValues() function.

Output

输出

Update

Strictly for this format: 设置 2

You may use the following script as basis for your code:

function getHeadings() {
  var doc = DocumentApp.getActiveDocument();
  var childNum = doc.getBody().getNumChildren();
  var ss = SpreadsheetApp.openById("<SpreadsheetID>").getSheetByName("Sheet1"); //Change ID of Spreadsheet and Sheet name
  var header = [];
  var question = [];
  var out = [];
  for (var i = 0; i < childNum; i++) {
    if (doc.getBody().getChild(i).asText().getText().match(/Header(.*)/gm)){
      header.push(doc.getBody().getChild(i).asText().getText());
    }
    else if (doc.getBody().getChild(i).asText().getText().match(/Question(.*)/gm)){
      question.push(doc.getBody().getChild(i).asText().getText());
    }
  }
  for (j=0; j<header.length; j++) {
    out.push([header[j]]);
    out[j].push(question[j].replace(/Question(.*)- /gm, ""));
  }
  ss.getRange(ss.getLastRow() + 1, 1, header.length, 2).setValues(out); //edit range to proper coordinates
}

References:

There are a couple of assumptions for this script:

  • There is a finite number of header styles you're using (eg 'Header 1' and 'Header 2' in my example below)
  • Your questions always contain 'QUESTION:'
  • There is no other text apart from the headers and the question lines (but if there is, in principle it will be skipped)

In that case, the below code will work:

function getHeadings() {
  var pars = DocumentApp.getActiveDocument().getBody().getParagraphs();
  var currentHdg = "";
  var questions = [];
  for(var i=0; i<pars.length; i++) {
    var par = pars[i];
    var text = par.getText();
    var hdg = par.getHeading();
    if (hdg.toString().indexOf("HEADING") > -1){
      currentHdg = text;
    }
    else if(text.indexOf("QUESTION:") > -1){
      questions.push([currentHdg,text.replace("QUESTION:","").trim()]);
    }
  }
  Logger.log(questions);
}

You can then format questions into the table output format you need.

Edit: I have updated my answer to cover all heading types.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM