简体   繁体   中英

Google Apps Script; Docs; convert selected element to HTML

I am just starting with Google Apps Script and following the Add-on quickstart

https://developers.google.com/apps-script/quickstart/docs

In the quickstart you can create a simple add-on to get a selection from a document and translate it with the LanguageApp service. The example gets the underlying text using this:

function getSelectedText() {
  var selection = DocumentApp.getActiveDocument().getSelection();
  if (selection) {
    var text = [];
    var elements = selection.getSelectedElements();
    for (var i = 0; i < elements.length; i++) {
      if (elements[i].isPartial()) {
        var element = elements[i].getElement().asText();
        var startIndex = elements[i].getStartOffset();
        var endIndex = elements[i].getEndOffsetInclusive();

        text.push(element.getText().substring(startIndex, endIndex + 1));
      } else {
        var element = elements[i].getElement();
        // Only translate elements that can be edited as text; skip images and
        // other non-text elements.
        if (element.editAsText) {
          var elementText = element.asText().getText();
          // This check is necessary to exclude images, which return a blank
          // text element.
          if (elementText != '') {
            text.push(elementText);
          }
        }
      }
    }
    if (text.length == 0) {
      throw 'Please select some text.';
    }
    return text;
  } else {
    throw 'Please select some text.';
  }
}

It gets the text only: element.getText() , without any formatting.

I know the underlying object is not html, but is there a way to get the selection converted into a HTML string? For example, if the selection has a mix of formatting, like bold:

this is a sample with bold text

Then is there any method, extension, library, etc, -- like element.getHTML() -- that could return this?

this is a sample with <b>bold</b> text

instead of this?

this is a sample with bold text

There is a script GoogleDoc2HTML by Omar AL Zabir. Its purpose is to convert the entire document into HTML. Since you only want to convert rich text within the selected element, the function relevant to your task is processText from the script, shown below.

The method getTextAttributeIndices gives the starting offsets for each change of text attribute, like from normal to bold or back. If there is only one change, that's the attribute for the entire element (typically paragraph), and this is dealt with in the first part of if-statement.

The second part deals with the general case, looping over the indices and inserting HTML markup corresponding to the attributes.

The script isn't maintained, so consider it as a starting point for your own code, rather than a ready-to-use library. There are some unmerged PRs that improve the conversion process, in particular for inline links.

function processText(item, output) {
  var text = item.getText();
  var indices = item.getTextAttributeIndices();

  if (indices.length <= 1) {
    // Assuming that a whole para fully italic is a quote
    if(item.isBold()) {
      output.push('<b>' + text + '</b>');
    }
    else if(item.isItalic()) {
      output.push('<blockquote>' + text + '</blockquote>');
    }
    else if (text.trim().indexOf('http://') == 0) {
      output.push('<a href="' + text + '" rel="nofollow">' + text + '</a>');
    }
    else {
      output.push(text);
    }
  }
  else {

    for (var i=0; i < indices.length; i ++) {
      var partAtts = item.getAttributes(indices[i]);
      var startPos = indices[i];
      var endPos = i+1 < indices.length ? indices[i+1]: text.length;
      var partText = text.substring(startPos, endPos);

      Logger.log(partText);

      if (partAtts.ITALIC) {
        output.push('<i>');
      }
      if (partAtts.BOLD) {
        output.push('<b>');
      }
      if (partAtts.UNDERLINE) {
        output.push('<u>');
      }

      // If someone has written [xxx] and made this whole text some special font, like superscript
      // then treat it as a reference and make it superscript.
      // Unfortunately in Google Docs, there's no way to detect superscript
      if (partText.indexOf('[')==0 && partText[partText.length-1] == ']') {
        output.push('<sup>' + partText + '</sup>');
      }
      else if (partText.trim().indexOf('http://') == 0) {
        output.push('<a href="' + partText + '" rel="nofollow">' + partText + '</a>');
      }
      else {
        output.push(partText);
      }

      if (partAtts.ITALIC) {
        output.push('</i>');
      }
      if (partAtts.BOLD) {
        output.push('</b>');
      }
      if (partAtts.UNDERLINE) {
        output.push('</u>');
      }

    }
  }
}

Ended up making a script to support my use-case of bold+links+italics:

function getHtmlOfElement(element) {
  var text = element.editAsText();
  var string = text.getText();
  var indices = text.getTextAttributeIndices();
  var output = [];

  for (var i = 0; i < indices.length; i++) {
    var offset = indices[i];
    var startPos = offset;
    var endPos = i+1 < indices.length ? indices[i+1]: string.length;
    var partText = string.substring(startPos, endPos);

    var isBold = text.isBold(offset);
    var isItalic = text.isItalic(offset);
    var linkUrl = text.getLinkUrl(offset);

    if (isBold) {
      output.push('<b>');
    }
    if (isItalic) {
      output.push('<i>');
    }
    if (linkUrl) {
      output.push('<a href="' + linkUrl + '">');
    }

    output.push(partText);

    if (isBold) {
      output.push('</b>');
    }
    if (isItalic) {
      output.push('</i>');
    }
    if (linkUrl) {
      output.push('</a>');
    }
  }

  return output.join("");
}

You can simply call it using something like:

getHtmlOfElement(myTableCell); // returns something like "<b>Bold</b> test."

This is obviously a workaround, but you can copy/paste a Google Doc into a draft in Gmail and then that draft can be turned into HTML using

GmailApp.getDraft(draftId).getMessage().getBody().toString();

I found this thread trying to skip that step by going straight from a Doc to HTML, but I thought I'd share.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM