How to convert a paragraph HTML string to plain text without HTML tags in Google Apps Script?

This is a follow-up to my previous question. I'm having trouble converting HTML strings to plain text without HTML tags in Google Apps Script, using the reference in that question. This time, however, the HTML is in paragraph format.

This is the script that I use:

function pullDataFromWorkday() {
  // CSV link from the Workday report (double quotes, because the placeholder contains an apostrophe)
  var url = "https://services1.myworkday.com/ccx/service/customreport2/[company name]/[owner's email]/[Report Name]?format=csv";
  // Our Workday password in Base64 (placeholder value)
  var b64 = 'asdfghjklkjhgfdfghj==';
  var response = UrlFetchApp.fetch(url, {
    headers: { Authorization: 'Basic ' + b64 }
  });

  // Parse
  if (response.getResponseCode() >= 200 && response.getResponseCode() < 300) {
    var blob = response.getBlob();
    var string = blob.getDataAsString();
    var data = Utilities.parseCsv(string, ",");
    // Columns 0 and 1 are kept as-is; columns 2 to 5 contain HTML
    for (var i = 1; i < data.length; i++) {
      data[i][2] = toStringFromHtml(data[i][2]);
      data[i][3] = toStringFromHtml(data[i][3]);
      data[i][4] = toStringFromHtml(data[i][4]);
      data[i][5] = toStringFromHtml(data[i][5]);
    }

    // Paste it in
    var ss = SpreadsheetApp.getActive();
    var sheet = ss.getSheetByName('Sheet1');
    sheet.clear();
    sheet.getRange(1, 1, data.length, data[0].length).setValues(data);
  } else {
    return;
  }
}

function toStringFromHtml(html) {
  html = '<div>' + html + '</div>';
  html = html.replace(/<br>/g, "");
  var document = XmlService.parse(html);
  var strText = XmlService.getPrettyFormat().format(document);
  strText = strText.replace(/<[^>]*>/g, "");
  return strText.trim();
}

This is a sample of the data I want:

Or you can use this sample spreadsheet.

Is there a step I'm missing, or something I'm doing wrong?

Thanks in advance for answering.

In your situation, how about modifying toStringFromHtml as follows?

Modified script:

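function toStringFromHtml(html) {
  // Wrap in a single root element so the string can be parsed as XML
  html = '<div>' + html + '</div>';
  html = html
    .replace(/<br>/g, "")                     // remove unclosed <br> tags (not valid XML)
    .replace(/<p><\/p><p><\/p>/g, "<p></p>")  // collapse two adjacent empty paragraphs into one
    .replace(/<span>|<\/span>/g, "");         // unwrap <span> elements
  var document = XmlService.parse(html);
  // Pretty-print with no indentation so each paragraph ends up on its own line
  var strText = XmlService.getPrettyFormat().setIndent("").format(document);
  strText = strText.replace(/<[^>]*>/g, ""); // strip the remaining tags
  return strText.trim();
}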
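Because XmlService only exists inside Apps Script, the string-preprocessing half of the modified function can be checked on its own. The sketch below copies the three replace calls into a plain JavaScript function; the name preprocessHtml is hypothetical, and the XmlService steps are deliberately left out.

```javascript
// Hypothetical helper: the string-preprocessing part of the modified
// toStringFromHtml, without the XmlService calls, so it runs outside Apps Script.
function preprocessHtml(html) {
  html = '<div>' + html + '</div>';          // single root element for the XML parser
  return html
    .replace(/<br>/g, "")                     // drop unclosed <br> tags
    .replace(/<p><\/p><p><\/p>/g, "<p></p>")  // collapse two adjacent empty paragraphs
    .replace(/<span>|<\/span>/g, "");         // unwrap <span> elements
}

var sample = '<p><span>Hi Katy</span></p><p></p><p></p>' +
             '<p><span>3. Endangered animals in Asia.</span></p>';
console.log(preprocessHtml(sample));
// prints <div><p>Hi Katy</p><p></p><p>3. Endangered animals in Asia.</p></div>
```

Note that the second pattern only collapses two empty paragraphs that sit directly next to each other: a run of three becomes two, and any whitespace between the tags prevents a match.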
  • With this modified script, the sample HTML from your question is converted as follows.

    • From

       <p><span>Hi Katy</span></p> <p></p> <p><span>The illustration (examples) paragraph is useful when we want to explain or clarify something, such as an object, a person, a concept, or a situation. Sample Illustration Topics:</span></p> <p></p> <p></p> <p><span>1. Examples of annoying habits people have on the Skytrain.</span></p> <p><span>2. Positive habits that you admire in other people. </span></p> <p><span>3. Endangered animals in Asia. </span></p>
    • To

       <div> <p>Hi Katy</p> <p></p> <p>The illustration (examples) paragraph is useful when we want to explain or clarify something, such as an object, a person, a concept, or a situation. Sample Illustration Topics:</p> <p></p> <p>1. Examples of annoying habits people have on the Skytrain.</p> <p>2. Positive habits that you admire in other people. </p> <p>3. Endangered animals in Asia. </p> </div>
    • With this conversion, the following result is obtained.

       Hi Katy

       The illustration (examples) paragraph is useful when we want to explain or clarify something, such as an object, a person, a concept, or a situation. Sample Illustration Topics:

       1. Examples of annoying habits people have on the Skytrain.
       2. Positive habits that you admire in other people.
       3. Endangered animals in Asia.
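If the goal is only readable text with one paragraph per line, a regular-expression approach that skips XML parsing altogether is another option. This is a swapped-in technique, not the method used in the answer: the function name paragraphsToText is hypothetical, it assumes paragraphs are marked with <p> and line breaks with <br>, and it leaves HTML entities such as &amp; undecoded.

```javascript
// Hypothetical alternative (not the answer's method): turn <p>...</p> blocks
// into lines with plain regular expressions instead of XmlService.
function paragraphsToText(html) {
  return html
    .replace(/<br\s*\/?>/gi, "\n")   // treat <br>, <br/>, <br /> as a line break
    .replace(/<\/p>/gi, "\n")         // end of a paragraph becomes a newline
    .replace(/<[^>]*>/g, "")          // strip every remaining tag
    .split("\n")
    .map(function (line) { return line.trim(); }) // drop spaces left between tags
    .join("\n")
    .replace(/\n{3,}/g, "\n\n")       // allow at most one blank line in a row
    .trim();
}

var cell = '<p><span>Hi Katy</span></p><p></p><p><span>Topics:</span></p>' +
           '<p></p><p></p><p><span>1. A.</span></p><p><span>2. B. </span></p>';
console.log(paragraphsToText(cell));
```

Because it never calls XmlService.parse, malformed markup cannot make it throw; the trade-off is that it does not understand nesting, so it is only suitable for simple paragraph-style cells. In the question's loop it could be called the same way, for example data[i][2] = paragraphsToText(data[i][2]);.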

Note:

  • When the sample HTML shown in your question is used, the modified script achieves your goal. However, I can't see your other HTML data, so I'm not sure whether this modified script works for your actual data. Please be careful about this.
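One common way other cells can break the XmlService approach is that XmlService.parse expects well-formed XML: void tags such as <br> must be closed, and XML predefines only the entities &amp;, &lt;, &gt;, &quot; and &apos;, so &nbsp; or a bare & makes parsing fail. A minimal pre-sanitizer sketch, assuming these are the only problems in the data (the function name toWellFormedXml is hypothetical), could be run on each cell before it is wrapped and parsed:

```javascript
// Hypothetical pre-sanitizer: rewrites a few common HTML-isms that are not
// well-formed XML, so the string has a better chance of passing XmlService.parse.
function toWellFormedXml(html) {
  return html
    // Void elements such as <br> or <img src="x"> must be self-closed in XML.
    .replace(/<(br|hr|img)(\s[^>]*?)?\s*\/?>/gi, "<$1$2/>")
    // Only &nbsp; is mapped here; other HTML-only named entities (e.g. &copy;)
    // are left alone and would still fail in XmlService.parse.
    .replace(/&nbsp;/g, "&#160;")
    // Escape any '&' that does not already start an entity or character reference.
    .replace(/&(?!(?:[a-zA-Z]+|#\d+|#x[0-9a-fA-F]+);)/g, "&amp;");
}

console.log(toWellFormedXml('<p>Tom &amp; Jerry &nbsp;<br>A & B</p>'));
// prints <p>Tom &amp; Jerry &#160;<br/>A &amp; B</p>
```

The sanitizer is not a full HTML-to-XML converter; for arbitrary markup, a real HTML parser (such as the library suggested in the other answer) is the more robust route.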

I think you can use this library: cheerio for Google Apps Script
