How to convert a paragraph HTML string to plain text without HTML tags in Google Apps Script?
This is a follow-up to my previous question. I'm having trouble converting HTML strings to plain text without HTML tags in Google Apps Script using the approach referenced in that question. However, this time the text is in paragraph format.
This is the script that I use:
function pullDataFromWorkday() {
  // This is the CSV link from the Workday report.
  // Double quotes are used because the placeholder contains an apostrophe.
  var url = "https://services1.myworkday.com/ccx/service/customreport2/[company name]/[owner's email]/[Report Name]?format=csv";
  // This is supposed to be our Workday password in Base64 (placeholder value).
  var b64 = 'asdfghjklkjhgfdfghj==';
  var response = UrlFetchApp.fetch(url, {
    headers: {
      Authorization: 'Basic ' + b64
    }
  });

  // Parse
  if (response.getResponseCode() >= 200 && response.getResponseCode() < 300) {
    var blob = response.getBlob();
    var string = blob.getDataAsString();
    var data = Utilities.parseCsv(string, ",");
    // Skip the header row; columns 0 and 1 are left unchanged.
    for (var i = 1; i < data.length; i++) {
      data[i][2] = toStringFromHtml(data[i][2]);
      data[i][3] = toStringFromHtml(data[i][3]);
      data[i][4] = toStringFromHtml(data[i][4]);
      data[i][5] = toStringFromHtml(data[i][5]);
    }

    // Paste it in
    var ss = SpreadsheetApp.getActive();
    var sheet = ss.getSheetByName('Sheet1');
    sheet.clear();
    sheet.getRange(1, 1, data.length, data[0].length).setValues(data);
  } else {
    return;
  }
}

function toStringFromHtml(html) {
  html = '<div>' + html + '</div>';
  html = html.replace(/<br>/g, "");
  var document = XmlService.parse(html);
  var strText = XmlService.getPrettyFormat().format(document);
  strText = strText.replace(/<[^>]*>/g, "");
  return strText.trim();
}
This is a sample of the data that I want:
Or you can use this sample spreadsheet.
Is there a step that I missed or did wrong?
Thank you in advance for answering the question.
In your situation, how about modifying toStringFromHtml as follows?
function toStringFromHtml(html) {
  html = '<div>' + html + '</div>';
  // Remove <br>, collapse two consecutive empty paragraphs into one, and unwrap <span>.
  html = html.replace(/<br>/g, "").replace(/<p><\/p><p><\/p>/g, "<p></p>").replace(/<span>|<\/span>/g, "");
  var document = XmlService.parse(html);
  // An empty indent keeps the pretty-printed lines free of leading spaces.
  var strText = XmlService.getPrettyFormat().setIndent("").format(document);
  strText = strText.replace(/<[^>]*>/g, "");
  return strText.trim();
}
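To see what the added replace calls do on their own, the preprocessing step can be tried in plain JavaScript, before XmlService is involved. This is only a sketch: the helper name preprocessHtml and the shortened sample string are hypothetical, and the sample assumes there is no whitespace between adjacent tags.

```javascript
// Hypothetical helper: only the string preprocessing from toStringFromHtml,
// without the XmlService parsing and formatting steps.
function preprocessHtml(html) {
  return ('<div>' + html + '</div>')
    .replace(/<br>/g, '')                     // drop unclosed <br> tags, which are not well-formed XML
    .replace(/<p><\/p><p><\/p>/g, '<p></p>')  // collapse a pair of empty paragraphs into one
    .replace(/<span>|<\/span>/g, '');         // unwrap <span> elements, keeping their text
}

// Shortened, made-up sample in the same shape as the data in the question.
var sample = '<p><span>Hi Katy</span></p><p></p><p></p><p><span>1. First item<br></span></p>';
var cleaned = preprocessHtml(sample);
console.log(cleaned);
```

The remaining p and div tags are what XmlService.parse receives. The pattern for empty paragraphs only matches when the two tags are directly adjacent, so input with whitespace between them would need a looser pattern.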
In this modified script, your sample HTML is converted as follows.
From
<p><span>Hi Katy</span></p> <p></p> <p><span>The illustration (examples) paragraph is useful when we want to explain or clarify something, such as an object, a person, a concept, or a situation. Sample Illustration Topics:</span></p> <p></p> <p></p> <p><span>1. Examples of annoying habits people have on the Skytrain.</span></p> <p><span>2. Positive habits that you admire in other people. </span></p> <p><span>3. Endangered animals in Asia. </span></p>
To
<div> <p>Hi Katy</p> <p></p> <p>The illustration (examples) paragraph is useful when we want to explain or clarify something, such as an object, a person, a concept, or a situation. Sample Illustration Topics:</p> <p></p> <p>1. Examples of annoying habits people have on the Skytrain.</p> <p>2. Positive habits that you admire in other people. </p> <p>3. Endangered animals in Asia. </p> </div>
By this conversion, the following result is obtained.
Hi Katy

The illustration (examples) paragraph is useful when we want to explain or clarify something, such as an object, a person, a concept, or a situation. Sample Illustration Topics:

1. Examples of annoying habits people have on the Skytrain.
2. Positive habits that you admire in other people.
3. Endangered animals in Asia.
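For trying out the intended output shape outside Apps Script (one paragraph per line, with an empty line for each remaining empty paragraph), a rough plain-JavaScript approximation is sketched below. It is a swapped-in technique, not the XmlService method used above: it turns each closing p tag into a newline and then strips the remaining tags. The name htmlToTextSketch and the sample string are hypothetical, and the sketch assumes simple, non-nested markup without HTML entities.

```javascript
// Hypothetical string-only approximation; it does not use XmlService.
function htmlToTextSketch(html) {
  return html
    .replace(/<br\s*\/?>/gi, '')                        // drop line-break tags
    .replace(/<p>\s*<\/p>\s*<p>\s*<\/p>/gi, '<p></p>')  // collapse two empty paragraphs into one
    .replace(/<\/p>\s*/gi, '\n')                        // end each paragraph with a newline
    .replace(/<[^>]*>/g, '')                            // strip every remaining tag
    .split('\n')
    .map(function (line) { return line.trim(); })       // remove stray spaces on each line
    .join('\n')
    .trim();
}

// Made-up sample in the same shape as the data in the question.
var sampleHtml = '<p><span>Hi Katy</span></p> <p></p> <p><span>Para.</span></p> ' +
  '<p></p> <p></p> <p><span>1. One. </span></p> <p><span>2. Two. </span></p>';
var text = htmlToTextSketch(sampleHtml);
console.log(text);
```

Unlike XmlService.parse, this does not validate the markup, so malformed input passes through silently instead of raising an error.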
I think you can use this library: cheerio for Google Apps Script