简体   繁体   中英

extract data from gmail add to spreadsheet- Google apps script

I have searched, copied and modified code, and tried to break down what others have done and I still can't get this right.

I have email receipts for an ecommerce webiste, where I am trying to harvest particular details from each email and save to a spreadsheet with a script.

Here is the entire script as I have now.

function menu(e) {
  var ui = SpreadsheetApp.getUi();
  ui.createMenu('programs')
      .addItem('parse mail', 'grabreceipt')
      .addToUi();
}

function grabreceipt() {

  var ss = SpreadsheetApp.getActiveSheet();
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var s = ss.getSheetByName("Sheet1");
  var threads = GmailApp.search("(subject:order receipt) and (after:2016/12/01)");
  var a=[];
  for (var i = 0; i<threads.length; i++)
  {
    var messages = threads[i].getMessages();

    for (var j=0; j<messages.length; j++)
    {
    var messages = GmailApp.getMessagesForThread(threads[i]);
    for (var j = 0; j < messages.length; j++) {
      a[j]=parseMail(messages[j].getPlainBody());
    }
  }
  var nextRow=s.getDataRange().getLastRow()+1;
  var numRows=a.length;
  var numCols=a[0].length;
  s.getRange(nextRow,1,numRows,numCols).setValues(a);
}

function parseMail(body) {
  var a=[];
  var keystr="Order #,Subtotal:,Shipping:,Total:";
  var keys=keystr.split(",");
  var i,p,r;
  for (i in keys)  {
    //p=keys[i]+(/-?\d+(,\d+)*(\.\d+(e\d+)?)?/);
    p=keys[i]+"[\r\n]*([^\r^\n]*)[\r\n]";
    //p=keys[i]+"[\$]?[\d]+[\.]?[\d]+$";
    r=new RegExp(p,"m");
    try {a[i]=body.match(p)[1];}
    catch (err) {a[i]="no match";}
  }
  return a;
}
}

So the email data to pluck from comes as text only like this:

Order #89076
(body content, omitted)
Subtotal: $528.31
Shipping: $42.66 via Priority Mail®
Payment Method: Check Payment- Money order
Total: $570.97

Note: mywebsite order 456. Customer asked about this and that... etc.

The original code regex was designed to grab content, following the keystr values which were easily found on their own line. So this made sense:

p=keys[i]+"[\r\n]*([^\r^\n]*)[\r\n]";

This works okay, but results where the lines include more data that follows as in line Shipping: $42.66 via Priority Mail®.

My data is more blended, where I only wish to take numbers, or numbers and decimals. So I have this instead which validates on regex101.com

p=keys[i]+"[\$]?[\d]+[\.]?\d+$";

The expression only, [\\$]?[\\d]+[.]?\\d+$ works great but I still get "no match" for each row.

Additionally, within this search there are 22 threads returned, and it populates 39 rows in the spreadsheet. I can not figure out why 39?

The reason for your regex not working like it should is because you are not escaping the "\\" in the string you use to create the regex

So a regex like this

"\s?\$?(\d+\.?\d+)"

needs to be escaped like so:

"\\s?\\$?(\\d+\\.?\\d+)"

The below code is just modified from your parseEmail() to work as a snippet here. If you copy this to your app script code delete document.getElementById() lines.

Your can try your example in the snippet below it will only give you the numbers.

 function parseMail(body) { if(body == "" || body == undefined){ var body = document.getElementById("input").value } var a=[]; var keystr="Order #,Subtotal:,Shipping:,Total:"; var keys=keystr.split(","); var i,p,r; for (i in keys) { p=keys[i]+"\\\\s?\\\\$?(\\\\d+\\\\.?\\\\d+)"; r=new RegExp(p,"m"); try {a[i]=body.match(p)[1];} catch (err) {a[i]="no match";} } document.getElementById("output").innerHTML = a.join(";") return a; } 
 <textarea id ="input"></textarea> <div id= "output"></div> <input type = "button" value = "Parse" onclick = "parseMail()"> 

Hope that helps

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM