
JavaScript RegEx: How to extract a substring dynamically


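I need to extract a substring from dynamic input. I already get the output I need, but the code is purely hard-coded, so it is neither very dynamic nor reliable. Is there another way to extract the parts "B1003 = Engineering Business Card" (the item description) and "2" (the quantity)? Both are dynamic, and completely different items can be entered, for example "O1003 = Pencil" or "O1004 = Stick Notes". Is there a way to express this in a regular expression so the code is more reliable?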

The input read here is text extracted with Tesseract OCR. I need to pull out the required information and pass it on to another service.

var requisition = `Lines
Line Item Description Category Name Quantity UOM Price Amount (USD) Status Funds Status //this line is static
1 B1003 = Engineering Business Card Business Cards 2 Ea 50.00USD 100 Pending Approval Not Reserved //this line is dynamic
Requester Jay Doe Supplier ABC Corp //this line is static
Lines
Line Item Description Category Name Quantity UOM Price Amount (USD) Status Funds Status //this line is static
1 O1003 = Pencil Office Supplies 5 Ea 50.00USD 100 Pending Approval Not Reserved //this line is dynamic
Requester Jay Doe Supplier ABC Corp //this line is static
`;

//rule 1 - Gets all Items + Quantity
//rule 2 - Gets all Items
//rule 3 - Gets all Quantity
//resultArray - Contains Quantity + Item e.g. 2 B1003 = Engineering Business Card

var rule1 = /(B1002 = Accountant Business Card|B1003 = Engineering Business Card|B1001 = Sales and Marketing Business Card|O1001 = Black Ballpen Branded Panda Regular with Eraser|O1002 = Notebook|O1003 = Pencil|O1004 = Stick Notes) (.*) ([0-9]|[0-9][0-9]|[0-9][0-9][0-9])/
var rule2 = /(B1002 = Accountant Business Card|B1003 = Engineering Business Card|B1001 = Sales and Marketing Business Card|O1001 = Black Ballpen Branded Panda Regular with Eraser|O1002 = Notebook|O1003 = Pencil|O1004 = Stick Notes)/
var rule3 = /([0-9]|[0-9][0-9]|[0-9][0-9][0-9])/

var resultarray = []
var stringarray = requisition.split("\n")
stringarray.forEach(element => {
    var result = element.match(rule1)
    if (result != null) {
        var itemName = result[0].match(rule2)
        var quantity = result[0].match(rule3)
        resultarray.push(quantity[0] + " " + itemName[0])
    }
});
console.log(resultarray.join(", "))

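Note: to make things clearer, here is the image I extracted the text from. Legend: blue = static; unboxed = dynamic; yellow = the text that needs to be extracted (also dynamic).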

- Here is the extracted image: the first line is static, the second line is dynamic.

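The expected result is "2 B1003 = Engineering Business Card", followed by further entries such as ", B1002 = Accountant Business Card" if the input contains other items. Please check the comments in the requisition variable.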

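Again, I can already get the required output; I just need to know how to use RegEx to write this differently, more dynamically and more reliably. Please bear with me, as I don't know much about RegEx. Thanks!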
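A minimal sketch of a pattern that does not list any item names. It rests on two assumptions that the OCR output does not guarantee: an item code is always one uppercase letter followed by four digits, and the quantity is always immediately followed by the UOM "Ea". Because the description and the category are both free text separated only by spaces, the captured item text includes the category (for example "B1003 = Engineering Business Card Business Cards"); splitting the two apart still needs either a known item list or a structured source.

```javascript
// Sketch only. Assumptions (not guaranteed by the OCR text):
//  - an item code is one uppercase letter followed by four digits, e.g. B1003
//  - the quantity is the first number directly followed by the UOM "Ea"
const requisition = `Lines
Line Item Description Category Name Quantity UOM Price Amount (USD) Status Funds Status
1 B1003 = Engineering Business Card Business Cards 2 Ea 50.00USD 100 Pending Approval Not Reserved
Requester Jay Doe Supplier ABC Corp
Lines
Line Item Description Category Name Quantity UOM Price Amount (USD) Status Funds Status
1 O1003 = Pencil Office Supplies 5 Ea 50.00USD 100 Pending Approval Not Reserved
Requester Jay Doe Supplier ABC Corp`;

// Group 1: "CODE = text" up to the quantity (description AND category,
// since nothing in the text separates them); group 2: the quantity.
// The lazy .+? stops at the first "<number> Ea" after the item code.
const linePattern = /\b([A-Z]\d{4} = .+?) (\d+) Ea\b/;

function extractLines(text) {
  const results = [];
  for (const line of text.split("\n")) {
    const match = line.match(linePattern);
    if (match) {
      results.push({ item: match[1], quantity: Number(match[2]) });
    }
  }
  return results;
}

console.log(extractLines(requisition));
```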

Short answer:

var requisition = `Lines
Line Item Description Category Name Quantity UOM Price Amount (USD) Status Funds Status //this line is static
1 B1003 = Engineering Business Card 2 Ea 50.00USD 100 Pending Approval Not Reserved //this line is dynamic
Requester Jay Doe Supplier ABC Corp //this line is static
Lines
Line Item Description Category Name Quantity UOM Price Amount (USD) Status Funds Status //this line is static
1 O1003 = Pencil Office Supplies 5 Ea 50.00USD 100 Pending Approval Not Reserved //this line is dynamic
Requester Jay Doe Supplier ABC Corp //this line is static
`;

//rule 1 - Gets all Items + Quantity
//rule 2 - Gets all Items
//rule 3 - Gets all Quantity
//resultArray - Contains Quantity + Item e.g. 2 B1003 = Engineering Business Card

var rule1 = /(B1002 = Accountant Business Card|B1003 = Engineering Business Card|B1001 = Sales and Marketing Business Card|O1001 = Black Ballpen Branded Panda Regular with Eraser|O1002 = Notebook|O1003 = Pencil|O1004 = Stick Notes)[^\d]+(\d+) .*/

var resultarray = []

var stringarray = requisition.split("\n")
stringarray.forEach(element => {
    var result = element.match(rule1)
    if (result!=null){
        var itemName = result[1]
        var quantity = result[2]
        resultarray.push (quantity + " " + itemName)
    }
});

console.log (resultarray.join(", "))

Output:

2 B1003 = Engineering Business Card, 5 O1003 = Pencil

Long answer:

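There are a few things to fix:

  1. Use only rule 1 (with some modifications) to match everything, item name and quantity, capturing the quantity with (\d+)
  2. Get rid of rules 2 and 3
  3. Use result[1] as the item name and result[2] as the quantity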
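The steps above still hard-code the item names inside the pattern literal. One way to keep that list maintainable is to store the catalog as plain data and build the alternation at runtime with new RegExp. A sketch, assuming the catalog can be loaded as an array of strings (catalog, escapeRegExp and extract are illustrative names, not part of any library). The names are escaped so that characters such as "." or "(" in a description are matched literally, and \D+ plays the same role as [^\d]+ in the short answer.

```javascript
// Hypothetical catalog; in practice load it from wherever the real items live.
const catalog = [
  "B1001 = Sales and Marketing Business Card",
  "B1002 = Accountant Business Card",
  "B1003 = Engineering Business Card",
  "O1001 = Black Ballpen Branded Panda Regular with Eraser",
  "O1002 = Notebook",
  "O1003 = Pencil",
  "O1004 = Stick Notes",
];

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Longest names first, so a name that is a prefix of another cannot win early.
const alternation = [...catalog]
  .sort((a, b) => b.length - a.length)
  .map(escapeRegExp)
  .join("|");

// Group 1: item name; group 2: first run of digits after it (the quantity).
const rule1 = new RegExp(`(${alternation})\\D+(\\d+)`);

function extract(text) {
  const resultarray = [];
  for (const line of text.split("\n")) {
    const result = line.match(rule1);
    if (result !== null) {
      resultarray.push(result[2] + " " + result[1]);
    }
  }
  return resultarray;
}

const requisition = `1 B1003 = Engineering Business Card 2 Ea 50.00USD 100 Pending Approval Not Reserved
1 O1003 = Pencil Office Supplies 5 Ea 50.00USD 100 Pending Approval Not Reserved`;

console.log(extract(requisition).join(", "));
// 2 B1003 = Engineering Business Card, 5 O1003 = Pencil
```

Sorting longest-first only matters when one item name is a prefix of another, because the engine takes the first alternative that matches at a given position.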

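Note that all your fields are space-separated and a field can itself contain spaces, so your data is not really structured. A tab-separated file, for example, would be much more reliable. The rule I used to find the quantity is therefore "skip everything after the product name until there is a number". But if one day you have a category that contains a number, you will be stuck, and without a structured file there is not much you can do about it.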
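To illustrate the point about structure: if the data could be exported as tab-separated text (a hypothetical format here; the OCR text in the question is space-separated), fields could be picked by column instead of guessed with a pattern, so spaces or digits inside a description or category would no longer matter. A minimal sketch, assuming a header row with "Item Description" and "Quantity" columns:

```javascript
// Hypothetical tab-separated export; not what Tesseract produced in the question.
const tsv = [
  "Line\tItem Description\tCategory Name\tQuantity\tUOM",
  "1\tB1003 = Engineering Business Card\tBusiness Cards\t2\tEa",
  "1\tO1003 = Pencil\tOffice Supplies\t5\tEa",
].join("\n");

function parseTsv(text) {
  const [headerLine, ...rows] = text.split("\n");
  const headers = headerLine.split("\t");
  // Locate the columns by header name rather than by a fixed position.
  const itemCol = headers.indexOf("Item Description");
  const qtyCol = headers.indexOf("Quantity");
  return rows.map((row) => {
    const fields = row.split("\t");
    return fields[qtyCol] + " " + fields[itemCol];
  });
}

console.log(parseTsv(tsv).join(", "));
// 2 B1003 = Engineering Business Card, 5 O1003 = Pencil
```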

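You can put everything into a single regular expression, capture quantity in one group and itemName in another, and then pull those groups out of the match (if there is one):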

var requisition = `Lines
Line Item Description Category Name Quantity UOM Price Amount (USD) Status Funds Status //this line is static
1 B1003 = Engineering Business Card Business Cards 2 Ea 50.00USD 100 Pending Approval Not Reserved //this line is dynamic
Requester Jay Doe Supplier ABC Corp //this line is static
Lines
Line Item Description Category Name Quantity UOM Price Amount (USD) Status Funds Status //this line is static
1 O1003 = Pencil Office Supplies 5 Ea 50.00USD 100 Pending Approval Not Reserved //this line is dynamic
Requester Jay Doe Supplier ABC Corp //this line is static
`;

// The lazy .*? stops at the first number after the item name (the quantity);
// a greedy .* would backtrack from the end of the line and capture the last digit instead.
var rule = /(B1002 = Accountant Business Card|B1003 = Engineering Business Card|B1001 = Sales and Marketing Business Card|O1001 = Black Ballpen Branded Panda Regular with Eraser|O1002 = Notebook|O1003 = Pencil|O1004 = Stick Notes).*?(\d{1,3})/

var resultarray = []
var stringarray = requisition.split("\n")
stringarray.forEach(element => {
    const match = element.match(rule);
    if (match) {
        const [, itemName, quantity] = match;
        resultarray.push(quantity + ' ' + itemName);
    }
});
console.log(resultarray)
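A variant of the same idea, assuming an ES2020 or later runtime (named groups arrived in ES2018, String.prototype.matchAll in ES2020): named capture groups make the groups self-describing, and matchAll with the g flag scans the whole text, so splitting into lines is not needed. Here [^\d\n]* keeps the search for the quantity on the same line as the item name, and \d+ replaces \d{1,3} so larger quantities are not cut off.

```javascript
const requisition = `Lines
Line Item Description Category Name Quantity UOM Price Amount (USD) Status Funds Status
1 B1003 = Engineering Business Card Business Cards 2 Ea 50.00USD 100 Pending Approval Not Reserved
Requester Jay Doe Supplier ABC Corp
Lines
Line Item Description Category Name Quantity UOM Price Amount (USD) Status Funds Status
1 O1003 = Pencil Office Supplies 5 Ea 50.00USD 100 Pending Approval Not Reserved
Requester Jay Doe Supplier ABC Corp
`;

// itemName: one of the known items; [^\d\n]* skips non-digits on the same line;
// quantity: the first run of digits after the item name.
// The g flag is mandatory: matchAll throws a TypeError without it.
const rule = /(?<itemName>B1002 = Accountant Business Card|B1003 = Engineering Business Card|B1001 = Sales and Marketing Business Card|O1001 = Black Ballpen Branded Panda Regular with Eraser|O1002 = Notebook|O1003 = Pencil|O1004 = Stick Notes)[^\d\n]*(?<quantity>\d+)/g;

const resultarray = [];
for (const match of requisition.matchAll(rule)) {
  const { itemName, quantity } = match.groups;
  resultarray.push(quantity + " " + itemName);
}

console.log(resultarray.join(", "));
// 2 B1003 = Engineering Business Card, 5 O1003 = Pencil
```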

A simpler example:

const input = `Lines
foo 1
bar 2
baz
don't match`;
const pattern = /(foo|bar) (\d+)/;
const output = [];
input.split('\n').forEach((line) => {
    const match = line.match(pattern);
    if (match) {
        const [, itemName, quantity] = match;
        output.push(quantity + ' ' + itemName);
    }
});
console.log(output);


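Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.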

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM