
Javascript replace strings with substrings that come from Array

I am wondering how best to remove from a string any substring that is stored in an array of substrings. I have been reading about RegExp and the String.replace() method but can't find a working example. The goal is to remove substrings such as ".ltd" or "ltd" from a company name.

My code:

function removeCompanySubstring(string) {
  var regex = companySubstrings
  console.log(regex)
  return string.replace(regex, "");
}
console.log(removeCompanySubstring("Company.ltd"))

companySubstrings.json

"companySubstrings": [
        "ltd.",
        ".ltd",
        "GmbH",
        ...
    ],

When I call removeCompanySubstring("Company GmbH"), the output is wrong. It returns

> Company GmbH

But the output should be

> Company

and it should remove "GmbH" from the company name.

You can't pass an array as a regex. You have to loop over it. See the snippet.

var companySubstrings = [
  "ltd.",
  ".ltd",
  "GmbH",
];

function removeCompanySubstring(string) {
  for (var needle of companySubstrings) {
    string = string.replace(needle, "");
  }
  return string.trim();
}

console.log(removeCompanySubstring("Company GmbH.ltd"))
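To see why the original call leaves the name untouched, here is a minimal sketch. It relies on the fact that String.prototype.replace converts a non-RegExp pattern to a string first; the variable names `coerced` and `unchanged` are only illustrative.

```javascript
// Same list as in the question.
const companySubstrings = ["ltd.", ".ltd", "GmbH"];

// An array used as a pattern is converted to one comma-joined string.
const coerced = String(companySubstrings);
console.log(coerced); // "ltd.,.ltd,GmbH"

// That joined text never occurs in the input, so nothing is replaced.
const unchanged = "Company GmbH".replace(companySubstrings, "");
console.log(unchanged); // "Company GmbH"
```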

Watch out for edge cases!

In terms of implementation, you can:

  • build your regex as @Teemu said (without forgetting to escape the . character, which matches any character in a regex)
  • or loop over the array and replace any occurrence of the substrings

Whichever method you choose, you should finally trim the string to remove any leading/trailing whitespace left behind.
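The escaping point can be illustrated with a small sketch. `escapeRegExp` is a hypothetical helper (not a built-in) that backslash-escapes regex metacharacters, and "Multd Corp" is a made-up name chosen only to show the difference.

```javascript
// Hypothetical helper: escape regex metacharacters so the text matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Unescaped: "." matches any character.
const unescaped = new RegExp(".ltd", "i");
// Escaped: "." matches only a literal dot.
const escaped = new RegExp(escapeRegExp(".ltd"), "i");

const loose = "Multd Corp".replace(unescaped, "");
const strict = "Multd Corp".replace(escaped, "");

console.log(loose);  // "M Corp": "ultd" matched because "." accepted "u"
console.log(strict); // "Multd Corp": the input contains no literal ".ltd"
```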

However, these methods are far from bulletproof. Consider what happens if the input uses uppercase where the list uses lowercase. Sure, you can make the match case insensitive using a regex, but then what if the pattern is also found within the actual name of the company?

For instance, "limited", which you listed as a possible substring, can be found in a company name like "Unlimited clothing limited". The French "SAS" could be found in "SASUN GmbH"...

And this gets worse the longer companySubstrings gets, because you become more and more likely to find one of the patterns within a legitimate company name as you keep adding new patterns.

Also, if a substring is found multiple times in a company name, should we replace just a single occurrence? Then which one: the first, the last, another one? The same question arises when we find different substrings in the same company name.

Turns out it's not such a trivial problem at all.
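As a small illustration of the multiple-occurrence question, the sketch below removes the first, every, or only the last occurrence of one substring. The sample name "ltd. Holdings ltd." is invented for the example.

```javascript
const name = "ltd. Holdings ltd.";
const needle = "ltd.";

// replace() with a string pattern removes only the first occurrence.
const firstOnly = name.replace(needle, "").trim();

// replaceAll() removes every occurrence.
const all = name.replaceAll(needle, "").trim();

// Removing only the last occurrence needs explicit index arithmetic.
const lastIndex = name.lastIndexOf(needle);
const lastOnly = (
  name.slice(0, lastIndex) + name.slice(lastIndex + needle.length)
).trim();

console.log(firstOnly); // "Holdings ltd."
console.log(all);       // "Holdings"
console.log(lastOnly);  // "ltd. Holdings"
```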

Two different implementations, each with its own shortcomings:

var companySubstrings = [
  "SAS",
  "limited",
  "ltd.",
  ".ltd",
  "GmbH"
];

// Using a regex to replace all occurrences (case insensitive)
function removeCompanySubstringRegex(string) {
  // If the companySubstrings are never changing, you should declare
  // the regex as a const outside of this function so you don't
  // build a new regex each time you call it
  return string.replace(
    new RegExp(
      companySubstrings.join('|').replaceAll('.', '\\.'),
      'gi' // g: replace all - i: case insensitive
    ),
    ''
  ).trim();
}

// Looping over the substrings array to replace matches (case sensitive)
function removeCompanySubstringLoop(string) {
  let result = string;
  companySubstrings.forEach(
    // use `result.replaceAll(str, '')` to replace **all** occurrences
    // and get the same behavior as the `g` flag in a regex
    str => result = result.replace(str, '')
  );
  return result.trim();
}

// Loop: `SAS Company.ltd` -> `Company` ✅
console.log('Loop: ', removeCompanySubstringLoop("SAS Company.ltd"));

// Regex: `SAS Company.ltd` -> `Company` ✅
console.log('Regex: ', removeCompanySubstringRegex("SAS Company.ltd"))

// Loop: `Unlimited limited` -> `Un limited`
console.log('Tricky name (loop): ', removeCompanySubstringLoop("Unlimited limited"))

// Regex: `Unlimited limited` -> `Un`
console.log('Tricky name (regex): ', removeCompanySubstringRegex("Unlimited limited"))
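One possible way to reduce the "Unlimited" problem, sketched here under assumptions that go beyond the original answers: treat the entries as legal-form suffixes and remove one only at the end of the name, and only when it follows a separator. The names `legalSuffixes`, `escapeRegExp`, and `stripLegalSuffix` are hypothetical, and the separator set (space, dot, comma) is a guess that may not fit every data set.

```javascript
// Assumed list of legal-form suffixes, written without leading separators.
const legalSuffixes = ["limited", "ltd.", "ltd", "GmbH", "SAS"];

// Hypothetical helper: escape regex metacharacters.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Longest alternatives first, so "ltd." is tried before "ltd".
const alternatives = [...legalSuffixes]
  .sort((a, b) => b.length - a.length)
  .map(escapeRegExp)
  .join("|");

// Match a suffix only at the end of the name, and only when it is preceded
// by the start of the string or by a separator (space, dot, comma).
const suffixRegex = new RegExp(
  "(?:^|[\\s.,]+)(?:" + alternatives + ")\\s*$",
  "i"
);

function stripLegalSuffix(name) {
  return name.replace(suffixRegex, "").trim();
}

console.log(stripLegalSuffix("Unlimited clothing limited")); // "Unlimited clothing"
console.log(stripLegalSuffix("SASUN GmbH"));                 // "SASUN"
console.log(stripLegalSuffix("Company.ltd"));                // "Company"
console.log(stripLegalSuffix("Unlimited"));                  // "Unlimited"
```

The trade-off is that a legal form written before the name, as in "SAS Company.ltd", is left in place; only the trailing ".ltd" is removed there.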

I made a case-insensitive version :)

let companySubstrings = [
  "SAS",
  "ltd.",
  ".ltd",
  "GmbH",
];

function removeCompanySubstring(compName) {
  let result = compName;
  companySubstrings.forEach(str => {
    // Escape regex metacharacters so that "." matches only a literal dot
    let escaped = str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    let regex = RegExp(escaped, 'gi');
    result = result.replace(regex, '')
  });
  return result.trim();
}

console.log(removeCompanySubstring("SAS Company.ltd"))
console.log(removeCompanySubstring("Company.ltd"));
console.log(removeCompanySubstring("Company GmbH"));
console.log(removeCompanySubstring("LTD. Company GmbH"));

For the approach you are taking, I would suggest simply creating a regex from the array of strings you get in the response.

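function removeCompanySubstring(string) {
  // Declare the regex locally instead of creating an implicit global.
  // Note: "." is unescaped here, so it matches any character.
  var regexArr = new RegExp(companySubstrings.join("|"), 'gi');
  return string.replace(regexArr, "");
}
console.log(removeCompanySubstring("Company .ltd"))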

Example:

 var arr = ["ltd.", ".ltd","GmbH"]; //the array you have
 var regexArr = new RegExp(arr.join("|"), 'gi'); //the regex you need

Then follow the same approach you are using now; just replace regex with the regexArr you have created.
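One detail this leaves open is the whitespace that remains once a substring is removed, including a double space when the match sits in the middle of a name. A minimal sketch of one way to handle it follows; `escapeRegExp` and `removeAndNormalize` are hypothetical names, and "Acme GmbH Holdings" is an invented input.

```javascript
const arr = ["ltd.", ".ltd", "GmbH"];

// Hypothetical helper: escape regex metacharacters.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Built once, outside the function, with every entry escaped.
const regexArr = new RegExp(arr.map(escapeRegExp).join("|"), "gi");

function removeAndNormalize(name) {
  return name
    .replace(regexArr, "")  // drop every listed substring
    .replace(/\s+/g, " ")   // collapse runs of whitespace into one space
    .trim();                // drop leading/trailing whitespace
}

console.log(removeAndNormalize("Company .ltd"));       // "Company"
console.log(removeAndNormalize("Acme GmbH Holdings")); // "Acme Holdings"
```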
