
Extracting a complicated part of a string with plain JavaScript

I have the following string:

<a href="https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx">Text</a>

Using JavaScript, I want to extract either 'pl' or 'pl_company_com' from this string.

There are a few variable parts:

  • jan_kowalski is a first name and surname; it can change, and sometimes it even has 3 elements

  • the country code (in this example 'pl') changes to others such as en, de, or fr (this is the part of the string I want to get)

  • the rest of the string stays the same in every case (the beginning, plus everything starting from _company_com ...

PS: I tried to do it with split, but my knowledge of JS is very basic and I can't get what I want. Please help.
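Since split came up, here is a minimal split-based sketch for comparison. It assumes the folder name right after /personal/ always ends in _<code>_company_com, as the question describes; the function name countryCodeViaSplit is illustrative only.

```javascript
// Minimal split-based sketch. Assumes the folder after "/personal/" always
// ends in "_<code>_company_com", as described in the question.
function countryCodeViaSplit(html) {
  // Take the text between href=" and the next double quote.
  const url = html.split('href="')[1].split('"')[0];
  // The folder is the path segment right after "personal".
  const segments = url.split("/");
  const folder = segments[segments.indexOf("personal") + 1];
  // "jan_kowalski_pl_company_com" -> ["jan", "kowalski", "pl", "company", "com"]
  const parts = folder.split("_");
  // Counting from the end works no matter how many name parts there are.
  return parts[parts.length - 3];
}

const html = '<a href="https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx">Text</a>';
console.log(countryCodeViaSplit(html)); // pl
```

Counting from the end is what makes the 3-part names work. If you want pl_company_com instead, parts.slice(-3).join("_") gives it.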

An alternative to Randy Casburn's solution, using a regex:

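// .href is the URL as a string; the greedy ".*_" leaves only "pl_company_com" for the group
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_(.*_company_com)')[1];
console.log(out); // pl_company_com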
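The same pattern also works directly on the <a ...> string from the question, without building a URL object. The sketch below writes it as a regex literal and returns null instead of throwing when nothing matches. The helper name extractCompanyPart is hypothetical.

```javascript
// Sketch: the answer's pattern as a regex literal, applied straight to the
// <a ...> string. Returns null instead of throwing when there is no match.
function extractCompanyPart(text) {
  // The leading ".*_" is greedy, so the engine backtracks from the end and
  // keeps the last underscore that still lets "(.*_company_com)" match.
  // That is why the name ("jan_kowalski") never ends up in the group.
  const match = text.match(/.*_(.*_company_com)/);
  return match ? match[1] : null;
}

const html = '<a href="https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx">Text</a>';
console.log(extractCompanyPart(html));            // pl_company_com
console.log(extractCompanyPart("no match here")); // null
```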

Or, if you only want to match the country codes you listed:

// Only en, de, fr, or pl are accepted as the country code
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_((en|de|fr|pl)_company_com)')[1];
console.log(out); // pl_company_com
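The question asks for either 'pl' or 'pl_company_com', and the nested groups in this pattern already capture both. The sketch below shows where each one ends up. Capture groups are numbered by the position of their opening parenthesis, so the outer group is 1 and the inner (en|de|fr|pl) is 2.

```javascript
const url = "https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx";

// Group 1 = outer parentheses, group 2 = inner (en|de|fr|pl).
const m = url.match(/.*_((en|de|fr|pl)_company_com)/);

if (m !== null) {
  console.log(m[1]); // pl_company_com
  console.log(m[2]); // pl
}
```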


A proof of concept showing that this solution also works for other name combinations:

const urls = [
  new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx'),
  new URL('https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx')
];
// Logs "pl" for each URL, however many name parts come before the code
urls.forEach(url => console.log(url.href.match('.*_(en|de|fr|pl).*')[1]));
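The pattern .*_(en|de|fr|pl).* only requires the code to follow some underscore. A name part that happens to start with a listed code can therefore be picked up by mistake. The sketch below compares it with a variant anchored to _company_com. The third URL is made up for illustration: it has a hypothetical name "jan_denis" and an unlisted code "it".

```javascript
// Sample folders; the third uses an invented name and an unlisted code.
const urls = [
  "https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx",
  "https://my.domain.com/personal/firstname_middlename_lastname_de_company_com/Documents/Forms/All.aspx",
  "https://my.domain.com/personal/jan_denis_it_company_com/Documents/Forms/All.aspx",
];

// Loose pattern from the proof of concept: the code only has to follow an underscore.
const loose = /.*_(en|de|fr|pl).*/;
// Stricter variant: the code must sit directly before "_company_com".
const strict = /.*_(en|de|fr|pl)_company_com/;

for (const url of urls) {
  // ?.[1] yields undefined instead of throwing when match() returns null.
  const looseCode = url.match(loose)?.[1];
  const strictCode = url.match(strict)?.[1];
  console.log(looseCode, strictCode);
}
```

This logs "pl pl", "de de", and "de undefined". Only the anchored pattern rejects the third folder, where the loose one grabs the "de" from "denis". Optional chaining (?.) needs an ES2020-capable engine.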

I have had good success with regular expressions on this kind of problem before:

const string = '<a href="https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx">Text</a>';
// Two word characters (captured) followed directly by "_company_com"
const regExp = /([\w]{2})_company_com/;

const find = string.match(regExp);

console.log(find);    // array with the match and its capture groups (null if nothing matched)
console.log(find[1]); // first capture group = country code ("pl")

First, you have the given string. Second, you have a regular expression, written between two slashes. Regular expressions are mostly used for searching strings (most major editors also let you use them for complex find-and-replace, which can be VERY useful). Here, the pattern matches exactly two word characters, [\w]{2}, followed directly by _company_com. \w stands for a word character (a letter, digit, or underscore), the [] brackets group all the allowed character types (here only word characters), and {2} gives the number of characters to match. To find the wanted part, call string.match(regExp). Because the regex has no g flag, it returns an array holding the whole matched text followed by every capture group in the regex (the parts wrapped in ()), or null if nothing matches. So in this case you get the country code with find[1], which is the first and only capture group of the regular expression.
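To see the array described above, and what happens when nothing matches, the sketch below reuses the answer's string and pattern. The named-group lines at the end are an alternative using ES2018 named capture groups, not part of the original answer.

```javascript
const string = '<a href="https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx">Text</a>';
const regExp = /([\w]{2})_company_com/;

const find = string.match(regExp);
if (find !== null) {
  console.log(find[0]);    // pl_company_com (the whole match)
  console.log(find[1]);    // pl (capture group 1)
  console.log(find.index); // offset in `string` where the whole match starts
}

// Without a match, match() returns null rather than an empty array.
console.log("no country code here".match(regExp)); // null

// Alternative (not from the answer): a named capture group.
const named = string.match(/(?<code>\w{2})_company_com/);
console.log(named?.groups.code); // pl
```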
