Extracting a complicated part of a string with plain JavaScript

I have the following string:

<a href="https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx">Text</a>

Using JavaScript, I want to extract 'pl' or 'pl_company_com' from this string.

There are a few variables:

  • jan_kowalski is a first name and surname; it can change, and sometimes it even has 3 elements

  • the country code ('pl' in this example) can change to others such as en / de / fr (this is the part of the string I want to get)

  • the rest of the string stays the same in every case (the beginning, plus everything from _company_com onward)

PS: I tried to do it with split, but my knowledge of JS is very basic and I can't get what I want. Please help.

An alternative to Randy Casburn's solution using regex

let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_(.*_company_com)')[1];
console.log(out); // pl_company_com

Or, if you only want to match the country codes you specified:

let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_((en|de|fr|pl)_company_com)')[1];
console.log(out); // pl_company_com

A proof of concept that the regex approach also works for other name combinations:

let urls = [
  new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx'),
  new URL('https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx')
];
// Logs 'pl' for each URL
urls.forEach(url => console.log(url.href.match('.*_(en|de|fr|pl).*')[1]));
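Since the question mentions an attempt with split, here is a minimal sketch of how the same extraction might look without a regex. The function name countryCodeBySplit and its suffix parameter are hypothetical, and the sketch assumes the user folder is the only path segment ending in _company_com.

```javascript
// Hypothetical helper: the function name and the default suffix are
// assumptions made for this illustration.
function countryCodeBySplit(href, suffix = '_company_com') {
  // Path segments, e.g. ['', 'personal', 'jan_kowalski_pl_company_com', ...]
  const segments = new URL(href).pathname.split('/');
  // Pick the segment that ends with the fixed suffix.
  const folder = segments.find(segment => segment.endsWith(suffix));
  if (!folder) return null;
  // Drop the suffix, then take the last '_'-separated piece.
  const parts = folder.slice(0, -suffix.length).split('_');
  return parts[parts.length - 1];
}

console.log(countryCodeBySplit('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx')); // pl
```

Because only the last piece before the suffix is taken, the number of name elements in front of the country code does not matter.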

I have had good results using regular expressions for this kind of problem:

var string = '<a href="https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx">Text</a>';
var regExp = /([\w]{2})_company_com/;

var find = string.match(regExp); // declared with var to avoid an implicit global

console.log(find); // array with found matches
console.log(find[1]); // first group of regexp = country code

First comes the given string. Second comes the regular expression, which in JavaScript is written between two slashes. Regular expressions are mostly used for searching strings (most major editors also let you search and replace with them, which can be very useful). Here it matches exactly two word characters, [\w]{2}, followed directly by _company_com. \w stands for a word character (a letter, digit, or underscore), the [] group the allowed character types (here only word characters, so the brackets are optional), and the {} give the number of characters to match. To run the search, call string.match(regExp). For a regular expression without the g flag, it returns an array holding the whole matched string followed by each capture group (the parts wrapped in () ), or null if nothing matches. So in this case the country code is find[1], the first and only capture group of the regular expression.
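As a hedged sketch, the same idea can be wrapped in a small function that also handles the case where nothing matches. The name extractCountryCode is hypothetical, and the pattern assumes the country code is always exactly two lowercase letters preceded by an underscore, which the question suggests but does not guarantee.

```javascript
// Hypothetical helper; the name and the two-lowercase-letter assumption
// are illustrative, not part of the original answer.
function extractCountryCode(text) {
  // [a-z]{2} is stricter than \w{2}, which would also accept digits and '_'.
  const match = text.match(/_([a-z]{2})_company_com/);
  // match is null when the pattern is not found, so guard before indexing.
  return match ? match[1] : null;
}

const html = '<a href="https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx">Text</a>';
console.log(extractCountryCode(html)); // pl
console.log(extractCountryCode('<a href="https://my.domain.com/">Text</a>')); // null
```

Returning null instead of indexing directly avoids a TypeError when the string does not contain the expected folder name.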

