How do I count word occurrences from a list of URLs in JavaScript?
I have a list of URLs in a JSON object in WordPress. I want to count the occurrences of the second part of each URL.
The code below currently gets the rest of the URL after the prefix https://www.example.co. What I want to do next is count the occurrences of the second part of the URL (the first path segment after the domain), which is one of cat1, cat3, cat2, xmlrpc.php.
var urlList = [
{
"URL": "https://www.example.co/cat1/aa/bb/cc",
"Last crawled": "Jun 23, 2019"
},
{
"URL": "https://www.example.co/cat2/aa",
"Last crawled": "Jun 23, 2019"
},
{
"URL": "https://www.example.co/cat1/aa/bb/cc/dd/ee",
"Last crawled": "Jun 23, 2019"
},
{
"URL": "https://www.example.co/cat3/aa/bb/cc/",
"Last crawled": "Jun 23, 2019"
},
{
"URL": "https://www.example.co/cat2/aa/bb",
"Last crawled": "Jun 23, 2019"
},
{
"URL": "https://www.example.co/cat1/aa/bb",
"Last crawled": "Jun 23, 2019"
},
{
"URL": "https://www.example.co/xmlrpc.php",
"Last crawled": "Jun 19, 2019"
}
]
const paths = urlList.map(value => value.URL.replace('https://www.example.co', ''));
//console.log(paths);
paths.forEach(function(item) {
var urlSecondPart = item.split("/")[1];
console.log(urlSecondPart);
});
Do you know how I can achieve that with my current forEach loop?
Any help is greatly appreciated. Thanks.
Use a regular expression to match the non-/ characters that come after .co/:
var urlList = [
  { "URL": "https://www.example.co/cat1/aa/bb/cc", "Last crawled": "Jun 23, 2019" },
  { "URL": "https://www.example.co/cat2/aa", "Last crawled": "Jun 23, 2019" },
  { "URL": "https://www.example.co/cat1/aa/bb/cc/dd/ee", "Last crawled": "Jun 23, 2019" },
  { "URL": "https://www.example.co/cat3/aa/bb/cc/", "Last crawled": "Jun 23, 2019" },
  { "URL": "https://www.example.co/cat2/aa/bb", "Last crawled": "Jun 23, 2019" },
  { "URL": "https://www.example.co/cat1/aa/bb", "Last crawled": "Jun 23, 2019" },
  { "URL": "https://www.example.co/xmlrpc.php", "Last crawled": "Jun 19, 2019" }
];

// Capture the run of non-slash characters that follows ".co/".
const paths = urlList.map(
  ({ URL }) => URL.match(/\.co\/([^\/]+)/)[1]
);
console.log(paths);

// Tally how many times each captured segment appears.
const counts = paths.reduce((a, str) => {
  a[str] = (a[str] || 0) + 1;
  return a;
}, {});
console.log(counts);
On newer engines, you can use lookbehind instead of extracting the capture group:
const paths = urlList.map(
({ URL }) => URL.match(/(?<=\.co\/)[^\/]+/)[0]
);
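Both regex variants index straight into the result of match, which is null when a URL has nothing after .co/, so they would throw on such an entry. As a rough sketch of one alternative, assuming the standard URL constructor is available (browsers and Node.js), the first path segment can be read from pathname instead. The helper name firstSegment and the sample array are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper: returns the first path segment of a URL string,
// or null when there is none (e.g. "https://www.example.co/").
function firstSegment(urlString) {
  // pathname always starts with "/", so index 0 of the split is "".
  const parts = new URL(urlString).pathname.split('/');
  return parts[1] || null;
}

// Illustrative sample; the last entry has no path segment at all.
const sample = [
  { URL: 'https://www.example.co/cat1/aa/bb/cc' },
  { URL: 'https://www.example.co/xmlrpc.php' },
  { URL: 'https://www.example.co/' },
];

// item.URL is read as a property so the global URL constructor is not shadowed.
const firstSegments = sample.map(item => firstSegment(item.URL));
console.log(firstSegments); // [ 'cat1', 'xmlrpc.php', null ]
```

Note that new URL(...) itself throws a TypeError on strings that are not valid absolute URLs, so this only helps with well-formed URLs that happen to lack a path.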
If you want to keep track of all full URLs used, reduce not only into a count, but also into an array of those full URLs:
var urlList = [
  { "URL": "https://www.example.co/cat1/aa/bb/cc", "Last crawled": "Jun 23, 2019" },
  { "URL": "https://www.example.co/cat2/aa", "Last crawled": "Jun 23, 2019" },
  { "URL": "https://www.example.co/cat1/aa/bb/cc/dd/ee", "Last crawled": "Jun 23, 2019" },
  { "URL": "https://www.example.co/cat3/aa/bb/cc/", "Last crawled": "Jun 23, 2019" },
  { "URL": "https://www.example.co/cat2/aa/bb", "Last crawled": "Jun 23, 2019" },
  { "URL": "https://www.example.co/cat1/aa/bb", "Last crawled": "Jun 23, 2019" },
  { "URL": "https://www.example.co/xmlrpc.php", "Last crawled": "Jun 19, 2019" }
];

// Extract the segment that follows ".co/".
const getSecond = url => url.match(/\.co\/([^\/]+)/)[1];

// For each segment, keep both a running count and the list of full URLs.
const counts = urlList.reduce((a, { URL }) => {
  const second = getSecond(URL);
  if (!a[second]) {
    a[second] = { count: 0, fullUrls: [] };
  }
  a[second].count++;
  a[second].fullUrls.push(URL);
  return a;
}, {});
console.log(counts);
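Since the question asked specifically about the existing forEach loop, the same tally can also be kept without reduce. The following is only a sketch of that idea: it reuses the question's replace and split logic, and the shortened urlList is an illustrative subset of the original data.

```javascript
// Shortened copy of the question's data, for illustration only.
const urlList = [
  { URL: 'https://www.example.co/cat1/aa/bb/cc' },
  { URL: 'https://www.example.co/cat2/aa' },
  { URL: 'https://www.example.co/cat1/aa/bb' },
  { URL: 'https://www.example.co/xmlrpc.php' },
];

// Same prefix stripping as in the question.
const paths = urlList.map(value => value.URL.replace('https://www.example.co', ''));

// Plain object used as a tally: key = segment, value = occurrences so far.
const counts = {};
paths.forEach(function (item) {
  const urlSecondPart = item.split('/')[1];
  // Start from 0 the first time a segment is seen, then add one.
  counts[urlSecondPart] = (counts[urlSecondPart] || 0) + 1;
});

console.log(counts); // { cat1: 2, cat2: 1, 'xmlrpc.php': 1 }
```

The only change to the original loop is the counts object declared outside it; the reduce version does the same thing but passes the accumulator along explicitly.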