简体   繁体   中英

count of word occurrences in a list, case insensitive

What is the most professional way to obtain a case insensitive count of the distinct words contained in an array using plain javascript? I have done the first attempt myself but does not feel much professional.

I would like the result to be a Map

You can use Array.reduce to store each word as a property and the occurrence of each as the value.

In the reducer function, check whether the letter (converted to lowercase) exists as a property. If not, set its value to 1 . Otherwise, increment the property value.

 const arr = ["a", "A", "b", "B"] const result = arr.reduce((a,b) => { let c = b.toLowerCase(); return a[c] = a[c] ? ++a[c] : 1, a; }, {}) console.log(result)
As a one liner: const result = arr.reduce((a,b) => (c = b.toLowerCase(), a[c] = a[c] ? ++a[c] : 1, a), {})

To convert it to a Map , you can use Object.entries (sugged by @Théophile ):

 const arr = ["a", "A", "b", "B"] const result = arr.reduce((a, b) => { let c = b.toLowerCase(); return a[c] = a[c] ? ++a[c] : 1, a; }, {}) const m = new Map(Object.entries(result)) m.forEach((value, key) => console.log(key, ':', value))

You can use an object to store the results and then create a Map object by passing that object to Object.entries

 const arr = ["c", "A", "C", "B", "b"]; const counts = {}; for (const el of arr) { let c = el.toLowerCase(); counts[c] = counts[c] ? ++counts[c] : 1; } console.log(counts); const map = new Map(Object.entries(counts)) map.forEach((k,v) => console.log(k,v))

While the accepted 'group-by' operation is fine, it doesn't address the complexity of case-insensitive/unicode comparison.

First of all, you can reduce directly into a Map , here counting characters as they are without accounting for case-insensitivity or unicode variations resulting in 20 'distinct' characters being counted from an array of length 24.

 const input = [ 'a', 'A', 'b', 'B', '\ñ', '\n\̃', 'İ', 'i', 'Gesäß', 'GESÄSS', '\Ι', '\ι', '\å', '\Å', '\Å', '\Å', '\Ι', '\ι', '\ι', '\ι', '\β', '\ϐ', '\ε', '\ϵ', ]; const result = input.reduce((a, b) => a.set(b, (a.get(b) ?? 0) + 1), new Map()); console.log('distinct count: ', result.size); console.log('Map(',result.size,') {', [...result.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');

Based on the samples below, the method that results in the most compact count is using word.normalize().toLocaleUpperCase() and passing Turkey('tr') as a locale for this specific sample array. It results in 9 'distinct' characters being counted from an array of length 24, properly handling different encodings for ñ , equivalent spellings of Gesäß ( GESÄSS ), and accounting for locale specific case changes ( i to İ )

 const input = [ 'a', 'A', 'b', 'B', '\ñ', '\n\̃', 'İ', 'i', 'Gesäß', 'GESÄSS', '\Ι', '\ι', '\å', '\Å', '\Å', '\Å', '\Ι', '\ι', '\ι', '\ι', '\β', '\ϐ', '\ε', '\ϵ', ]; const result_normalize_locale = input.reduce((a, b) => { const w = b.normalize().toLocaleUpperCase('tr'); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); console.log('distinct count: ', result_normalize_locale.size); console.log('Map(',result_normalize_locale.size,') {', [...result_normalize_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');

Using this simple 'group-by' we can look at the variations between the available case comparison methods: toLowerCase() , toLocaleLowerCase() , toUpperCase() , and toLocaleUpperCase() and unicode variations can be accounted for using normalize()

To lower case

toLowerCase() – 15 'distinct' characters.

toLocaleLowerCase() – 14 'distinct' characters, in this case specifying Turkey('tr') as locale.

normalize().toLocaleLowerCase() – 12 'distinct' characters, again with 'tr' as locale.

 const input = [ 'a', 'A', 'b', 'B', '\ñ', '\n\̃', 'İ', 'i', 'Gesäß', 'GESÄSS', '\Ι', '\ι', '\å', '\Å', '\Å', '\Å', '\Ι', '\ι', '\ι', '\ι', '\β', '\ϐ', '\ε', '\ϵ', ]; // ['a', 'A', 'b', 'B', 'ñ', 'ñ', 'İ', 'i', 'Gesäß', 'GESÄSS', 'Ι', 'ι', 'å', 'Å', 'Å', 'Å', 'Ι', 'ι', 'ι', 'ι', 'β', 'ϐ', 'ε', 'ϵ', ] // input.length: 24 // grouping by toLowerCase() const result = input.reduce((a, b) => { const w = b.toLowerCase(); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); // grouping by toLocaleLowerCase('tr') [Turkey] const result_locale = input.reduce((a, b) => { const w = b.toLocaleLowerCase('tr'); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); // grouping by normalize().toLocaleLowerCase('tr') [Turkey] const result_normalize_locale = input.reduce((a, b) => { const w = b.normalize().toLocaleLowerCase('tr'); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); // log toLowerCase() result - 15 'distinct' characters console.log('toLowerCase() '); console.log('distinct count: ', result.size); console.log('Map(',result.size,') {', [...result.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}'); // log toLocaleLowerCase('tr') result - 14 'distinct' characters console.log("\\ntoLocaleLowerCase('tr')"); console.log('distinct count: ', result_locale.size); console.log('Map(',result_locale.size,') {', [...result_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}'); // log normalize().toLocaleLowerCase('tr') result - 12 'distinct' characters console.log("\\nnormalize().toLocaleLowerCase('tr')"); console.log('distinct count: ', result_normalize_locale.size); console.log('Map(',result_normalize_locale.size,') {', [...result_normalize_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
 .as-console-wrapper { max-height: 100% !important; top: 0; }

To upper case

toUpperCase() – 12 'distinct' characters.

toLocaleUpperCase() – 11 'distinct' characters, in this case specifying Turkey('tr') as locale.

normalize().toLocaleUpperCase() – 9 'distinct' characters, again with 'tr' as locale.

 const input = [ 'a', 'A', 'b', 'B', '\ñ', '\n\̃', 'İ', 'i', 'Gesäß', 'GESÄSS', '\Ι', '\ι', '\å', '\Å', '\Å', '\Å', '\Ι', '\ι', '\ι', '\ι', '\β', '\ϐ', '\ε', '\ϵ', ]; // ['a', 'A', 'b', 'B', 'ñ', 'ñ', 'İ', 'i', 'Gesäß', 'GESÄSS', 'Ι', 'ι', 'å', 'Å', 'Å', 'Å', 'Ι', 'ι', 'ι', 'ι', 'β', 'ϐ', 'ε', 'ϵ', ] // input.length: 24 // grouping by toUpperCase() const result = input.reduce((a, b) => { const w = b.toUpperCase(); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); // grouping by toLocaleUpperCase('tr') [Turkey] const result_locale = input.reduce((a, b) => { const w = b.toLocaleUpperCase('tr'); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); // grouping by normalize().toLocaleUpperCase('tr') [Turkey] const result_normalize_locale = input.reduce((a, b) => { const w = b.normalize().toLocaleUpperCase('tr'); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); // log toUpperCase() result - 12 'distinct' characters console.log('toUpperCase() '); console.log('distinct count: ', result.size); console.log('Map(',result.size,') {', [...result.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}'); // log toLocaleUpperCase('tr') result - 11 'distinct' characters console.log("\\ntoLocaleUpperCase('tr')"); console.log('distinct count: ', result_locale.size); console.log('Map(',result_locale.size,') {', [...result_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}'); // log normalize().toLocaleUpperCase('tr') result - 9 'distinct' characters console.log("\\nnormalize().toLocaleUpperCase('tr')"); console.log('distinct count: ', result_normalize_locale.size); console.log('Map(',result_normalize_locale.size,') {', [...result_normalize_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
 .as-console-wrapper { max-height: 100% !important; top: 0; }

use set to get rid of duplicates and the spread operator to put it back in an array.

const  myarray = ['one', 'One', 'two', 'TWO', 'three'];
const noDupes = [... new Set( myarray.map(x => x.toLowerCase()))];
console.log(noDupes);

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM