What is the most professional way to obtain a case insensitive count of the distinct words contained in an array using plain javascript? I have done the first attempt myself but does not feel much professional.
I would like the result to be a Map
You can use Array.reduce
to store each word as a property and the occurrence of each as the value.
In the reducer function, check whether the letter (converted to lowercase) exists as a property. If not, set its value to 1
. Otherwise, increment the property value.
const arr = ["a", "A", "b", "B"] const result = arr.reduce((a,b) => { let c = b.toLowerCase(); return a[c] = a[c] ? ++a[c] : 1, a; }, {}) console.log(result)
const result = arr.reduce((a,b) => (c = b.toLowerCase(), a[c] = a[c] ? ++a[c] : 1, a), {})
To convert it to a Map
, you can use Object.entries
(sugged by @Théophile ):
const arr = ["a", "A", "b", "B"] const result = arr.reduce((a, b) => { let c = b.toLowerCase(); return a[c] = a[c] ? ++a[c] : 1, a; }, {}) const m = new Map(Object.entries(result)) m.forEach((value, key) => console.log(key, ':', value))
You can use an object to store the results and then create a Map
object by passing that object to Object.entries
const arr = ["c", "A", "C", "B", "b"]; const counts = {}; for (const el of arr) { let c = el.toLowerCase(); counts[c] = counts[c] ? ++counts[c] : 1; } console.log(counts); const map = new Map(Object.entries(counts)) map.forEach((k,v) => console.log(k,v))
While the accepted 'group-by' operation is fine, it doesn't address the complexity of case-insensitive/unicode comparison.
First of all, you can reduce directly into a Map , here counting characters as they are without accounting for case-insensitivity or unicode variations resulting in 20 'distinct' characters being counted from an array of length 24.
const input = [ 'a', 'A', 'b', 'B', '\ñ', '\n\̃', 'İ', 'i', 'Gesäß', 'GESÄSS', '\Ι', '\ι', '\å', '\Å', '\Å', '\Å', '\Ι', '\ι', '\ι', '\ι', '\β', '\ϐ', '\ε', '\ϵ', ]; const result = input.reduce((a, b) => a.set(b, (a.get(b) ?? 0) + 1), new Map()); console.log('distinct count: ', result.size); console.log('Map(',result.size,') {', [...result.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
Based on the samples below, the method that results in the most compact count is using word.normalize().toLocaleUpperCase()
and passing Turkey('tr') as a locale for this specific sample array. It results in 9 'distinct' characters being counted from an array of length 24, properly handling different encodings for ñ
, equivalent spellings of Gesäß
( GESÄSS
), and accounting for locale specific case changes ( i
to İ
)
const input = [ 'a', 'A', 'b', 'B', '\ñ', '\n\̃', 'İ', 'i', 'Gesäß', 'GESÄSS', '\Ι', '\ι', '\å', '\Å', '\Å', '\Å', '\Ι', '\ι', '\ι', '\ι', '\β', '\ϐ', '\ε', '\ϵ', ]; const result_normalize_locale = input.reduce((a, b) => { const w = b.normalize().toLocaleUpperCase('tr'); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); console.log('distinct count: ', result_normalize_locale.size); console.log('Map(',result_normalize_locale.size,') {', [...result_normalize_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
Using this simple 'group-by' we can look at the variations between the available case comparison methods: toLowerCase()
, toLocaleLowerCase()
, toUpperCase()
, and toLocaleUpperCase()
and unicode variations can be accounted for using normalize()
To lower case
toLowerCase()
– 15 'distinct' characters.
toLocaleLowerCase()
– 14 'distinct' characters, in this case specifying Turkey('tr') as locale.
normalize().toLocaleLowerCase()
– 12 'distinct' characters, again with 'tr' as locale.
const input = [ 'a', 'A', 'b', 'B', '\ñ', '\n\̃', 'İ', 'i', 'Gesäß', 'GESÄSS', '\Ι', '\ι', '\å', '\Å', '\Å', '\Å', '\Ι', '\ι', '\ι', '\ι', '\β', '\ϐ', '\ε', '\ϵ', ]; // ['a', 'A', 'b', 'B', 'ñ', 'ñ', 'İ', 'i', 'Gesäß', 'GESÄSS', 'Ι', 'ι', 'å', 'Å', 'Å', 'Å', 'Ι', 'ι', 'ι', 'ι', 'β', 'ϐ', 'ε', 'ϵ', ] // input.length: 24 // grouping by toLowerCase() const result = input.reduce((a, b) => { const w = b.toLowerCase(); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); // grouping by toLocaleLowerCase('tr') [Turkey] const result_locale = input.reduce((a, b) => { const w = b.toLocaleLowerCase('tr'); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); // grouping by normalize().toLocaleLowerCase('tr') [Turkey] const result_normalize_locale = input.reduce((a, b) => { const w = b.normalize().toLocaleLowerCase('tr'); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); // log toLowerCase() result - 15 'distinct' characters console.log('toLowerCase() '); console.log('distinct count: ', result.size); console.log('Map(',result.size,') {', [...result.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}'); // log toLocaleLowerCase('tr') result - 14 'distinct' characters console.log("\\ntoLocaleLowerCase('tr')"); console.log('distinct count: ', result_locale.size); console.log('Map(',result_locale.size,') {', [...result_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}'); // log normalize().toLocaleLowerCase('tr') result - 12 'distinct' characters console.log("\\nnormalize().toLocaleLowerCase('tr')"); console.log('distinct count: ', result_normalize_locale.size); console.log('Map(',result_normalize_locale.size,') {', [...result_normalize_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
.as-console-wrapper { max-height: 100% !important; top: 0; }
To upper case
toUpperCase()
– 12 'distinct' characters.
toLocaleUpperCase()
– 11 'distinct' characters, in this case specifying Turkey('tr') as locale.
normalize().toLocaleUpperCase()
– 9 'distinct' characters, again with 'tr' as locale.
const input = [ 'a', 'A', 'b', 'B', '\ñ', '\n\̃', 'İ', 'i', 'Gesäß', 'GESÄSS', '\Ι', '\ι', '\å', '\Å', '\Å', '\Å', '\Ι', '\ι', '\ι', '\ι', '\β', '\ϐ', '\ε', '\ϵ', ]; // ['a', 'A', 'b', 'B', 'ñ', 'ñ', 'İ', 'i', 'Gesäß', 'GESÄSS', 'Ι', 'ι', 'å', 'Å', 'Å', 'Å', 'Ι', 'ι', 'ι', 'ι', 'β', 'ϐ', 'ε', 'ϵ', ] // input.length: 24 // grouping by toUpperCase() const result = input.reduce((a, b) => { const w = b.toUpperCase(); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); // grouping by toLocaleUpperCase('tr') [Turkey] const result_locale = input.reduce((a, b) => { const w = b.toLocaleUpperCase('tr'); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); // grouping by normalize().toLocaleUpperCase('tr') [Turkey] const result_normalize_locale = input.reduce((a, b) => { const w = b.normalize().toLocaleUpperCase('tr'); return a.set(w, (a.get(w) ?? 0) + 1); }, new Map()); // log toUpperCase() result - 12 'distinct' characters console.log('toUpperCase() '); console.log('distinct count: ', result.size); console.log('Map(',result.size,') {', [...result.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}'); // log toLocaleUpperCase('tr') result - 11 'distinct' characters console.log("\\ntoLocaleUpperCase('tr')"); console.log('distinct count: ', result_locale.size); console.log('Map(',result_locale.size,') {', [...result_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}'); // log normalize().toLocaleUpperCase('tr') result - 9 'distinct' characters console.log("\\nnormalize().toLocaleUpperCase('tr')"); console.log('distinct count: ', result_normalize_locale.size); console.log('Map(',result_normalize_locale.size,') {', [...result_normalize_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
.as-console-wrapper { max-height: 100% !important; top: 0; }
use set to get rid of duplicates and the spread operator to put it back in an array.
const myarray = ['one', 'One', 'two', 'TWO', 'three'];
const noDupes = [... new Set( myarray.map(x => x.toLowerCase()))];
console.log(noDupes);
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.