How to iterate over all Unicode characters?

Question

Is it possible to iterate over all Unicode characters (UTF-8)? Thanks! I've tried using:

character = String.fromCharCode(i);

But I'm not sure how to implement it.

Answer 1

UTF-8 is an encoding, not a character set! JavaScript strings are (mostly) encoded in UTF-16. The encoding only matters if you're working in an environment that doesn't support ES6's String.fromCodePoint. Getting a string from a code point with ES6:

var s = String.fromCodePoint(codePoint);

and without ES6, using a UTF-16 surrogate pair for characters U+10000 and onwards:

var s;

if (codePoint < 0x10000) {
    s = String.fromCharCode(codePoint);
} else {
    var offset = codePoint - 0x10000;
    s = String.fromCharCode(0xd800 + (offset >> 10),
                            0xdc00 + (offset & 0x3ff));
}
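
As a worked check of the arithmetic above, here is a small sketch that splits U+1F41B (the bug emoji) by hand and compares the result with String.fromCodePoint. The helper name toSurrogatePair is made up for this illustration and is not a built-in.

```javascript
// Hypothetical helper: split a supplementary code point (>= 0x10000)
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
    var offset = codePoint - 0x10000;     // 20-bit value
    var high = 0xd800 + (offset >> 10);   // top 10 bits
    var low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
    return [high, low];
}

var pair = toSurrogatePair(0x1f41b);
var manual = String.fromCharCode(pair[0], pair[1]);

console.log(pair[0].toString(16), pair[1].toString(16)); // d83d dc1b
console.log(manual === String.fromCodePoint(0x1f41b));   // true
```

For U+1F41B the offset is 0xF41B: its top 10 bits are 0x3D and its bottom 10 bits are 0x1B, which gives the pair \ud83d \udc1b.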

Code points range from U+0000 to U+10FFFF (1,114,112 values), but not everything in that range is a valid Unicode character. You can get a table from http://www.unicode.org/Public/8.0.0/ucd/UnicodeData.txt and extract the characters you really want to iterate over.
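
A minimal sketch of such an iteration, assuming an engine with ES2018 Unicode property escapes. Instead of parsing UnicodeData.txt, it leans on the engine's own Unicode data, so the "assigned" count depends on which Unicode version the engine ships. The names scalarValues and assigned are invented for this example, and "assigned" here only means "not in general category Cn", so control and private-use code points still pass the filter.

```javascript
// Yield every Unicode scalar value: all code points from U+0000 to
// U+10FFFF except the surrogate range U+D800..U+DFFF.
function* scalarValues() {
    for (let cp = 0; cp <= 0x10ffff; cp++) {
        if (cp >= 0xd800 && cp <= 0xdfff) continue; // surrogates are not characters
        yield cp;
    }
}

// \P{Cn} matches anything that is NOT in general category "Unassigned".
const assigned = /^\P{Cn}$/u;

let total = 0;
let assignedCount = 0;
for (const cp of scalarValues()) {
    total++;
    if (assigned.test(String.fromCodePoint(cp))) assignedCount++;
}

console.log(total);         // 1112064 (0x110000 minus 0x800 surrogates)
console.log(assignedCount); // varies with the engine's Unicode data
```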

Answer 2

According to the docs, the parameter passed to String.fromCharCode(a) is converted by calling ToUint16, and the corresponding character is then returned. You may call it with any number you want, but the value is reduced modulo 2¹⁶, so only code units 0 to 65535 can be produced.

var highNumber = 500; // This could go as high as 0x10000 (2^16)
var out = "";
for (var i = 0; i < highNumber; i++) {
    out += String.fromCharCode(i);
}
console.log(out);

Warning: if you run this code with highNumber set to 2^16, you may freeze your tab or browser, because the output is far too large to print at once. This assumes you want to iterate over all characters, not over the characters in a given string, which is quite a different thing.

A sample output for a more reasonable highNumber (i.e. 500) is the following:

 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
stuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæç
èéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺ
ĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƀƁƂƃƄƅƆƇƈƉƊƋƌƍ
ƎƏƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟƠơƢƣƤƥƦƧƨƩƪƫƬƭƮƯưƱƲƳƴƵƶƷƸƹƺƻƼƽƾƿǀǁǂǃǄǅǆǇǈǉǊǋǌǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟǠ
ǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯǰǱǲǳ
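
The ToUint16 conversion mentioned in this answer can be checked directly. This is only an illustrative sketch (the variable names are arbitrary): values above 0xFFFF wrap around instead of producing a supplementary character.

```javascript
// String.fromCharCode keeps only the low 16 bits of each argument (ToUint16).
const plain = String.fromCharCode(0x41);      // "A"
const wrapped = String.fromCharCode(0x10041); // 0x10041 mod 0x10000 = 0x41
const notEmoji = String.fromCharCode(0x1f41b);

console.log(plain === wrapped);     // true, both are "A"
console.log(notEmoji === "\uf41b"); // true, a single code unit, not the emoji

// String.fromCodePoint does not wrap; it emits a surrogate pair instead.
const emoji = String.fromCodePoint(0x1f41b);
console.log(emoji.length);          // 2
```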

Answer 3

(Adding this answer because it is relevant to some Google searches.)

The correct way to iterate character by character over a string that may contain characters outside the Basic Multilingual Plane (e.g. emojis and some less common scripts, which JavaScript stores as two UTF-16 code units each) is Array.from():

const bugs = '🐛🐛🐛'

// WRONG, splits characters that are stored as two UTF-16 code units (a surrogate pair)
bugs.split('')
// Array(6) [ "\ud83d", "\udc1b", "\ud83d", "\udc1b", "\ud83d", "\udc1b" ]

// CORRECT
Array.from(bugs)
// Array(3) [ "🐛", "🐛", "🐛" ]

Then iterate as you would over any normal array (suggested: map / forEach).

More information: https://medium.com/@giltayar/iterating-over-emoji-characters-the-es6-way-f06e4589516
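
If no intermediate array is needed, the same string iterator that Array.from relies on is also available through for...of and the spread syntax. A short sketch, reusing the bugs string from above:

```javascript
const bugs = '🐛🐛🐛';

// for...of uses the string iterator, which yields whole code points.
const seen = [];
for (const ch of bugs) {
    seen.push(ch.codePointAt(0).toString(16));
}
console.log(seen.join(' '));   // 1f41b 1f41b 1f41b

// Spread uses the same iterator as Array.from.
console.log([...bugs].length); // 3
console.log(bugs.length);      // 6 (UTF-16 code units)
```

Note that a code point is still not always a user-perceived character: a flag or a family emoji is made of several code points joined together, and this sketch does not handle that case.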

Answer 4

I think this might define what to iterate over exactly:

Answer 5

A JavaScript string has a length property. You can iterate over its UTF-16 code units (which match characters only within the Basic Multilingual Plane) simply:

for(var i = 0; i < str.length; i++) {
    var char = str[i],
       code = str.charCodeAt(i);
}
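
The loop above yields the two halves of a surrogate pair separately. One possible variation, assuming ES6's codePointAt is available, keeps the index-based loop but steps over the second half of each pair. The function name codePointsOf is hypothetical.

```javascript
// Walk a string by index, but treat each surrogate pair as one code point.
function codePointsOf(str) {
    var result = [];
    for (var i = 0; i < str.length; i++) {
        var code = str.codePointAt(i); // full code point starting at index i
        result.push(code);
        if (code > 0xffff) i++;        // it used two code units, skip the low surrogate
    }
    return result;
}

var hex = codePointsOf('a🐛b').map(function (c) { return c.toString(16); });
console.log(hex.join(' ')); // 61 1f41b 62
```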

5 answers

solution1
8 2015-11-18 23:10:57

solution2
3 ACCEPTED 2015-11-18 23:05:46

solution3
2 2020-12-01 17:09:34

solution4
0 2019-06-09 20:55:57

solution5
-3 2015-11-18 23:02:27
