cestoliv, 2 years ago - Mon, Nov 14, 2022
"👩‍💻🎉".length = 7 ??? How to count emojis with JavaScript
In this article
- The problem
- A first solution with the spread operator
- A second algorithm with Zero Width Joiner
- The best solution (to use in production)
1. The problem
This week, a friend of mine ran into a JavaScript problem while checking that a user had entered exactly one character in a text input. The first solution that comes to mind is to look at the length of the string, but problems occur when the string contains emojis:
"a".length // => 1
"🎉".length // => 2 ??
This is actually quite logical: the .length property in JavaScript returns the length of the string in UTF-16 code units, not the number of visible characters.
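To see what .length is really counting, here is a small sketch that lists the UTF-16 code units of a string. The helper name `codeUnits` is made up for this illustration; it is not a built-in.

```javascript
// List the UTF-16 code units of a string as hexadecimal numbers.
// `codeUnits` is a hypothetical helper name used only for this illustration.
function codeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i).toString(16));
  }
  return units;
}

console.log(codeUnits("a"));         // => ["61"]
console.log(codeUnits("\u{1F389}")); // 🎉 => ["d83c", "df89"], a surrogate pair
```

Any character above U+FFFF is stored as such a pair of code units, which is why a single emoji can report a length of 2.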
2. A first solution with the spread operator
The first solution I thought of was to split the string on each character and then get the number of elements:
"๐".split('') // => ["๏ฟฝ","๏ฟฝ"]
"๐".split('').length // => 2
Ouch... Unfortunately, .split('')
also splits in UTF-16 code units.
But there is another way to split a string into characters in JavaScript: the spread operator, which iterates over the string by Unicode code point rather than by code unit:
[..."🎉"] // => ["🎉"]
[..."🎉"].length // => 1, Hooray !!
[..."🎉🎉"] // => ["🎉", "🎉"]
[..."🎉🎉"].length // => 2, Hooray !!
[..."👩‍💻"] // => ["👩", "\u{200D}", "💻"]
[..."👩‍💻"].length // => 3, Oops...
Damn... Still not right. Unfortunately for us, some emojis are composed of several emojis joined by an invisible character, U+200D (the Zero Width Joiner):
[..."👩‍💻"] // => ["👩", "\u{200D}", "💻"]
[..."👩‍💻👩‍❤️‍💋‍👩"] // => ["👩", "\u{200D}", "💻", "👩", "\u{200D}", "❤", "\u{fe0f}", "\u{200D}", "💋", "\u{200D}", "👩"]
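To make the composition concrete, here is a small sketch that builds the same emoji by hand from its three code points. The variable names are only for illustration.

```javascript
// Build the "woman technologist" emoji from its three code points.
const woman = "\u{1F469}";  // 👩
const zwj = "\u{200D}";     // Zero Width Joiner (invisible)
const laptop = "\u{1F4BB}"; // 💻

// Rendered as a single emoji on platforms that support the sequence.
const technologist = woman + zwj + laptop;

// Show each code point in the usual U+XXXX notation.
const codePoints = [...technologist].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase()
);

console.log(codePoints);          // => ["U+1F469", "U+200D", "U+1F4BB"]
console.log(technologist.length); // => 5 (2 + 1 + 2 UTF-16 code units)
```

So the spread operator gives us code points, but a visible character can span several of them.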
3. A second algorithm with Zero Width Joiner
As you can see in this example, to count the visible characters, you can count the elements that are not a Zero Width Joiner and are not followed by one. In other words, every group of elements glued together by Zero Width Joiners counts only once.
For example:
[..."a👩‍💻🎉"]
// => ["a", "👩", "\u{200D}", "💻", "🎉"]
//    | 1 |          2           |  3  |
So we can make it a simple function:
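function visibleLength(str) {
  let count = 0;
  const arr = [...str]; // split by code point
  for (let c = 0; c < arr.length; c++) {
    // Count an element only if it is not a joiner and the next element
    // is not a joiner, a variation selector (U+FE0F) or a keycap mark (U+20E3).
    if (
      arr[c] !== '\u{200D}' &&
      arr[c + 1] !== '\u{200D}' &&
      arr[c + 1] !== '\u{fe0f}' &&
      arr[c + 1] !== '\u{20e3}'
    ) {
      count++;
    }
  }
  return count;
}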
visibleLength('Hello World'); // => 11
visibleLength('Hello World 🎉'); // => 13
visibleLength("I'm going to 🎉 !"); // => 16
visibleLength('👨‍👩‍👧‍👦'); // => 1
visibleLength('👩‍💻👩‍❤️‍💋‍👩'); // => 2
visibleLength('🇫🇷'); // => 2 AAAAAAAAAAAAAA!!!
Our function works in many cases, but not for flags. A flag is made of two regional indicator symbols (emoji letters) placed side by side. They are not separated by a Zero Width Joiner; platforms that support them simply render the pair as a flag.
[..."🇫🇷"] // => ["🇫", "🇷"]
[..."🇺🇸"] // => ["🇺", "🇸"]
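As an illustration of how flags are put together, here is a sketch that builds a flag from a two-letter country code. `flagFromCountryCode` and `REGIONAL_INDICATOR_OFFSET` are names I made up for this example; the only fact it relies on is that the regional indicator letters A to Z occupy U+1F1E6 to U+1F1FF.

```javascript
// Offset between "A" (U+0041) and REGIONAL INDICATOR SYMBOL LETTER A (U+1F1E6).
const REGIONAL_INDICATOR_OFFSET = 0x1f1e6 - 0x41;

// Turn a two-letter country code such as "FR" into its flag sequence.
// `flagFromCountryCode` is a hypothetical helper, not a built-in.
function flagFromCountryCode(code) {
  return [...code.toUpperCase()]
    .map((letter) =>
      String.fromCodePoint(letter.codePointAt(0) + REGIONAL_INDICATOR_OFFSET)
    )
    .join("");
}

const flag = flagFromCountryCode("FR");
console.log(flag === "\u{1F1EB}\u{1F1F7}"); // => true
console.log([...flag].length);              // => 2, no Zero Width Joiner involved
```

Since nothing in the sequence marks the two letters as belonging together, a joiner-based count has no way to tell that they form a single visible character.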
4. The best solution (to use in production)
One of the best solutions for handling all these cases is a grapheme segmentation algorithm, which can split a string into sentences, words or visible characters (graphemes).
To our delight, JavaScript ships this algorithm natively: Intl.Segmenter
It's pretty easy to use, AND IT WORKS WITH ALL CHARACTERS!:
function visibleLength(str) {
  return [...new Intl.Segmenter().segment(str)].length
}
visibleLength("I'm going to 🎉 !") // => 16
visibleLength("👩‍💻") // => 1
visibleLength("👩‍💻👩‍❤️‍💋‍👩") // => 2
visibleLength("France 🇫🇷!") // => 9
visibleLength("England 🏴󠁧󠁢󠁥󠁮󠁧󠁿!") // => 10
visibleLength("と日本語の文章") // => 7
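Called with no arguments, Intl.Segmenter uses the default locale and the "grapheme" granularity. The sketch below spells the granularity out and also shows the "word" mode; the variable names and the "en" locale are just choices for this example.

```javascript
// Same as `new Intl.Segmenter()`, but with the granularity spelled out.
const graphemes = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Each item yielded by segment() is an object; the text is in its `segment` property.
const parts = [...graphemes.segment("a\u{1F469}\u{200D}\u{1F4BB}")].map(
  (s) => s.segment
);
console.log(parts.length); // => 2: "a" and the 👩‍💻 sequence

// With "word" granularity, `isWordLike` tells words apart from spaces and punctuation.
const words = new Intl.Segmenter("en", { granularity: "word" });
const wordList = [...words.segment("Hello World!")]
  .filter((s) => s.isWordLike)
  .map((s) => s.segment);
console.log(wordList); // => ["Hello", "World"]
```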
There is just one small problem: at the time of writing, Intl.Segmenter is not supported at all by Firefox (desktop or mobile) or Internet Explorer.
4.1. Use it on Firefox/IE
To make this solution work in Firefox, we need this polyfill: https://github.com/surferseo/intl-segmenter-polyfill
Because the file is very large (1.77 MB), we need to make sure that it is only loaded for clients that do not yet support Intl.Segmenter.
Because it's a bit out of the scope of this article, I'm just posting my solution:
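The general idea can be sketched as follows. This is only an illustration of the feature detection, not a definitive implementation: `getSegmenter` is a made-up name, and `loadSegmenterPolyfill` is a hypothetical async callback standing in for whatever code actually downloads and initializes the polyfill (see the polyfill's README for its real API).

```javascript
// Resolve to a Segmenter constructor, loading a polyfill only when needed.
// `loadSegmenterPolyfill` is a hypothetical async callback: it stands in for
// whatever code downloads and initializes the polyfill in your project.
async function getSegmenter(loadSegmenterPolyfill) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    return Intl.Segmenter; // native support, nothing to download
  }
  return loadSegmenterPolyfill(); // only clients without support pay the cost
}

// Usage: count visible characters with whichever implementation is available.
async function visibleLengthWithFallback(str, loadSegmenterPolyfill) {
  const Segmenter = await getSegmenter(loadSegmenterPolyfill);
  return [...new Segmenter().segment(str)].length;
}
```

Because the callback is only invoked when the native check fails, browsers that already ship Intl.Segmenter never download the large file.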
4.2. Use it with TypeScript
Because this API is relatively new, if you want to use Intl.Segmenter with TypeScript, make sure your target is at least ES2022:
// File: tsconfig.json
{
  "compilerOptions": {
    "target": "ES2022",
    // [...]
  }
  // [...]
}
Thanks for reading!