cestoliv, 1 year ago - Mon. Nov 14, 2022

"👩‍💻🎉".length = 7 ??? How to count emojis with JavaScript

In this article

  1. The problem
  2. A first solution with the spread operator
  3. A second algorithm with Zero Width Joiner
  4. The best solution (to use in production)
    1. Use it on Firefox/IE
    2. Use it with TypeScript

1. The problem

This week, a friend of mine ran into a JavaScript problem: he wanted to check that his user was entering only one character in a text input. The first solution that comes to mind is to look at the length of the string, but problems occur when the string contains emojis:

"a".length // => 1
"🛁".length // => 2 ??

In fact, this is quite logical: the .length property in JavaScript returns the length of the string in UTF-16 code units, not the number of visible characters.
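
To make the code-unit explanation concrete, here is a small sketch comparing UTF-16 code units with Unicode code points. The emoji is written as an escape sequence so the example does not depend on the file's encoding, and the variable names are only illustrative.

```javascript
// U+1F389 (party popper) lies outside the Basic Multilingual Plane,
// so UTF-16 stores it as a surrogate pair: two code units.
const party = "\u{1F389}";

const codeUnits = party.length;       // counts UTF-16 code units
const codePoints = [...party].length; // string iteration walks code points

// The two code units are the high and low surrogates.
const high = party.charCodeAt(0).toString(16);
const low = party.charCodeAt(1).toString(16);

console.log(codeUnits, codePoints, high, low); // 2 1 d83c df89
```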

2. A first solution with the spread operator

The first solution I thought of was to split the string on each character and then get the number of elements:

"๐Ÿ›".split('') // => ["๏ฟฝ","๏ฟฝ"]
"๐Ÿ›".split('').length // => 2

Ouch... Unfortunately, .split('') also splits in UTF-16 code units.

But there is another way to split a string on each character in Javascript, using the spread operator:

[..."🛁"] // => ["🛁"]
[..."🛁"].length // => 1, Hooray !!
[..."🛁🎉"] // => ["🛁", "🎉"]
[..."🛁🎉"].length // => 2, Hooray !!

[..."👩‍💻"] // => ["👩", "\u{200D}", "💻"]
[..."👩‍💻"].length // => 3, Oops...

Damn... Still not right. Unfortunately for us, some emojis are composed of several emojis joined by an invisible character, U+200D (the Zero Width Joiner):

[..."👩‍💻"] // => ["👩", "\u{200D}", "💻"]

[..."👩‍💻👩‍❤️‍💋‍👩"] // => ["👩", "\u{200D}", "💻", "👩", "\u{200D}", "❤", "\u{fe0f}", "\u{200D}", "💋", "\u{200D}", "👩"]
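
To see how such a sequence is put together, here is a small sketch that builds the woman technologist emoji by hand from its parts. The constant names are just for illustration; the code points are the ones shown in the output above.

```javascript
const ZWJ = "\u200D";        // Zero Width Joiner
const woman = "\u{1F469}";   // woman
const laptop = "\u{1F4BB}";  // laptop

// Joining the two emojis with a ZWJ gives the "woman technologist" sequence.
const womanTechnologist = woman + ZWJ + laptop;

// List the code points that the spread operator would return.
const codePoints = [...womanTechnologist].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase()
);

console.log(codePoints); // ["U+1F469", "U+200D", "U+1F4BB"]
console.log(womanTechnologist.length); // 5 (2 + 1 + 2 UTF-16 code units)
console.log((womanTechnologist + "\u{1F389}").length); // 7, as in the title
```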

3. A second algorithm with Zero Width Joiner

As you can see in this example, to count the number of visible characters, you can count every element that is not a Zero Width Joiner and is not followed by one. Each joined sequence is then counted exactly once, on its last element.

For example:

[..."a👩‍💻🎉"]
// => ["a", "👩", "\u{200D}", "💻", "🎉"]
//    | 1 |           2           |  3  |

So we can make it a simple function:

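function visibleLength(str) {
    let count = 0;
    const arr = [...str]; // one element per Unicode code point

    for (let c = 0; c < arr.length; c++) {
        // Count an element only if it is not a joiner and is not followed by
        // a joiner (U+200D), a variation selector (U+FE0F) or a combining
        // enclosing keycap (U+20E3), so each sequence is counted once.
        if (
            arr[c] !== '\u{200D}' &&
            arr[c + 1] !== '\u{200D}' &&
            arr[c + 1] !== '\u{fe0f}' &&
            arr[c + 1] !== '\u{20e3}'
        ) {
            count++;
        }
    }
    return count;
}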

visibleLength('Hello World'); // => 11
visibleLength('Hello World 👋'); // => 13
visibleLength("I'm going to 🛁 !"); // => 16
visibleLength('👨‍👩‍👧‍👦'); // => 1
visibleLength('👩‍💻👩‍❤️‍💋‍👩'); // => 2

visibleLength('🇫🇷'); // => 2 AAAAAAAAAAAAAA!!!

Our function works in many cases, but not for flags. A flag is made of two letter emojis (regional indicator symbols) placed side by side. They are not separated by a Zero Width Joiner; platforms that support them simply render the pair as a flag.

[..."🇫🇷"] // => ["🇫", "🇷"]
[..."🇺🇸"] // => ["🇺", "🇸"]
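
As a rough illustration of how flags are assembled, the sketch below maps a two-letter country code onto the regional indicator symbols (U+1F1E6 for "A" through U+1F1FF for "Z"). The helper name flagFromCountryCode is hypothetical and only handles plain A-Z input.

```javascript
// Regional indicator symbols run from U+1F1E6 ("A") to U+1F1FF ("Z").
const REGIONAL_INDICATOR_A = 0x1f1e6;

// Hypothetical helper: turn a two-letter country code into its flag sequence.
function flagFromCountryCode(code) {
  return [...code.toUpperCase()]
    .map((letter) =>
      // 65 is the char code of "A", so "A" maps to the first indicator.
      String.fromCodePoint(REGIONAL_INDICATOR_A + letter.charCodeAt(0) - 65)
    )
    .join("");
}

const france = flagFromCountryCode("fr");
console.log(france === "\u{1F1EB}\u{1F1F7}"); // true
console.log([...france].length); // 2 code points, with no joiner between them
```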

4. The best solution (to use in production)

One of the best solutions for handling all these cases is a text segmentation algorithm that can split a string into sentences, words or graphemes (visible characters).

To our delight, JavaScript integrates this algorithm natively: Intl.Segmenter

It's pretty easy to use, AND IT WORKS WITH ALL CHARACTERS!:

function visibleLength(str) {
    return [...new Intl.Segmenter().segment(str)].length
}

visibleLength("I'm going to 🛁 !") // => 16
visibleLength("👩‍💻") // => 1
visibleLength("👩‍💻👩‍❤️‍💋‍👩") // => 2
visibleLength("France 🇫🇷!") // => 9
visibleLength("England 🏴󠁧󠁢󠁥󠁮󠁧󠁿!") // => 10
visibleLength("と日本語の文章") // => 7
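
Coming back to the original problem (checking that the user typed a single character), here is a minimal sketch built on the same API. The helper names graphemes and isSingleVisibleCharacter are mine, the granularity is spelled out explicitly even though "grapheme" is the default, and the exact results depend on the Unicode data shipped with the runtime.

```javascript
// Hypothetical helper: list the grapheme clusters (visible characters) of a string.
function graphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  // Each item yielded by segment() has a `segment` property holding the substring.
  return Array.from(segmenter.segment(str), (item) => item.segment);
}

// Hypothetical helper: true when the input is exactly one visible character.
function isSingleVisibleCharacter(str) {
  return graphemes(str).length === 1;
}

const technologist = "\u{1F469}\u200D\u{1F4BB}"; // woman + ZWJ + laptop
const frenchFlag = "\u{1F1EB}\u{1F1F7}";         // two regional indicators

console.log(isSingleVisibleCharacter("a"));          // true
console.log(isSingleVisibleCharacter(technologist)); // true
console.log(isSingleVisibleCharacter(frenchFlag));   // true
console.log(isSingleVisibleCharacter("ab"));         // false
```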

There is just one small problem: at the time of writing, Intl.Segmenter is not supported at all by Firefox (both desktop and mobile) or Internet Explorer.

[Screenshot: Mozilla Developer Network browser compatibility table for Intl.Segmenter]

4.1. Use it on Firefox/IE

To make this solution compatible with Firefox, we need to use this polyfill: https://github.com/surferseo/intl-segmenter-polyfill

Because the file is very large (1.77 MB), we need to make sure that it is only loaded for clients that do not yet support Intl.Segmenter().

Because it's a bit out of the scope of this article, I'm just posting my solution:
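
As a minimal sketch of that conditional-loading idea (an assumption about the general approach, not the original snippet), the code below only calls a loader when the browser lacks native support. The loadPolyfill callback is hypothetical: in a real page it would wrap a dynamic import of the polyfill and resolve to a Segmenter-compatible constructor, following the polyfill's own documentation.

```javascript
// Return a Segmenter constructor, loading a polyfill only when needed.
// `loadPolyfill` is a hypothetical async callback supplied by the caller.
async function getSegmenter(loadPolyfill) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    return Intl.Segmenter; // native support: nothing extra is downloaded
  }
  return loadPolyfill(); // only clients without support pay for the large file
}

// Async variant of visibleLength that works with either implementation,
// assuming the polyfill mirrors the native constructor and segment() method.
async function visibleLengthAsync(str, loadPolyfill) {
  const Segmenter = await getSegmenter(loadPolyfill);
  const segmenter = new Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(str)].length;
}
```

Because the loader is passed in as a function, the heavy file is never requested on browsers that already ship Intl.Segmenter.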

4.2. Use it with TypeScript

Because this API is relatively new, if you want to use Intl.Segmenter with TypeScript, make sure your target is at least ES2022:

// File: tsconfig.json
{
  "compilerOptions": {
    "target": "ES2022",
    // [...]
  }
  // [...]
}

Thanks for reading!