
How to count the number of Japanese and Latin characters in a string?

I need to write a function that follows the Japanese convention for counting characters, returning the length of a string under these rules:

  • 1 for a full-width character (Japanese kanji, katakana, and hiragana)
  • 0.5 for a half-width character (0-9, A-Z).
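The two rules amount to a per-character weight that is summed over the string. Here is a minimal sketch. It assumes "half-width" means printable ASCII (U+0020 to U+007E), which also covers the lowercase letters and the space that the second test below counts. It also assumes every other character is full-width. The names charWeight and weightedLength are hypothetical.

```javascript
// Hypothetical helper: weight of one character under the rules above.
// Assumption: printable ASCII (U+0020-U+007E) is half-width (0.5),
// and every other character is treated as full-width (1).
function charWeight(ch) {
  const code = ch.codePointAt(0);
  return code >= 0x20 && code <= 0x7e ? 0.5 : 1;
}

// Sum the weights. for...of iterates by code point, so a character
// outside the Basic Multilingual Plane is weighed once, not twice.
function weightedLength(text) {
  let total = 0;
  for (const ch of text) {
    total += charWeight(ch);
  }
  return total;
}
```

With these assumptions, weightedLength gives 17 for the first test string and 23 for the second (12 half-width characters, including the leading space, add 6).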

Here is the unit test I wrote:

    describe('#getCaptionLength', () => {
      it('should return correct caption length for Japanese', () => {
        const text = 'を取り外すコネクタと考えてください';
        const result = getCaptionLength(text);
        expect(result).toBe(17);
      });

      it('should return correct caption length for Japanese mixed with Latin', () => {
        const text = 'を取り外すコネクタと考えてください hello world';
        const result = getCaptionLength(text);
        expect(result).toBe(17 + 6);
      });
    });

Could you help me write a function that passes these tests? Thanks!

It's not very complicated. If you just want to pass these two tests, you could write something like this:

    function getCaptionLength(text) {
      // Find all runs of Latin letters, digits and spaces with a RegExp
      const re = /[A-Za-z0-9 ]+/g;
      const found = text.match(re);
      // Total length of the Latin part (match() returns null when nothing matches)
      const latinLength = found ? found.join('').length : 0;
      // The Japanese part is just the full string minus the Latin part
      const japaneseCharactersLength = text.length - latinLength;
      // Japanese characters count 1 each, Latin characters 0.5 each
      return japaneseCharactersLength + latinLength * 0.5;
    }

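But this, of course, also counts as 1 everything that is neither a Japanese character nor a Latin character, such as emoji and other special characters. An emoji outside the Basic Multilingual Plane actually counts as 2, because text.length counts UTF-16 code units.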
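One way to avoid that is to match the Japanese scripts explicitly and report anything else separately instead of silently counting it. The sketch below assumes an engine that supports ES2018 Unicode property escapes. The name getCaptionLengthStrict and its { length, other } return shape are hypothetical. How the leftover characters should be counted is left to the caller.

```javascript
// Japanese scripts via Unicode property escapes (requires the `u` flag).
// Script_Extensions also matches shared marks such as "ー", "、" and "。".
const JAPANESE = /[\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}\p{Script_Extensions=Han}]/u;
// Half-width characters: the same class the answer above uses.
const HALF_WIDTH = /[A-Za-z0-9 ]/;

// Hypothetical variant: characters that are neither Japanese nor half-width
// are collected in `other` instead of being counted as 1.
function getCaptionLengthStrict(text) {
  let length = 0;
  const other = [];
  for (const ch of text) { // iterates by code point, so an emoji is one `ch`
    if (JAPANESE.test(ch)) {
      length += 1;
    } else if (HALF_WIDTH.test(ch)) {
      length += 0.5;
    } else {
      other.push(ch);
    }
  }
  return { length, other };
}
```

For example, getCaptionLengthStrict('テスト😀!') returns a length of 3 with other set to ['😀', '!']. Half-width katakana would still count as 1 here, since those characters belong to the Katakana script.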
