C++ implementation of Python's unicodedata library

New user here, please be gentle.

We are looking to port a piece of Python code to C++, but it relies on the unicodedata module from the Python standard library, in particular this function:

unicodedata.category('A')  # 'L'etter, 'u'ppercase
'Lu'

Can this be readily achieved in C++? Would embedding compiled Python code in C++ be worthwhile, given that we want to do this in the context of online TensorFlow model serving? Thanks!

Just stick the output of this Python code into a C++ source file:

import unicodedata

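# Emit a C enum with one constant per Unicode general category.
print('typedef enum {Cn, Cc, Cf, Co, Cs, Ll, Lm, Lo, Lt, Lu, Mc, Me, Mn, Nd, Nl, No, Pc, Pd, Pe, Pf, Pi, Po, Ps, Sc, Sk, Sm, So, Zl, Zp, Zs} CATEGORY_e;')
# Emit one table entry per code point, U+0000 through U+10FFFF, indexed by code point.
print('const CATEGORY_e CHAR_CATEGORIES[] = {%s};' % ', '.join(unicodedata.category(chr(codepoint)) for codepoint in range(0x110000)))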

(If you are still using Python 2.x instead of 3.x, replace chr with unichr.)

You now have a convenient lookup table of Unicode character categories to use in your C++ programs.
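
As a rough sketch of how the generated table might be used from C++: the enum below is the one printed by the script, but the table here is only a hand-written stand-in covering U+0000 through U+0041 ('A') so that the example compiles on its own. The names CHAR_CATEGORIES_SIZE, CATEGORY_NAMES, unicode_category and unicode_category_name are made up for illustration and are not part of the generated output. It assumes a C++11 compiler (for char32_t).

```cpp
#include <cstddef>

// Enum as printed by the first print() call of the generator script.
typedef enum {Cn, Cc, Cf, Co, Cs, Ll, Lm, Lo, Lt, Lu, Mc, Me, Mn, Nd, Nl, No,
              Pc, Pd, Pe, Pf, Pi, Po, Ps, Sc, Sk, Sm, So, Zl, Zp, Zs} CATEGORY_e;

// STAND-IN for the generated table: only U+0000..U+0041 are listed here.
// In a real build, paste the script's full output (0x110000 entries) instead.
const CATEGORY_e CHAR_CATEGORIES[] = {
    // U+0000..U+001F: control characters
    Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc,
    Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc, Cc,
    // U+0020..U+002F: space ! " # $ % & ' ( ) * + , - . /
    Zs, Po, Po, Po, Sc, Po, Po, Po, Ps, Pe, Po, Sm, Po, Pd, Po, Po,
    // U+0030..U+003F: 0-9 : ; < = > ?
    Nd, Nd, Nd, Nd, Nd, Nd, Nd, Nd, Nd, Nd, Po, Po, Sm, Sm, Sm, Po,
    // U+0040..U+0041: @ A
    Po, Lu
};

// Hypothetical helper constant: number of entries actually present in the table.
const std::size_t CHAR_CATEGORIES_SIZE =
    sizeof(CHAR_CATEGORIES) / sizeof(CHAR_CATEGORIES[0]);

// Hypothetical helper: two-letter names, in the same order as the enum.
const char* const CATEGORY_NAMES[] = {
    "Cn", "Cc", "Cf", "Co", "Cs", "Ll", "Lm", "Lo", "Lt", "Lu",
    "Mc", "Me", "Mn", "Nd", "Nl", "No", "Pc", "Pd", "Pe", "Pf",
    "Pi", "Po", "Ps", "Sc", "Sk", "Sm", "So", "Zl", "Zp", "Zs"
};

// Hypothetical helper: bounds-checked lookup by code point.
// Anything outside the table is reported as Cn (unassigned).
inline CATEGORY_e unicode_category(char32_t codepoint) {
    return codepoint < CHAR_CATEGORIES_SIZE ? CHAR_CATEGORIES[codepoint] : Cn;
}

// Hypothetical helper: same lookup, returning the two-letter name
// in the style of Python's unicodedata.category().
inline const char* unicode_category_name(char32_t codepoint) {
    return CATEGORY_NAMES[unicode_category(codepoint)];
}
```

With this sketch, unicode_category_name(U'A') gives "Lu", matching the Python call in the question. Note that an array of enum values typically takes 4 bytes per entry on common compilers, so the full table would be several megabytes; storing the values in a smaller integer type is one possible way to shrink it.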
