How to manipulate UTF-8 in Lua

Question:

How can I work with a UTF-8 encoded string in Lua?

For example:

  • get the code of a single character in a string by its index;
  • encode character codes into a string, something like string.char(...codes) or ('').char.

What are the possible ways?


Answer:

If you are on Lua 5.3 (I am no expert in it), you can use the built-in utf8 module, which provides utf8.char, utf8.codes, utf8.codepoint, utf8.len and utf8.offset, plus the pattern utf8.charpattern.
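
As a minimal sketch of the two operations asked about (reading a character code by index and encoding codes into a string), assuming Lua 5.3 or later. The helper name codepoint_at is made up for this example and is not part of the utf8 module:

```lua
-- Requires Lua 5.3 or later (built-in utf8 module).

-- Hypothetical helper: return the code point of the n-th character
-- (counted in characters, not bytes) of s, or nil if n is out of range.
local function codepoint_at(s, n)
  if n < 1 then return nil end
  local pos = utf8.offset(s, n)   -- byte position where character n starts
  if not pos or pos > #s then return nil end
  return utf8.codepoint(s, pos)
end

-- Encode code points into a UTF-8 string (the counterpart of string.char).
local s = utf8.char(0x68, 0xE9, 0x6C, 0x6C, 0x6F)   -- "héllo"

print(utf8.len(s))          --> 5  (characters)
print(#s)                   --> 6  (bytes, "é" takes two)
print(codepoint_at(s, 2))   --> 233

-- Iterate over every character with its byte position and code point.
for pos, code in utf8.codes(s) do
  print(pos, code, utf8.char(code))
end
```

Note that utf8.codepoint and utf8.offset work with byte positions, which is why the helper first converts the character index into a byte position.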

There is also the starwing/luautf8 module, which provides some extra functions (the author claims to have tested it with Lua 5.2.3, Lua 5.3.0 and LuaJIT).

To install, use the command (if you have luarocks):

luarocks install luautf8

Then load it in your script like this, to avoid conflicts with the native functions:

local utf8 = require 'lua-utf8'

If you do not have luarocks, you can try compiling the module's source file manually.

Some of its functions are utf8.byte, utf8.char, utf8.find, utf8.gmatch, utf8.gsub, utf8.len, utf8.lower, utf8.match, utf8.reverse, utf8.sub and utf8.upper.
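
A rough sketch of how a few of these might be used, assuming they mirror the standard string functions but count in characters instead of bytes. That is an assumption to check against the module's README. The names utf8x and demo are made up for this example, and the module is loaded with pcall so the script still runs when luautf8 is not installed:

```lua
-- 'lua-utf8' only exists after `luarocks install luautf8`, so load it defensively.
local ok, utf8x = pcall(require, 'lua-utf8')

-- Hypothetical helper: collect a few results for the string s,
-- or return nil when the module is not available.
local function demo(s)
  if not ok then return nil end
  return {
    len   = utf8x.len(s),        -- length in characters
    first = utf8x.sub(s, 1, 1),  -- first character (assumed character indices)
    code  = utf8x.byte(s, 2),    -- code point of the second character
    upper = utf8x.upper(s),      -- case conversion
  }
end

-- "héllo" written with byte escapes so it also works before Lua 5.3.
local r = demo('h\195\169llo')
if r then
  print(r.len, r.first, r.code, r.upper)
else
  print('lua-utf8 is not installed')
end
```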


In addition, Lua 5.3 has a special escape syntax for encoding a character from its code (almost equivalent to utf8.char):

local chr = '\u{code}';

code: the character code in hexadecimal.

The difference is that this syntax encodes only one character at a time, and it works only in string literals where the backslash acts as the escape character (that is, not in long-bracket strings such as [[...]]).
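
A short sketch comparing the escape syntax with utf8.char, assuming Lua 5.3 or later. The code point 0xE9 ("é") is just an illustrative value:

```lua
-- Requires Lua 5.3 or later.
local a = '\u{E9}'          -- escape sequence, resolved when the source is parsed
local b = utf8.char(0xE9)   -- function call, resolved at run time
print(a == b)               --> true

-- The escape syntax needs one \u{...} per character...
local word = 'h\u{E9}llo'
-- ...while utf8.char accepts any number of codes at once.
print(word == utf8.char(0x68, 0xE9, 0x6C, 0x6C, 0x6F))   --> true

-- Long-bracket strings do not process escapes, so the text stays literal.
local raw = [[\u{E9}]]
print(#raw)                 --> 6
```

Because the escape is resolved by the parser, it cannot take a code stored in a variable. For that, utf8.char is the tool to use.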

There is also another utf8 library on GitHub that does not need to be compiled natively or installed.


