Check against Unicode allowlist
This expression tests a string against a Unicode allowlist and returns a description of the first code point that violates the allowlist, if any.
It takes two string parameters:
- The text to convert to UTF-8 and test against the passed allowlist.
- The Unicode allowlist itself, or the name of an allowlist.
The allowlist names are the same as those used by the (Advanced) Set Unicode allowlist action:
"client names", "channel names", "received by client", or "received by server".
When every code point in the tested string matches the allowlist, the expression returns a blank string ("").
Otherwise, it returns text in the following format:
Code point at index X does not match allowed list. Code point U+XXXX, decimal XX; valid = XX, Unicode category = XX.
For example, testing the string "Foobar1" against the "L*" allowlist (all letter categories) returns:
Code point at index 6 does not match allowed list. Code point U+0031, decimal 49; valid = yes, Unicode category = Nd.
Breaking down the return text:
- Index 6 is the zero-based code point index of the failing character (the first character is index 0); it is not a byte index.
- Code point U+0031 is the hexadecimal notation of the Unicode code point, as it appears on most Unicode reference websites. It is not the UTF-8 encoding. In many programming languages, the character can be escaped in a form such as "\u0031".
- Decimal 49 is the decimal value of the Unicode code point.
- Valid = yes indicates that, although it does not match the allowlist, the code point is still a valid Unicode code point.
- Unicode category = Nd indicates that the non-matching code point belongs to the Number, Decimal Digit general category.
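To illustrate where the fields in the return text come from, and why a code point index differs from a byte index, the following Python sketch computes them for a single character. The helper name breakdown is hypothetical and not part of the product; it relies only on Python's built-in code point indexing of str and the standard unicodedata module.

```python
import unicodedata


def breakdown(text: str, index: int) -> tuple[str, int, str, int]:
    """Hypothetical helper: for the code point at zero-based code point
    `index` in `text`, return (U+ notation, decimal value,
    general category, UTF-8 byte offset)."""
    ch = text[index]                                  # Python str indexing counts code points
    cp = ord(ch)                                      # decimal value of the code point
    byte_offset = len(text[:index].encode("utf-8"))   # UTF-8 bytes before this code point
    return f"U+{cp:04X}", cp, unicodedata.category(ch), byte_offset


# "Zo\u00eb1" is "Zoë1"; "ë" (U+00EB) takes two bytes in UTF-8.
print(breakdown("Zo\u00eb1", 3))  # ('U+0031', 49, 'Nd', 4)
```

In "Zoë1", the "1" is at code point index 3 but at UTF-8 byte offset 4, because "ë" is encoded as two bytes. For pure ASCII text such as "Foobar1", the two happen to coincide.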
Only the first non-matching code point will be returned.
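To make the return format concrete, here is a minimal Python sketch that approximates the behaviour described on this page. It is not the real implementation. The function name first_violation is hypothetical, and it assumes an allowlist is a comma-separated list of Unicode general categories, where a trailing "*" matches every category with that prefix (as "L*" does). Named allowlists and any other allowlist syntax are not handled. It also assumes "valid" means "not a surrogate". Python's unicodedata tables may follow a different Unicode version than the product, so categories of newly assigned characters could differ.

```python
import unicodedata


def first_violation(text: str, allowlist: str) -> str:
    """Hypothetical approximation of the check, for illustration only.

    Assumption: `allowlist` is a comma-separated list of general
    categories; a trailing '*' matches any category with that prefix.
    """
    patterns = [p.strip() for p in allowlist.split(",") if p.strip()]

    def allowed(category: str) -> bool:
        for p in patterns:
            if p.endswith("*"):
                if category.startswith(p[:-1]):   # e.g. "L*" matches Lu, Ll, Lt, Lm, Lo
                    return True
            elif category == p:                   # exact category, e.g. "Nd"
                return True
        return False

    for index, ch in enumerate(text):             # index counts code points, not bytes
        cp = ord(ch)
        category = unicodedata.category(ch)
        if not allowed(category):
            # Assumption: a surrogate is the only invalid code point a Python str can hold
            valid = "no" if 0xD800 <= cp <= 0xDFFF else "yes"
            return (f"Code point at index {index} does not match allowed list. "
                    f"Code point U+{cp:04X}, decimal {cp}; valid = {valid}, "
                    f"Unicode category = {category}.")
    return ""  # every code point matched the allowlist


print(first_violation("Foobar1", "L*"))
print(repr(first_violation("Foobar", "L*")))  # ''
```

Because the loop returns at the first failure, only the first non-matching code point is reported, as with the expression itself.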
For more details on Unicode and UTF-8 terms, and on allowlists, read the Unicode notes topic.