Rebol: Table of useful charsets

Wednesday, 7 November 2012

Here is a table of useful charsets:

NUMBERS	`chars-n: charset [#"0" - #"9"]`
LOWERCASE	`chars-la: charset [#"a" - #"z"]`
UPPERCASE	`chars-ua: charset [#"A" - #"Z"]`
LETTERS	`chars-a: union chars-la chars-ua`
LETTERS and NUMBERS	`chars-an: union chars-a chars-n`
HEXADECIMAL VALUES	`chars-hx: union chars-n charset [#"A" - #"F" #"a" - #"f"]`
URL DECODE	`chars-ud: union chars-an charset "*-._!~',"`
URL	`chars-u: union chars-ud charset ":+%&=?"`
ID	`chars-id: union chars-n union chars-la charset "-_"`
word-first-letter	`chars-w1: union chars-a charset "*-._!+?&|"`
WORD	`chars-w*: union chars-w1 chars-n`
FILE	`chars-f: insert copy chars-an #"-"`
PATH	`chars-p: union chars-an charset "-_!+%"`
SPACE	`chars-sp: charset " ^-"`
ABOVE ASCII	`chars-up: charset [#"^(80)" - #"^(FF)"]`
NO SPACE	`chars: complement nochar: charset " ^-^/"`
UTF-2	`utf-2: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/////wAAAAA=}]`
UTF-3	`utf-3: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP//AAA=}]`
UTF-4	`utf-4: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/wA=}]`
UTF-5	`utf-5: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8=}]`
UTF-B	`utf-b: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAP//////////AAAAAAAAAAA=}]`
UTF-8	`utf-8: [utf-2 1 utf-b | utf-3 2 utf-b | utf-4 3 utf-b | utf-5 4 utf-b]`
ODD UNPRINTABLE ASCII CHARACTERS	`bad-chars: complement charset ["^I^J^M" #" " - #"~"]`

I think these could be used with PARSE or other tools.
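
As a minimal sketch of the PARSE use, the following assumes Rebol 2 (hence `parse/all`, so that whitespace is not skipped) and copies a few charsets from the table. The helper names `hex?` and `id?` and the `key=value` input are hypothetical, chosen only for illustration; they are not part of Rebol.

```rebol
REBOL [Title: "Charset sketch (hypothetical helpers)"]

; Charsets copied from the table above
chars-n:  charset [#"0" - #"9"]
chars-la: charset [#"a" - #"z"]
chars-hx: union chars-n charset [#"A" - #"F" #"a" - #"f"]
chars-id: union chars-n union chars-la charset "-_"

; Hypothetical helpers: true only if the string is non-empty
; and every character belongs to the charset
hex?: func [s [string!]] [parse/all s [some chars-hx]]
id?:  func [s [string!]] [parse/all s [some chars-id]]

print hex? "1aF9"       ; true
print hex? "1aG9"       ; false
print id? "my-id_42"    ; true

; Charsets also work with COPY inside a rule to extract fields
parse/all "user-1=ff00" [copy key some chars-id "=" copy val some chars-hx]
print [key val]         ; user-1 ff00
```

Using `some` rather than `any` makes the helpers reject an empty string, which is usually what a validity check wants.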
Remember that Rebol has latin1? and utf? functions to check string encoding.
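
For a check done by hand, the `utf-8` rule from the table can be combined with an ASCII charset. This is only a sketch under stated assumptions: Rebol 2 with `binary!` input, the lead-byte and continuation-byte sets rewritten as ranges that are intended to cover the same bytes as the bitset literals above, and two illustrative names, `chars-ascii` and `utf-8-ok?`, which are not Rebol built-ins. It checks sequence structure only and does not reject overlong encodings.

```rebol
REBOL [Title: "UTF-8 structure check sketch (Rebol 2 assumed)"]

; Lead bytes and continuation bytes, written as ranges
utf-2: charset [#"^(C0)" - #"^(DF)"]   ; lead byte of a 2-byte sequence
utf-3: charset [#"^(E0)" - #"^(EF)"]   ; lead byte of a 3-byte sequence
utf-4: charset [#"^(F0)" - #"^(F7)"]   ; lead byte of a 4-byte sequence
utf-5: charset [#"^(F8)" - #"^(FB)"]   ; lead byte of a 5-byte sequence
utf-b: charset [#"^(80)" - #"^(BF)"]   ; continuation byte
utf-8: [utf-2 1 utf-b | utf-3 2 utf-b | utf-4 3 utf-b | utf-5 4 utf-b]

; Assumption: plain ASCII bytes are accepted alongside multi-byte sequences
chars-ascii: charset [#"^(00)" - #"^(7F)"]

; Hypothetical helper: true if DATA is a run of ASCII bytes
; and structurally complete multi-byte sequences
utf-8-ok?: func [data [binary!]] [
    parse/all data [any [chars-ascii | utf-8]]
]

print utf-8-ok? #{48C3A9}   ; "H" plus a 2-byte sequence: true
print utf-8-ok? #{48C328}   ; lead byte without a continuation byte: false
```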
