Wednesday, 7 November 2012

Table of useful charsets

Here is a table of useful charsets:
Charset               Rebol definition
NUMBERS               chars-n: charset [#"0" - #"9"]
LOWERCASE             chars-la: charset [#"a" - #"z"]
UPPERCASE             chars-ua: charset [#"A" - #"Z"]
LETTERS               chars-a: union chars-la chars-ua
LETTERS AND NUMBERS   chars-an: union chars-a chars-n
HEXADECIMAL DIGITS    chars-hx: union chars-n charset [#"A" - #"F" #"a" - #"f"]
URL DECODE            chars-ud: union chars-an charset "*-._!~',"
URL                   chars-u: union chars-ud charset ":+%&=?"
ID                    chars-id: union chars-n union chars-la charset "-_"
WORD FIRST LETTER     chars-w1: union chars-a charset "*-._!+?&|"
WORD                  chars-w*: union chars-w1 chars-n
FILE                  chars-f: insert copy chars-an #"-"
PATH                  chars-p: union chars-an charset "-_!+%"
SPACE AND TAB         chars-sp: charset " ^-"
ABOVE ASCII           chars-up: charset [#"^(80)" - #"^(FF)"]
NO SPACE              chars: complement nochar: charset " ^-^/"
UTF-8 LEAD, 2 BYTES   utf-2: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/////wAAAAA=}]
UTF-8 LEAD, 3 BYTES   utf-3: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP//AAA=}]
UTF-8 LEAD, 4 BYTES   utf-4: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/wA=}]
UTF-8 LEAD, 5 BYTES   utf-5: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8=}]
UTF-8 CONTINUATION    utf-b: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAP//////////AAAAAAAAAAA=}]
UTF-8 MULTIBYTE CHAR  utf-8: [utf-2 1 utf-b | utf-3 2 utf-b | utf-4 3 utf-b | utf-5 4 utf-b]
ODD UNPRINTABLE       bad-chars: complement charset ["^I^J^M" #" " - #"~"]
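
To see how the rows build on each other, here is a minimal sketch that rebuilds a few of them and tests single characters with FIND. It assumes a Rebol 2 or Rebol 3 interpreter; the printed messages are only for illustration.

```rebol
; rebuild a few rows of the table
chars-n:  charset [#"0" - #"9"]
chars-la: charset [#"a" - #"z"]
chars-ua: charset [#"A" - #"Z"]
chars-a:  union chars-la chars-ua
chars-hx: union chars-n charset [#"A" - #"F" #"a" - #"f"]
nochar:   charset " ^-^/"
chars:    complement nochar    ; everything except space, tab and newline

; FIND on a bitset gives a truthy value when the character is in the set
print either find chars-hx #"e" ["e is a hex digit"] ["e is not a hex digit"]
print either find chars-hx #"g" ["g is a hex digit"] ["g is not a hex digit"]
print either find chars #" " ["space is in chars"] ["space is not in chars"]
```

UNION and COMPLEMENT return new bitsets, so each row can be built from smaller ones without modifying them.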

I think these charsets could be used with PARSE or other tools.
Remember that Rebol also has the latin1? and utf? functions to check string encoding.
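
For example, here is a minimal sketch of PARSE with some of these charsets. It assumes Rebol 2, where PARSE on a string skips whitespace unless /all is given, so /all is used throughout. The helper valid-id? and the words hex-tokens and tok are hypothetical names chosen for this example.

```rebol
chars-n:  charset [#"0" - #"9"]
chars-la: charset [#"a" - #"z"]
chars-id: union chars-n union chars-la charset "-_"
chars-sp: charset " ^-"
chars-hx: union chars-n charset [#"A" - #"F" #"a" - #"f"]

; hypothetical helper: TRUE when TEXT is made only of ID characters
valid-id?: func [text [string!]] [parse/all text [some chars-id]]

print valid-id? "my-id_42"   ; prints true
print valid-id? "My id"      ; prints false (uppercase M and the space are not in chars-id)

; hypothetical example: split a string into hex tokens separated by spaces or tabs
hex-tokens: copy []
parse/all "ff 10^-A7" [
    some [copy tok some chars-hx (append hex-tokens tok) | some chars-sp]
]
probe hex-tokens             ; prints ["ff" "10" "A7"]
```

Because the alternatives inside SOME either consume a hex token or a run of separators, PARSE returns true only when the whole string fits that shape.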
