NUMBERS | chars-n: charset [#"0" - #"9"] |
LOWERCASE | chars-la: charset [#"a" - #"z"] |
UPPERCASE | chars-ua: charset [#"A" - #"Z"] |
LETTERS | chars-a: union chars-la chars-ua |
LETTERS AND NUMBERS | chars-an: union chars-a chars-n |
HEXADECIMAL VALUES | chars-hx: union chars-n charset [#"A" - #"F" #"a" - #"f"] |
URL DECODE | chars-ud: union chars-an charset "*-._!~'," |
URL | chars-u: union chars-ud charset ":+%&=?" |
ID | chars-id: union chars-n union chars-la charset "-_" |
WORD FIRST LETTER | chars-w1: union chars-a charset "*-._!+?&|" |
WORD | chars-w*: union chars-w1 chars-n |
FILE | chars-f: insert copy chars-an #"-" |
PATH | chars-p: union chars-an charset "-_!+%" |
SPACE | chars-sp: charset " ^-" |
ABOVE ASCII | chars-up: charset [#"^(80)" - #"^(FF)"] |
NO SPACE | chars: complement nochar: charset " ^-^/" |
UTF-2 | utf-2: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/////wAAAAA=}] |
UTF-3 | utf-3: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP//AAA=}] |
UTF-4 | utf-4: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/wA=}] |
UTF-5 | utf-5: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8=}] |
UTF-B | utf-b: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAP//////////AAAAAAAAAAA=}] |
UTF-8 | utf-8: [utf-2 1 utf-b | utf-3 2 utf-b | utf-4 3 utf-b | utf-5 4 utf-b] |
ODD UNPRINTABLE ASCII CHARACTERS | bad-chars: complement charset ["^I^J^M" #" " - #"~"] |
I think these charsets could be useful with PARSE or other tools.
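As a rough illustration, here is a minimal sketch of checking an ID string with PARSE. It repeats the chars-n, chars-la and chars-id definitions from the table so it can be pasted into a console on its own. The name valid-id? is a hypothetical helper made up for this example, not a built-in function.

```rebol
; Definitions copied from the table above
chars-n:  charset [#"0" - #"9"]
chars-la: charset [#"a" - #"z"]
chars-id: union chars-n union chars-la charset "-_"

; valid-id? is a hypothetical helper (assumption, not part of Rebol).
; PARSE returns true only if the rule consumes the whole input,
; so SOME CHARS-ID means "one or more ID characters and nothing else".
valid-id?: func [text [string!]] [
    parse text [some chars-id]
]

print valid-id? "user-42_x"   ; true
print valid-id? "user@42"     ; false, "@" is not in chars-id
```

The test inputs deliberately avoid spaces and uppercase letters, because whitespace handling and case sensitivity in PARSE differ between Rebol versions.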
Remember that Rebol has latin1? and utf? functions to check string encoding.