| NUMBERS | chars-n: charset [#"0" - #"9"] |
| LOWERCASE | chars-la: charset [#"a" - #"z"] |
| UPPERCASE | chars-ua: charset [#"A" - #"Z"] |
| LETTERS | chars-a: union chars-la chars-ua |
| LETTERS and NUMBERS | chars-an: union chars-a chars-n |
| HEXADECIMAL VALUES | chars-hx: union chars-n charset [#"A" - #"F" #"a" - #"f"] |
| URL DECODE | chars-ud: union chars-an charset "*-._!~'," |
| URL | chars-u: union chars-ud charset ":+%&=?" |
| ID | chars-id: union chars-n union chars-la charset "-_" |
| WORD FIRST LETTER | chars-w1: union chars-a charset "*-._!+?&|" |
| WORD | chars-w*: union chars-w1 chars-n |
| FILE | chars-f: insert copy chars-an #"-" |
| PATH | chars-p: union chars-an charset "-_!+%" |
| SPACE | chars-sp: charset " ^-" |
| ABOVE ASCII | chars-up: charset [#"^(80)" - #"^(FF)"] |
| NO SPACE | chars: complement nochar: charset " ^-^/" |
| UTF-2 | utf-2: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/////wAAAAA=}] |
| UTF-3 | utf-3: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP//AAA=}] |
| UTF-4 | utf-4: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/wA=}] |
| UTF-5 | utf-5: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8=}] |
| UTF-B | utf-b: #[bitset! 64#{AAAAAAAAAAAAAAAAAAAAAP//////////AAAAAAAAAAA=}] |
| UTF-8 | utf-8: [utf-2 1 utf-b | utf-3 2 utf-b | utf-4 3 utf-b | utf-5 4 utf-b] |
| ODD UNPRINTABLE ASCII CHARACTERS | bad-chars: complement charset ["^I^J^M" #" " - #"~"] |
I think these charsets could be useful with PARSE or other tools.
Remember that Rebol has latin1? and utf? functions to check string encoding.
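
As a rough illustration of the PARSE idea, here is a minimal sketch that reuses two of the charsets from the table to check whether a string contains only hexadecimal digits. The helper name `hex-string?` is made up for this example and is not part of Rebol. The test strings deliberately contain no whitespace: in Rebol 2, `parse` without the `/all` refinement skips whitespace in string input, so you may want `parse/all` there.

```rebol
REBOL []

; Charsets copied from the table above
chars-n:  charset [#"0" - #"9"]
chars-hx: union chars-n charset [#"A" - #"F" #"a" - #"f"]

; Hypothetical helper (not a built-in): true when STR is one or more
; characters, all of them from chars-hx
hex-string?: func [str [string!]] [
    parse str [some chars-hx]
]

print hex-string? "1aF9"   ; every character is a hex digit
print hex-string? "12g4"   ; "g" is not in chars-hx, so the rule fails
```

The same pattern should work with any other charset in the table: put the charset word inside the rule block, with `some` or `any` in front of it when it can repeat.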