The following definitions may help you write your rules:
Octet: charset [#"^(00)" - #"^(FF)"]
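This is very powerful: it matches every character with a code from hexadecimal 00 to FF (Unicode code points U+0000 to U+00FF), which covers every possible value of a single byte. The list below shows the printable characters in that range first; the final 32 entries (U+0000 to U+001F) are control characters: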
[ ] U+0020 SPACE
[!] U+0021 EXCLAMATION MARK
["] U+0022 QUOTATION MARK
[#] U+0023 NUMBER SIGN
[$] U+0024 DOLLAR SIGN
[%] U+0025 PERCENT SIGN
[&] U+0026 AMPERSAND
['] U+0027 APOSTROPHE
[(] U+0028 LEFT PARENTHESIS
[)] U+0029 RIGHT PARENTHESIS
[*] U+002A ASTERISK
[+] U+002B PLUS SIGN
[,] U+002C COMMA
[-] U+002D HYPHEN-MINUS
[.] U+002E FULL STOP
[/] U+002F SOLIDUS
[0] U+0030 DIGIT ZERO
[1] U+0031 DIGIT ONE
[2] U+0032 DIGIT TWO
[3] U+0033 DIGIT THREE
[4] U+0034 DIGIT FOUR
[5] U+0035 DIGIT FIVE
[6] U+0036 DIGIT SIX
[7] U+0037 DIGIT SEVEN
[8] U+0038 DIGIT EIGHT
[9] U+0039 DIGIT NINE
[:] U+003A COLON
[;] U+003B SEMICOLON
[<] U+003C LESS-THAN SIGN
[=] U+003D EQUALS SIGN
[>] U+003E GREATER-THAN SIGN
[?] U+003F QUESTION MARK
[@] U+0040 COMMERCIAL AT
[A] U+0041 LATIN CAPITAL LETTER A
[B] U+0042 LATIN CAPITAL LETTER B
[C] U+0043 LATIN CAPITAL LETTER C
[D] U+0044 LATIN CAPITAL LETTER D
[E] U+0045 LATIN CAPITAL LETTER E
[F] U+0046 LATIN CAPITAL LETTER F
[G] U+0047 LATIN CAPITAL LETTER G
[H] U+0048 LATIN CAPITAL LETTER H
[I] U+0049 LATIN CAPITAL LETTER I
[J] U+004A LATIN CAPITAL LETTER J
[K] U+004B LATIN CAPITAL LETTER K
[L] U+004C LATIN CAPITAL LETTER L
[M] U+004D LATIN CAPITAL LETTER M
[N] U+004E LATIN CAPITAL LETTER N
[O] U+004F LATIN CAPITAL LETTER O
[P] U+0050 LATIN CAPITAL LETTER P
[Q] U+0051 LATIN CAPITAL LETTER Q
[R] U+0052 LATIN CAPITAL LETTER R
[S] U+0053 LATIN CAPITAL LETTER S
[T] U+0054 LATIN CAPITAL LETTER T
[U] U+0055 LATIN CAPITAL LETTER U
[V] U+0056 LATIN CAPITAL LETTER V
[W] U+0057 LATIN CAPITAL LETTER W
[X] U+0058 LATIN CAPITAL LETTER X
[Y] U+0059 LATIN CAPITAL LETTER Y
[Z] U+005A LATIN CAPITAL LETTER Z
[[] U+005B LEFT SQUARE BRACKET
[\] U+005C REVERSE SOLIDUS
[]] U+005D RIGHT SQUARE BRACKET
[^] U+005E CIRCUMFLEX ACCENT
[_] U+005F LOW LINE
[`] U+0060 GRAVE ACCENT
[a] U+0061 LATIN SMALL LETTER A
[b] U+0062 LATIN SMALL LETTER B
[c] U+0063 LATIN SMALL LETTER C
[d] U+0064 LATIN SMALL LETTER D
[e] U+0065 LATIN SMALL LETTER E
[f] U+0066 LATIN SMALL LETTER F
[g] U+0067 LATIN SMALL LETTER G
[h] U+0068 LATIN SMALL LETTER H
[i] U+0069 LATIN SMALL LETTER I
[j] U+006A LATIN SMALL LETTER J
[k] U+006B LATIN SMALL LETTER K
[l] U+006C LATIN SMALL LETTER L
[m] U+006D LATIN SMALL LETTER M
[n] U+006E LATIN SMALL LETTER N
[o] U+006F LATIN SMALL LETTER O
[p] U+0070 LATIN SMALL LETTER P
[q] U+0071 LATIN SMALL LETTER Q
[r] U+0072 LATIN SMALL LETTER R
[s] U+0073 LATIN SMALL LETTER S
[t] U+0074 LATIN SMALL LETTER T
[u] U+0075 LATIN SMALL LETTER U
[v] U+0076 LATIN SMALL LETTER V
[w] U+0077 LATIN SMALL LETTER W
[x] U+0078 LATIN SMALL LETTER X
[y] U+0079 LATIN SMALL LETTER Y
[z] U+007A LATIN SMALL LETTER Z
[{] U+007B LEFT CURLY BRACKET
[|] U+007C VERTICAL LINE
[}] U+007D RIGHT CURLY BRACKET
[~] U+007E TILDE
[ ] U+00A0 NO-BREAK SPACE
[¡] U+00A1 INVERTED EXCLAMATION MARK
[¢] U+00A2 CENT SIGN
[£] U+00A3 POUND SIGN
[¤] U+00A4 CURRENCY SIGN
[¥] U+00A5 YEN SIGN
[¦] U+00A6 BROKEN BAR
[§] U+00A7 SECTION SIGN
[¨] U+00A8 DIAERESIS
[©] U+00A9 COPYRIGHT SIGN
[ª] U+00AA FEMININE ORDINAL INDICATOR
[«] U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
[¬] U+00AC NOT SIGN
[ ] U+00AD SOFT HYPHEN
[®] U+00AE REGISTERED SIGN
[¯] U+00AF MACRON
[°] U+00B0 DEGREE SIGN
[±] U+00B1 PLUS-MINUS SIGN
[²] U+00B2 SUPERSCRIPT TWO
[³] U+00B3 SUPERSCRIPT THREE
[´] U+00B4 ACUTE ACCENT
[µ] U+00B5 MICRO SIGN
[¶] U+00B6 PILCROW SIGN
[·] U+00B7 MIDDLE DOT
[¸] U+00B8 CEDILLA
[¹] U+00B9 SUPERSCRIPT ONE
[º] U+00BA MASCULINE ORDINAL INDICATOR
[»] U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
[¼] U+00BC VULGAR FRACTION ONE QUARTER
[½] U+00BD VULGAR FRACTION ONE HALF
[¾] U+00BE VULGAR FRACTION THREE QUARTERS
[¿] U+00BF INVERTED QUESTION MARK
[À] U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
[Á] U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
[Â] U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
[Ã] U+00C3 LATIN CAPITAL LETTER A WITH TILDE
[Ä] U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
[Å] U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
[Æ] U+00C6 LATIN CAPITAL LETTER AE
[Ç] U+00C7 LATIN CAPITAL LETTER C WITH CEDILLA
[È] U+00C8 LATIN CAPITAL LETTER E WITH GRAVE
[É] U+00C9 LATIN CAPITAL LETTER E WITH ACUTE
[Ê] U+00CA LATIN CAPITAL LETTER E WITH CIRCUMFLEX
[Ë] U+00CB LATIN CAPITAL LETTER E WITH DIAERESIS
[Ì] U+00CC LATIN CAPITAL LETTER I WITH GRAVE
[Í] U+00CD LATIN CAPITAL LETTER I WITH ACUTE
[Î] U+00CE LATIN CAPITAL LETTER I WITH CIRCUMFLEX
[Ï] U+00CF LATIN CAPITAL LETTER I WITH DIAERESIS
[Ð] U+00D0 LATIN CAPITAL LETTER ETH
[Ñ] U+00D1 LATIN CAPITAL LETTER N WITH TILDE
[Ò] U+00D2 LATIN CAPITAL LETTER O WITH GRAVE
[Ó] U+00D3 LATIN CAPITAL LETTER O WITH ACUTE
[Ô] U+00D4 LATIN CAPITAL LETTER O WITH CIRCUMFLEX
[Õ] U+00D5 LATIN CAPITAL LETTER O WITH TILDE
[Ö] U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS
[×] U+00D7 MULTIPLICATION SIGN
[Ø] U+00D8 LATIN CAPITAL LETTER O WITH STROKE
[Ù] U+00D9 LATIN CAPITAL LETTER U WITH GRAVE
[Ú] U+00DA LATIN CAPITAL LETTER U WITH ACUTE
[Û] U+00DB LATIN CAPITAL LETTER U WITH CIRCUMFLEX
[Ü] U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS
[Ý] U+00DD LATIN CAPITAL LETTER Y WITH ACUTE
[Þ] U+00DE LATIN CAPITAL LETTER THORN
[ß] U+00DF LATIN SMALL LETTER SHARP S
[à] U+00E0 LATIN SMALL LETTER A WITH GRAVE
[á] U+00E1 LATIN SMALL LETTER A WITH ACUTE
[â] U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
[ã] U+00E3 LATIN SMALL LETTER A WITH TILDE
[ä] U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
[å] U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
[æ] U+00E6 LATIN SMALL LETTER AE
[ç] U+00E7 LATIN SMALL LETTER C WITH CEDILLA
[è] U+00E8 LATIN SMALL LETTER E WITH GRAVE
[é] U+00E9 LATIN SMALL LETTER E WITH ACUTE
[ê] U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX
[ë] U+00EB LATIN SMALL LETTER E WITH DIAERESIS
[ì] U+00EC LATIN SMALL LETTER I WITH GRAVE
[í] U+00ED LATIN SMALL LETTER I WITH ACUTE
[î] U+00EE LATIN SMALL LETTER I WITH CIRCUMFLEX
[ï] U+00EF LATIN SMALL LETTER I WITH DIAERESIS
[ð] U+00F0 LATIN SMALL LETTER ETH
[ñ] U+00F1 LATIN SMALL LETTER N WITH TILDE
[ò] U+00F2 LATIN SMALL LETTER O WITH GRAVE
[ó] U+00F3 LATIN SMALL LETTER O WITH ACUTE
[ô] U+00F4 LATIN SMALL LETTER O WITH CIRCUMFLEX
[õ] U+00F5 LATIN SMALL LETTER O WITH TILDE
[ö] U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
[÷] U+00F7 DIVISION SIGN
[ø] U+00F8 LATIN SMALL LETTER O WITH STROKE
[ù] U+00F9 LATIN SMALL LETTER U WITH GRAVE
[ú] U+00FA LATIN SMALL LETTER U WITH ACUTE
[û] U+00FB LATIN SMALL LETTER U WITH CIRCUMFLEX
[ü] U+00FC LATIN SMALL LETTER U WITH DIAERESIS
[ý] U+00FD LATIN SMALL LETTER Y WITH ACUTE
[þ] U+00FE LATIN SMALL LETTER THORN
[ÿ] U+00FF LATIN SMALL LETTER Y WITH DIAERESIS
U+0000 NULL
U+0001 START OF HEADING
U+0002 START OF TEXT
U+0003 END OF TEXT
U+0004 END OF TRANSMISSION
U+0005 ENQUIRY
U+0006 ACKNOWLEDGE
U+0007 BELL
U+0008 BACKSPACE
U+0009 CHARACTER TABULATION
U+000A LINE FEED (LF)
U+000B LINE TABULATION
U+000C FORM FEED (FF)
U+000D CARRIAGE RETURN (CR)
U+000E SHIFT OUT
U+000F SHIFT IN
U+0010 DATA LINK ESCAPE
U+0011 DEVICE CONTROL ONE
U+0012 DEVICE CONTROL TWO
U+0013 DEVICE CONTROL THREE
U+0014 DEVICE CONTROL FOUR
U+0015 NEGATIVE ACKNOWLEDGE
U+0016 SYNCHRONOUS IDLE
U+0017 END OF TRANSMISSION BLOCK
U+0018 CANCEL
U+0019 END OF MEDIUM
U+001A SUBSTITUTE
U+001B ESCAPE
U+001C INFORMATION SEPARATOR FOUR
U+001D INFORMATION SEPARATOR THREE
U+001E INFORMATION SEPARATOR TWO
U+001F INFORMATION SEPARATOR ONE
The following represents all 7-bit ASCII characters (which include the uppercase and lowercase letters):
Char: charset [#"^(00)" - #"^(7F)"]
The following represents all digits:
Digit: charset "0123456789"
The following checks whether the input consists of one or more digits:
>> Digits: [some Digit]
== [some Digit]
>> parse "552" digits
== true
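To see what the rule rejects, here is a minimal sketch, assuming a REBOL 2 console like the one used above. `some` needs at least one match, and `parse` returns true only when the whole input has been consumed.

```rebol
Digit: charset "0123456789"
Digits: [some Digit]

print parse "552" Digits   ; true: every character is a digit
print parse "55a" Digits   ; false: "a" is not in the Digit set
print parse "" Digits      ; false: some needs at least one match
```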
Uppercase chars:
Upper: charset [#"A" - #"Z"]
Lowercase chars:
Lower: charset [#"a" - #"z"]
All letters, built by combining the two sets above:
Alpha: union Upper Lower
All letters and digits:
>> AlphaDigit: union Alpha Digit
>> parse "Hello 123" [some alphadigit]
== true
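These sets combine naturally into larger rules. As an illustration only, the sketch below defines a hypothetical `Identifier` rule (the name is not from the original text): a letter followed by any number of letters or digits. It assumes REBOL 2, where the `/all` refinement makes `parse` treat whitespace as significant.

```rebol
Upper: charset [#"A" - #"Z"]
Lower: charset [#"a" - #"z"]
Digit: charset "0123456789"
Alpha: union Upper Lower
AlphaDigit: union Alpha Digit

; Hypothetical rule name, used only for this illustration
Identifier: [Alpha any AlphaDigit]

print parse/all "x25" Identifier      ; true
print parse/all "9lives" Identifier   ; false: starts with a digit
```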
Control chars:
Control: charset [#"^(00)" - #"^(1F)" #"^(7F)"]
Hexadecimal values:
>> Hex: union Digit charset [#"A" - #"F" #"a" - #"f"]
>> parse "1a2" [some hex]
== true
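A repeat count can be placed in front of a charset to demand an exact number of characters. The sketch below uses a hypothetical `HexColor` rule (not part of the original text) that accepts "#" followed by exactly six hex digits, again assuming REBOL 2 and its `/all` refinement.

```rebol
Digit: charset "0123456789"
Hex: union Digit charset [#"A" - #"F" #"a" - #"f"]

; Hypothetical rule: "#" followed by exactly six hex digits
HexColor: ["#" 6 Hex]

print parse/all "#FF00cc" HexColor   ; true
print parse/all "#FF00" HexColor     ; false: only four hex digits
print parse/all "#GG0000" HexColor   ; false: "G" is not a hex digit
```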
The TAB:
>> tab
== #"^-"
>> HT: #"^-"
Linear white space (LWS), a combination of space and tab:
>> SP: #" "
>> LWS: charset reduce [SP HT]
Newline, carriage return, and the full whitespace set:
>> newline
== #"^/"
>> LF
== #"^/"
>> cr
== #"^M"
>> WS: charset reduce [SP HT newline CR LF]
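Whitespace sets are mostly useful as explicit separators. The following sketch assumes REBOL 2 with `parse/all`, so that spaces and tabs are not skipped automatically. The rule name `TwoNumbers` is hypothetical: it matches two runs of digits separated by at least one space or tab.

```rebol
SP: #" "
HT: #"^-"
LWS: charset reduce [SP HT]
Digit: charset "0123456789"

; Hypothetical rule: digits, one or more spaces/tabs, digits
TwoNumbers: [some Digit some LWS some Digit]

print parse/all "12 ^-34" TwoNumbers   ; true: a space and a tab separate the numbers
print parse/all "1234" TwoNumbers      ; false: no separator is present
```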
Graphic (visible) ASCII characters, that is, letters, digits and punctuation:
Graphic: charset [#"^(21)" - #"^(7E)"]
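The range 21 to 7E holds every visible ASCII character, letters and digits included. If a set with punctuation only is wanted, one possible sketch is to subtract the letters and digits. This assumes that `exclude` accepts bitsets in your REBOL version, and the name `Punct` is hypothetical.

```rebol
Graphic: charset [#"^(21)" - #"^(7E)"]
AlphaDigit: charset [#"A" - #"Z" #"a" - #"z" #"0" - #"9"]

; Hypothetical name: graphic characters minus letters and digits
; Assumes exclude works on bitsets in the REBOL version in use
Punct: exclude Graphic AlphaDigit

print parse/all "!?;" [some Punct]   ; true
print parse/all "a!" [some Punct]    ; false: "a" is a letter
```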