The following definitions may help you write your parse rules:
Octet: charset [#"^(00)" - #"^(FF)"]
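This is very broad: it matches every char whose code lies between hex 00 and hex FF, that is, any single 8-bit value (Unicode code points U+0000 to U+00FF, not UTF-8 byte sequences). The main chars it covers are listed below, printable chars first, followed by the 32 control chars U+0000 to U+001F: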
[ ] U+0020 SPACE [!] U+0021 EXCLAMATION MARK ["] U+0022 QUOTATION MARK [#] U+0023 NUMBER SIGN [$] U+0024 DOLLAR SIGN [%] U+0025 PERCENT SIGN [&] U+0026 AMPERSAND ['] U+0027 APOSTROPHE [(] U+0028 LEFT PARENTHESIS [)] U+0029 RIGHT PARENTHESIS [*] U+002A ASTERISK [+] U+002B PLUS SIGN [,] U+002C COMMA [-] U+002D HYPHEN-MINUS [.] U+002E FULL STOP [/] U+002F SOLIDUS [0] U+0030 DIGIT ZERO [1] U+0031 DIGIT ONE [2] U+0032 DIGIT TWO [3] U+0033 DIGIT THREE [4] U+0034 DIGIT FOUR [5] U+0035 DIGIT FIVE [6] U+0036 DIGIT SIX [7] U+0037 DIGIT SEVEN [8] U+0038 DIGIT EIGHT [9] U+0039 DIGIT NINE [:] U+003A COLON [;] U+003B SEMICOLON [<] U+003C LESS-THAN SIGN [=] U+003D EQUALS SIGN [>] U+003E GREATER-THAN SIGN [?] U+003F QUESTION MARK [@] U+0040 COMMERCIAL AT [A] U+0041 LATIN CAPITAL LETTER A [B] U+0042 LATIN CAPITAL LETTER B [C] U+0043 LATIN CAPITAL LETTER C [D] U+0044 LATIN CAPITAL LETTER D [E] U+0045 LATIN CAPITAL LETTER E [F] U+0046 LATIN CAPITAL LETTER F [G] U+0047 LATIN CAPITAL LETTER G [H] U+0048 LATIN CAPITAL LETTER H [I] U+0049 LATIN CAPITAL LETTER I [J] U+004A LATIN CAPITAL LETTER J [K] U+004B LATIN CAPITAL LETTER K [L] U+004C LATIN CAPITAL LETTER L [M] U+004D LATIN CAPITAL LETTER M [N] U+004E LATIN CAPITAL LETTER N [O] U+004F LATIN CAPITAL LETTER O [P] U+0050 LATIN CAPITAL LETTER P [Q] U+0051 LATIN CAPITAL LETTER Q [R] U+0052 LATIN CAPITAL LETTER R [S] U+0053 LATIN CAPITAL LETTER S [T] U+0054 LATIN CAPITAL LETTER T [U] U+0055 LATIN CAPITAL LETTER U [V] U+0056 LATIN CAPITAL LETTER V [W] U+0057 LATIN CAPITAL LETTER W [X] U+0058 LATIN CAPITAL LETTER X [Y] U+0059 LATIN CAPITAL LETTER Y [Z] U+005A LATIN CAPITAL LETTER Z [[] U+005B LEFT SQUARE BRACKET [\] U+005C REVERSE SOLIDUS []] U+005D RIGHT SQUARE BRACKET [^] U+005E CIRCUMFLEX ACCENT [_] U+005F LOW LINE [`] U+0060 GRAVE ACCENT [a] U+0061 LATIN SMALL LETTER A [b] U+0062 LATIN SMALL LETTER B [c] U+0063 LATIN SMALL LETTER C [d] U+0064 LATIN SMALL LETTER D [e] U+0065 LATIN SMALL LETTER E [f] U+0066 LATIN SMALL LETTER F [g] U+0067 
LATIN SMALL LETTER G [h] U+0068 LATIN SMALL LETTER H [i] U+0069 LATIN SMALL LETTER I [j] U+006A LATIN SMALL LETTER J [k] U+006B LATIN SMALL LETTER K [l] U+006C LATIN SMALL LETTER L [m] U+006D LATIN SMALL LETTER M [n] U+006E LATIN SMALL LETTER N [o] U+006F LATIN SMALL LETTER O [p] U+0070 LATIN SMALL LETTER P [q] U+0071 LATIN SMALL LETTER Q [r] U+0072 LATIN SMALL LETTER R [s] U+0073 LATIN SMALL LETTER S [t] U+0074 LATIN SMALL LETTER T [u] U+0075 LATIN SMALL LETTER U [v] U+0076 LATIN SMALL LETTER V [w] U+0077 LATIN SMALL LETTER W [x] U+0078 LATIN SMALL LETTER X [y] U+0079 LATIN SMALL LETTER Y [z] U+007A LATIN SMALL LETTER Z [{] U+007B LEFT CURLY BRACKET [|] U+007C VERTICAL LINE [}] U+007D RIGHT CURLY BRACKET [~] U+007E TILDE [ ] U+00A0 NO-BREAK SPACE [¡] U+00A1 INVERTED EXCLAMATION MARK [¢] U+00A2 CENT SIGN [£] U+00A3 POUND SIGN [¤] U+00A4 CURRENCY SIGN [¥] U+00A5 YEN SIGN [¦] U+00A6 BROKEN BAR [§] U+00A7 SECTION SIGN [¨] U+00A8 DIAERESIS [©] U+00A9 COPYRIGHT SIGN [ª] U+00AA FEMININE ORDINAL INDICATOR [«] U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK [¬] U+00AC NOT SIGN [ ] U+00AD SOFT HYPHEN [®] U+00AE REGISTERED SIGN [¯] U+00AF MACRON [°] U+00B0 DEGREE SIGN [±] U+00B1 PLUS-MINUS SIGN [²] U+00B2 SUPERSCRIPT TWO [³] U+00B3 SUPERSCRIPT THREE [´] U+00B4 ACUTE ACCENT [µ] U+00B5 MICRO SIGN [¶] U+00B6 PILCROW SIGN [·] U+00B7 MIDDLE DOT [¸] U+00B8 CEDILLA [¹] U+00B9 SUPERSCRIPT ONE [º] U+00BA MASCULINE ORDINAL INDICATOR [»] U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK [¼] U+00BC VULGAR FRACTION ONE QUARTER [½] U+00BD VULGAR FRACTION ONE HALF [¾] U+00BE VULGAR FRACTION THREE QUARTERS [¿] U+00BF INVERTED QUESTION MARK [À] U+00C0 LATIN CAPITAL LETTER A WITH GRAVE [Á] U+00C1 LATIN CAPITAL LETTER A WITH ACUTE [Â] U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX [Ã] U+00C3 LATIN CAPITAL LETTER A WITH TILDE [Ä] U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS [Å] U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE [Æ] U+00C6 LATIN CAPITAL LETTER AE [Ç] U+00C7 LATIN CAPITAL LETTER C 
WITH CEDILLA [È] U+00C8 LATIN CAPITAL LETTER E WITH GRAVE [É] U+00C9 LATIN CAPITAL LETTER E WITH ACUTE [Ê] U+00CA LATIN CAPITAL LETTER E WITH CIRCUMFLEX [Ë] U+00CB LATIN CAPITAL LETTER E WITH DIAERESIS [Ì] U+00CC LATIN CAPITAL LETTER I WITH GRAVE [Í] U+00CD LATIN CAPITAL LETTER I WITH ACUTE [Î] U+00CE LATIN CAPITAL LETTER I WITH CIRCUMFLEX [Ï] U+00CF LATIN CAPITAL LETTER I WITH DIAERESIS [Ð] U+00D0 LATIN CAPITAL LETTER ETH [Ñ] U+00D1 LATIN CAPITAL LETTER N WITH TILDE [Ò] U+00D2 LATIN CAPITAL LETTER O WITH GRAVE [Ó] U+00D3 LATIN CAPITAL LETTER O WITH ACUTE [Ô] U+00D4 LATIN CAPITAL LETTER O WITH CIRCUMFLEX [Õ] U+00D5 LATIN CAPITAL LETTER O WITH TILDE [Ö] U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS [×] U+00D7 MULTIPLICATION SIGN [Ø] U+00D8 LATIN CAPITAL LETTER O WITH STROKE [Ù] U+00D9 LATIN CAPITAL LETTER U WITH GRAVE [Ú] U+00DA LATIN CAPITAL LETTER U WITH ACUTE [Û] U+00DB LATIN CAPITAL LETTER U WITH CIRCUMFLEX [Ü] U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS [Ý] U+00DD LATIN CAPITAL LETTER Y WITH ACUTE [Þ] U+00DE LATIN CAPITAL LETTER THORN [ß] U+00DF LATIN SMALL LETTER SHARP S [à] U+00E0 LATIN SMALL LETTER A WITH GRAVE [á] U+00E1 LATIN SMALL LETTER A WITH ACUTE [â] U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX [ã] U+00E3 LATIN SMALL LETTER A WITH TILDE [ä] U+00E4 LATIN SMALL LETTER A WITH DIAERESIS [å] U+00E5 LATIN SMALL LETTER A WITH RING ABOVE [æ] U+00E6 LATIN SMALL LETTER AE [ç] U+00E7 LATIN SMALL LETTER C WITH CEDILLA [è] U+00E8 LATIN SMALL LETTER E WITH GRAVE [é] U+00E9 LATIN SMALL LETTER E WITH ACUTE [ê] U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX [ë] U+00EB LATIN SMALL LETTER E WITH DIAERESIS [ì] U+00EC LATIN SMALL LETTER I WITH GRAVE [í] U+00ED LATIN SMALL LETTER I WITH ACUTE [î] U+00EE LATIN SMALL LETTER I WITH CIRCUMFLEX [ï] U+00EF LATIN SMALL LETTER I WITH DIAERESIS [ð] U+00F0 LATIN SMALL LETTER ETH [ñ] U+00F1 LATIN SMALL LETTER N WITH TILDE [ò] U+00F2 LATIN SMALL LETTER O WITH GRAVE [ó] U+00F3 LATIN SMALL LETTER O WITH ACUTE [ô] U+00F4 LATIN SMALL LETTER O 
WITH CIRCUMFLEX [õ] U+00F5 LATIN SMALL LETTER O WITH TILDE [ö] U+00F6 LATIN SMALL LETTER O WITH DIAERESIS [÷] U+00F7 DIVISION SIGN [ø] U+00F8 LATIN SMALL LETTER O WITH STROKE [ù] U+00F9 LATIN SMALL LETTER U WITH GRAVE [ú] U+00FA LATIN SMALL LETTER U WITH ACUTE [û] U+00FB LATIN SMALL LETTER U WITH CIRCUMFLEX [ü] U+00FC LATIN SMALL LETTER U WITH DIAERESIS [ý] U+00FD LATIN SMALL LETTER Y WITH ACUTE [þ] U+00FE LATIN SMALL LETTER THORN
U+0000 NULL U+0001 START OF HEADING U+0002 START OF TEXT U+0003 END OF TEXT U+0004 END OF TRANSMISSION U+0005 ENQUIRY U+0006 ACKNOWLEDGE U+0007 BELL U+0008 BACKSPACE U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000B LINE TABULATION U+000C FORM FEED (FF) U+000D CARRIAGE RETURN (CR) U+000E SHIFT OUT U+000F SHIFT IN U+0010 DATA LINK ESCAPE U+0011 DEVICE CONTROL ONE U+0012 DEVICE CONTROL TWO U+0013 DEVICE CONTROL THREE U+0014 DEVICE CONTROL FOUR U+0015 NEGATIVE ACKNOWLEDGE U+0016 SYNCHRONOUS IDLE U+0017 END OF TRANSMISSION BLOCK U+0018 CANCEL U+0019 END OF MEDIUM U+001A SUBSTITUTE U+001B ESCAPE U+001C INFORMATION SEPARATOR FOUR U+001D INFORMATION SEPARATOR THREE U+001E INFORMATION SEPARATOR TWO U+001F INFORMATION SEPARATOR ONE
The following represents all 7-bit ASCII chars (which include the uppercase and lowercase letters):
Char: charset [#"^(00)" - #"^(7F)"]
The following matches all digits:
Digit: charset "0123456789"
The following checks whether the input consists of one or more digits:
>> Digits: [some Digit]
== [some Digit]
>> parse "552" digits
== true
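To see where the rule stops matching, here is a minimal sketch that runs the same Digits rule against inputs that should be rejected. The test strings are made up for illustration; some requires at least one match, and parse only returns true when the rule consumes the whole input.

```rebol
Digit: charset "0123456789"
Digits: [some Digit]

print parse "552" Digits    ; true: every char is a digit
print parse "55a2" Digits   ; false: "a" is not in the Digit charset
print parse "" Digits       ; false: some needs at least one digit
```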
Uppercase chars:
Upper: charset [#"A" - #"Z"]
Lowercase chars:
Lower: charset [#"a" - #"z"]
All letters, built as the union of the two charsets above:
Alpha: union Upper Lower
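All letters and digits:
>> AlphaDigit: union Alpha Digit
>> parse "Hello 123" [some alphadigit]
== true
Note that the space in "Hello 123" is not part of AlphaDigit. This returns true only where parse skips whitespace between matches, as Rebol 2 does by default; with parse/all, or in Red (which never skips whitespace), the result is false.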
Control chars:
Control: charset [#"^(00)" - #"^(1F)" #"^(7F)"]
Hexadecimal values:
>> Hex: union Digit charset [#"A" - #"F" #"a" - #"f"]
>> parse "1a2" [some hex]
== true
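Charsets become more useful once they are combined with literals and repeat counts. As a rough sketch, the hypothetical HexColor rule below (not part of the original definitions) accepts a "#" followed by exactly six hex digits. None of the inputs contain whitespace, so the results should not depend on whether parse skips it.

```rebol
Digit: charset "0123456789"
Hex: union Digit charset [#"A" - #"F" #"a" - #"f"]

; HexColor is an illustrative name: "#" then exactly six Hex chars
HexColor: ["#" 6 Hex]

print parse "#1a2B3c" HexColor   ; true
print parse "#1a2B3" HexColor    ; false: only five hex digits
print parse "#1a2B3g" HexColor   ; false: "g" is not a hex digit
```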
The TAB:
>> tab
== #"^-"
>> HT: #"^-"
Linear white space (LWS), a combination of space and tab:
>> SP: #" "
>> LWS: charset reduce [SP HT]
Newline and carriage return (newline and LF are the same char), combined with SP and HT into a white space (WS) charset:
>> newline
== #"^/"
>> LF
== #"^/"
>> cr
== #"^M"
>> WS: charset reduce [SP HT newline CR LF]
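A whitespace charset only matters when parse treats whitespace as significant. The sketch below assumes Rebol 2 and therefore uses parse/all; in Red, where parse never skips whitespace, plain parse should behave the same way. The NumPair rule and the test strings are invented for illustration.

```rebol
SP: #" "
HT: #"^-"
LWS: charset reduce [SP HT]
Digit: charset "0123456789"

; NumPair is an illustrative name: two numbers separated by spaces and/or tabs
NumPair: [some Digit some LWS some Digit]

print parse/all "12 ^- 34" NumPair   ; true: space, tab, space between the numbers
print parse/all "1234" NumPair       ; false: no separator at all
```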
Graphic chars (every printable ASCII char except space, which includes all the punctuation):
Graphic: charset [#"^(21)" - #"^(7E)"]