Thursday, 22 March 2012

Parse rules

Parse is the Rebol function for splitting text or processing it according to rules that you define (see http://www.rebol.com/docs/core23/rebolcore-15.html).
The following character sets and definitions may help you write your rules:

Octet: charset [#"^(00)" - #"^(FF)"]

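This is very powerful: it matches every 8-bit character, from code 0 (hexadecimal 00) to code 255 (hexadecimal FF). These are byte values, not UTF-8 encodings; read as Latin-1 they correspond to the Unicode code points U+0000 to U+00FF. The characters it represents are listed below, printable characters first and the 32 control characters (U+0000 to U+001F) last. The list leaves out U+007F (DELETE), the control codes U+0080 to U+009F, and U+00FF (ÿ), although the charset includes them too: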
[ ]  U+0020        SPACE
[!]  U+0021        EXCLAMATION MARK
["]  U+0022        QUOTATION MARK
[#]  U+0023        NUMBER SIGN
[$]  U+0024        DOLLAR SIGN
[%]  U+0025        PERCENT SIGN
[&]  U+0026        AMPERSAND
[']  U+0027        APOSTROPHE
[(]  U+0028        LEFT PARENTHESIS
[)]  U+0029        RIGHT PARENTHESIS
[*]  U+002A        ASTERISK
[+]  U+002B        PLUS SIGN
[,]  U+002C        COMMA
[-]  U+002D        HYPHEN-MINUS
[.]  U+002E        FULL STOP
[/]  U+002F        SOLIDUS
[0]  U+0030        DIGIT ZERO
[1]  U+0031        DIGIT ONE
[2]  U+0032        DIGIT TWO
[3]  U+0033        DIGIT THREE
[4]  U+0034        DIGIT FOUR
[5]  U+0035        DIGIT FIVE
[6]  U+0036        DIGIT SIX
[7]  U+0037        DIGIT SEVEN
[8]  U+0038        DIGIT EIGHT
[9]  U+0039        DIGIT NINE
[:]  U+003A        COLON
[;]  U+003B        SEMICOLON
[<]  U+003C        LESS-THAN SIGN
[=]  U+003D        EQUALS SIGN
[>]  U+003E        GREATER-THAN SIGN
[?]  U+003F        QUESTION MARK
[@]  U+0040        COMMERCIAL AT
[A]  U+0041        LATIN CAPITAL LETTER A
[B]  U+0042        LATIN CAPITAL LETTER B
[C]  U+0043        LATIN CAPITAL LETTER C
[D]  U+0044        LATIN CAPITAL LETTER D
[E]  U+0045        LATIN CAPITAL LETTER E
[F]  U+0046        LATIN CAPITAL LETTER F
[G]  U+0047        LATIN CAPITAL LETTER G
[H]  U+0048        LATIN CAPITAL LETTER H
[I]  U+0049        LATIN CAPITAL LETTER I
[J]  U+004A        LATIN CAPITAL LETTER J
[K]  U+004B        LATIN CAPITAL LETTER K
[L]  U+004C        LATIN CAPITAL LETTER L
[M]  U+004D        LATIN CAPITAL LETTER M
[N]  U+004E        LATIN CAPITAL LETTER N
[O]  U+004F        LATIN CAPITAL LETTER O
[P]  U+0050        LATIN CAPITAL LETTER P
[Q]  U+0051        LATIN CAPITAL LETTER Q
[R]  U+0052        LATIN CAPITAL LETTER R
[S]  U+0053        LATIN CAPITAL LETTER S
[T]  U+0054        LATIN CAPITAL LETTER T
[U]  U+0055        LATIN CAPITAL LETTER U
[V]  U+0056        LATIN CAPITAL LETTER V
[W]  U+0057        LATIN CAPITAL LETTER W
[X]  U+0058        LATIN CAPITAL LETTER X
[Y]  U+0059        LATIN CAPITAL LETTER Y
[Z]  U+005A        LATIN CAPITAL LETTER Z
[[]  U+005B        LEFT SQUARE BRACKET
[\]  U+005C        REVERSE SOLIDUS
[]]  U+005D        RIGHT SQUARE BRACKET
[^]  U+005E        CIRCUMFLEX ACCENT
[_]  U+005F        LOW LINE
[`]  U+0060        GRAVE ACCENT
[a]  U+0061        LATIN SMALL LETTER A
[b]  U+0062        LATIN SMALL LETTER B
[c]  U+0063        LATIN SMALL LETTER C
[d]  U+0064       LATIN SMALL LETTER D
[e]  U+0065       LATIN SMALL LETTER E
[f]  U+0066       LATIN SMALL LETTER F
[g]  U+0067       LATIN SMALL LETTER G
[h]  U+0068       LATIN SMALL LETTER H
[i]  U+0069       LATIN SMALL LETTER I
[j]  U+006A       LATIN SMALL LETTER J
[k]  U+006B       LATIN SMALL LETTER K
[l]  U+006C       LATIN SMALL LETTER L
[m]  U+006D       LATIN SMALL LETTER M
[n]  U+006E       LATIN SMALL LETTER N
[o]  U+006F       LATIN SMALL LETTER O
[p]  U+0070       LATIN SMALL LETTER P
[q]  U+0071       LATIN SMALL LETTER Q
[r]  U+0072       LATIN SMALL LETTER R
[s]  U+0073       LATIN SMALL LETTER S
[t]  U+0074       LATIN SMALL LETTER T
[u]  U+0075       LATIN SMALL LETTER U
[v]  U+0076       LATIN SMALL LETTER V
[w]  U+0077       LATIN SMALL LETTER W
[x]  U+0078       LATIN SMALL LETTER X
[y]  U+0079       LATIN SMALL LETTER Y
[z]  U+007A       LATIN SMALL LETTER Z
[{]  U+007B       LEFT CURLY BRACKET
[|]  U+007C       VERTICAL LINE
[}]  U+007D       RIGHT CURLY BRACKET
[~]  U+007E       TILDE
[ ]  U+00A0       NO-BREAK SPACE
[¡]  U+00A1       INVERTED EXCLAMATION MARK
[¢]  U+00A2       CENT SIGN
[£]  U+00A3       POUND SIGN
[¤]  U+00A4       CURRENCY SIGN
[¥]  U+00A5       YEN SIGN
[¦]  U+00A6       BROKEN BAR
[§]  U+00A7       SECTION SIGN
[¨]  U+00A8       DIAERESIS
[©]  U+00A9       COPYRIGHT SIGN
[ª]  U+00AA       FEMININE ORDINAL INDICATOR
[«]  U+00AB       LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
[¬]  U+00AC       NOT SIGN
[ ]  U+00AD       SOFT HYPHEN
[®]  U+00AE       REGISTERED SIGN
[¯]  U+00AF       MACRON
[°]  U+00B0       DEGREE SIGN
[±]  U+00B1       PLUS-MINUS SIGN
[²]  U+00B2       SUPERSCRIPT TWO
[³]  U+00B3       SUPERSCRIPT THREE
[´]  U+00B4       ACUTE ACCENT
[µ]  U+00B5       MICRO SIGN
[¶]  U+00B6       PILCROW SIGN
[·]  U+00B7       MIDDLE DOT
[¸]  U+00B8       CEDILLA
[¹]  U+00B9       SUPERSCRIPT ONE
[º]  U+00BA       MASCULINE ORDINAL INDICATOR
[»]  U+00BB       RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
[¼]  U+00BC       VULGAR FRACTION ONE QUARTER
[½]  U+00BD       VULGAR FRACTION ONE HALF
[¾]  U+00BE       VULGAR FRACTION THREE QUARTERS
[¿]  U+00BF       INVERTED QUESTION MARK
[À]  U+00C0       LATIN CAPITAL LETTER A WITH GRAVE
[Á]  U+00C1       LATIN CAPITAL LETTER A WITH ACUTE
[Â]  U+00C2       LATIN CAPITAL LETTER A WITH CIRCUMFLEX
[Ã]  U+00C3       LATIN CAPITAL LETTER A WITH TILDE
[Ä]  U+00C4       LATIN CAPITAL LETTER A WITH DIAERESIS
[Å]  U+00C5       LATIN CAPITAL LETTER A WITH RING ABOVE
[Æ]  U+00C6       LATIN CAPITAL LETTER AE
[Ç]  U+00C7       LATIN CAPITAL LETTER C WITH CEDILLA
[È]  U+00C8       LATIN CAPITAL LETTER E WITH GRAVE
[É]  U+00C9       LATIN CAPITAL LETTER E WITH ACUTE
[Ê]  U+00CA       LATIN CAPITAL LETTER E WITH CIRCUMFLEX
[Ë]  U+00CB       LATIN CAPITAL LETTER E WITH DIAERESIS
[Ì]  U+00CC       LATIN CAPITAL LETTER I WITH GRAVE
[Í]  U+00CD       LATIN CAPITAL LETTER I WITH ACUTE
[Î]  U+00CE       LATIN CAPITAL LETTER I WITH CIRCUMFLEX
[Ï]  U+00CF       LATIN CAPITAL LETTER I WITH DIAERESIS
[Ð]  U+00D0       LATIN CAPITAL LETTER ETH
[Ñ]  U+00D1       LATIN CAPITAL LETTER N WITH TILDE
[Ò]  U+00D2       LATIN CAPITAL LETTER O WITH GRAVE
[Ó]  U+00D3       LATIN CAPITAL LETTER O WITH ACUTE
[Ô]  U+00D4       LATIN CAPITAL LETTER O WITH CIRCUMFLEX
[Õ]  U+00D5       LATIN CAPITAL LETTER O WITH TILDE
[Ö]  U+00D6       LATIN CAPITAL LETTER O WITH DIAERESIS
[×]  U+00D7       MULTIPLICATION SIGN
[Ø]  U+00D8       LATIN CAPITAL LETTER O WITH STROKE
[Ù]  U+00D9       LATIN CAPITAL LETTER U WITH GRAVE
[Ú]  U+00DA       LATIN CAPITAL LETTER U WITH ACUTE
[Û]  U+00DB       LATIN CAPITAL LETTER U WITH CIRCUMFLEX
[Ü]  U+00DC       LATIN CAPITAL LETTER U WITH DIAERESIS
[Ý]  U+00DD       LATIN CAPITAL LETTER Y WITH ACUTE
[Þ]  U+00DE       LATIN CAPITAL LETTER THORN
[ß]  U+00DF       LATIN SMALL LETTER SHARP S
[à]  U+00E0       LATIN SMALL LETTER A WITH GRAVE
[á]  U+00E1       LATIN SMALL LETTER A WITH ACUTE
[â]  U+00E2       LATIN SMALL LETTER A WITH CIRCUMFLEX
[ã]  U+00E3       LATIN SMALL LETTER A WITH TILDE
[ä]  U+00E4       LATIN SMALL LETTER A WITH DIAERESIS
[å]  U+00E5       LATIN SMALL LETTER A WITH RING ABOVE
[æ]  U+00E6       LATIN SMALL LETTER AE
[ç]  U+00E7       LATIN SMALL LETTER C WITH CEDILLA
[è]  U+00E8       LATIN SMALL LETTER E WITH GRAVE
[é]  U+00E9       LATIN SMALL LETTER E WITH ACUTE
[ê]  U+00EA       LATIN SMALL LETTER E WITH CIRCUMFLEX
[ë]  U+00EB       LATIN SMALL LETTER E WITH DIAERESIS
[ì]  U+00EC       LATIN SMALL LETTER I WITH GRAVE
[í]  U+00ED       LATIN SMALL LETTER I WITH ACUTE
[î]  U+00EE       LATIN SMALL LETTER I WITH CIRCUMFLEX
[ï]  U+00EF       LATIN SMALL LETTER I WITH DIAERESIS
[ð]  U+00F0       LATIN SMALL LETTER ETH
[ñ]  U+00F1       LATIN SMALL LETTER N WITH TILDE
[ò]  U+00F2       LATIN SMALL LETTER O WITH GRAVE
[ó]  U+00F3       LATIN SMALL LETTER O WITH ACUTE
[ô]  U+00F4       LATIN SMALL LETTER O WITH CIRCUMFLEX
[õ]  U+00F5       LATIN SMALL LETTER O WITH TILDE
[ö]  U+00F6       LATIN SMALL LETTER O WITH DIAERESIS
[÷]  U+00F7       DIVISION SIGN
[ø]  U+00F8       LATIN SMALL LETTER O WITH STROKE
[ù]  U+00F9       LATIN SMALL LETTER U WITH GRAVE
[ú]  U+00FA       LATIN SMALL LETTER U WITH ACUTE
[û]  U+00FB       LATIN SMALL LETTER U WITH CIRCUMFLEX
[ü]  U+00FC       LATIN SMALL LETTER U WITH DIAERESIS
[ý]  U+00FD       LATIN SMALL LETTER Y WITH ACUTE
[þ]  U+00FE       LATIN SMALL LETTER THORN
U+0000  NULL 
U+0001  START OF HEADING 
U+0002  START OF TEXT 
U+0003  END OF TEXT 
U+0004  END OF TRANSMISSION 
U+0005  ENQUIRY 
U+0006  ACKNOWLEDGE 
U+0007  BELL 
U+0008  BACKSPACE 
U+0009  CHARACTER TABULATION
U+000A  LINE FEED (LF)
U+000B  LINE TABULATION
U+000C  FORM FEED (FF)
U+000D  CARRIAGE RETURN (CR)
U+000E  SHIFT OUT 
U+000F  SHIFT IN 
U+0010  DATA LINK ESCAPE 
U+0011  DEVICE CONTROL ONE 
U+0012  DEVICE CONTROL TWO 
U+0013  DEVICE CONTROL THREE 
U+0014  DEVICE CONTROL FOUR 
U+0015  NEGATIVE ACKNOWLEDGE 
U+0016  SYNCHRONOUS IDLE 
U+0017  END OF TRANSMISSION BLOCK 
U+0018  CANCEL 
U+0019  END OF MEDIUM 
U+001A  SUBSTITUTE 
U+001B  ESCAPE 
U+001C  INFORMATION SEPARATOR FOUR
U+001D  INFORMATION SEPARATOR THREE
U+001E  INFORMATION SEPARATOR TWO
U+001F  INFORMATION SEPARATOR ONE

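The following represents all 7-bit ASCII characters (codes 0 to 127). It includes control characters, digits, and punctuation, not just uppercase and lowercase letters: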

Char: charset [#"^(00)" - #"^(7F)"]

The following matches all decimal digits:

Digit: charset "0123456789"

The following rule checks whether a string consists of one or more digits:

>> Digits: [some Digit]
== [some Digit]
>> parse "552" digits
== true
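
A rule can also capture what it matched, not just report success. The following is a minimal sketch, assuming Rebol 2; the names number and ok? are made up for this illustration and are not part of the original definitions:

```rebol
Digit: charset "0123456789"

; 'number and 'ok? are hypothetical names used only for this example
number: none
ok?: parse/all "552" [copy number some Digit]

print ok?       ; whether the whole input consisted of digits
print number    ; the text captured by COPY
```

Here copy stores the matched run of digits in number, so the same rule validates the input and extracts it in one pass.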

Uppercase chars:

Upper: charset [#"A" - #"Z"]

Lowercase chars:

Lower: charset [#"a" - #"z"]

All letters, built as the union of the uppercase and lowercase sets:

Alpha: union Upper Lower

All letters and digits:

>> AlphaDigit: union Alpha Digit
>> parse "Hello 123" [some alphadigit]
== true
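
The input above contains a space, which is not in the set. According to the Rebol 2 documentation, plain parse ignores whitespace between patterns, while parse/all treats every character, spaces included, as input that must be matched. The sketch below, assuming Rebol 2, puts the two side by side:

```rebol
Upper: charset [#"A" - #"Z"]
Lower: charset [#"a" - #"z"]
Digit: charset "0123456789"
AlphaDigit: union union Upper Lower Digit

print parse "Hello 123" [some AlphaDigit]       ; plain parse, as in the session above
print parse/all "Hello 123" [some AlphaDigit]   ; /all: the space must be matched by a rule
print parse/all "Hello123" [some AlphaDigit]    ; no space: every character is in the set
```

With /all, the second call should fail because nothing in the rule matches the space, and the third should succeed. Other Rebol-family interpreters may treat whitespace differently, so it is worth checking on your own version.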

Control chars:

Control: charset [#"^(00)" - #"^(1F)" #"^(7F)"]

Hexadecimal digits:

>> Hex: union Digit charset [#"A" - #"F" #"a" - #"f"]
>> parse "1a2" [some hex]
== true
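
A charset can be combined with a repeat count to enforce a fixed length. As a small sketch (Rebol 2 assumed), the hypothetical rule hex-color below accepts a "#" followed by exactly six hexadecimal digits:

```rebol
Digit: charset "0123456789"
Hex: union Digit charset [#"A" - #"F" #"a" - #"f"]

; hex-color is a hypothetical rule name, not part of the original post
hex-color: ["#" 6 Hex]

print parse/all "#FF00aa" hex-color   ; six hex digits after "#"
print parse/all "#FF00zz" hex-color   ; "z" is not a hex digit, so this fails
```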

The TAB:

>> tab
== #"^-"
>> HT: #"^-"

Linear white space (LWS), a combination of space and tab:

>> SP: #" "
>> LWS: charset reduce [SP HT]

Newline, carriage return, and the whitespace set (newline and LF are the same character, so listing both is redundant but harmless):

>> newline
== #"^/"
>> LF
== #"^/"
>> cr
== #"^M"
>> WS: charset reduce [SP HT newline CR LF]
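
One common use of a whitespace set is skipping leading blanks before capturing a token. The following is a sketch under the assumption of Rebol 2; Non-WS and word are names invented for this example:

```rebol
SP: #" "
HT: #"^-"
WS: charset reduce [SP HT newline CR]

; Non-WS is a hypothetical helper: any character that is not white space
Non-WS: complement WS

word: none
parse/all "  ^-hello world" [any WS copy word some Non-WS to end]
print word   ; the first token after the leading white space
```

The rule skips any leading whitespace, copies the first run of non-whitespace characters into word, and then consumes the rest of the input with to end.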

Graphic (visible) ASCII characters, which include letters and digits as well as punctuation:

Graphic: charset [#"^(21)" - #"^(7E)"]
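
If you need punctuation only, one option is to subtract the letters and digits from the graphic set. This is a sketch assuming Rebol 2, where exclude accepts bitsets; Punct is a hypothetical name introduced here:

```rebol
Graphic: charset [#"^(21)" - #"^(7E)"]
Alpha: charset [#"A" - #"Z" #"a" - #"z"]
Digit: charset "0123456789"

; Punct is a hypothetical name: visible ASCII minus letters and digits
Punct: exclude Graphic union Alpha Digit

print parse/all "!?.,;" [some Punct]   ; punctuation only
print parse/all "abc!" [some Punct]    ; starts with letters, so this fails
```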

1 comment:

  1. OMG I needed this so bad a few years ago... Better late than never. Thanks.
