Friday, 18 November 2011

Url shortening

A lot of sites today give the free opportunity to shorten web site urls, for example:
http://www.amazon.com/Kindle-Wireless-Reading-Display-Globa lly/dp/B003FSUDM4/ref=amb_link_353259562_2?pf_rd_m=ATVPDKIK X0DER&pf_rd_s=center-10&pf_rd_r=11EYKTN682A79T370AM3&pf_rd_ t=201&pf_rd_p=1270985982&pf_rd_i=B002Y27P3M

turn this way:
http://tinyurl.com/KindleWireless

How does it works?
Usually the url is recorded in a database and is assigned a number (132), then the number is converted in base58  (i.e using the following "human" characters   "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ").
It's possible to obtain this with Rebol with very few lines (thank to Ross-Gill for the inspiration):

to-base58: use [ch out][
ch: "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"

func [id [number! issue!]][
id: load form id
out: copy ""
while [id > 0][
insert out ch/(round id // 58 + 1)
id: round/floor/to (id / 58) 1E0
]
out
]
]

load-base58: func [id [string! issue!]] [
ch: "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
out: 0
temp: length? id
ad: copy []
foreach dg id [insert ad ( -1 + index? find/case ch dg) ]
pow: 0
foreach item ad [
out: out + ( ( 58 ** pow) * item )
pow: pow + 1

]
out
]


It can be used this way:

>> to-base58 115
== "2Z"
>> load-base58 "2Z"
== 115.0
>> to-base58 12345678911111
== "6Aipnrhc"
>> load-base58 "6Aipnrhc"
== 12345678911111.0


However decimal! type has "only" 15 digits of precision. When our short url software shorts the 999 trillionth page he can make some mistake. So we can analyze the number like integers of 9 digits at time:



REBOL [ Title: "Encode/Decode Base58"
Date: 5-Dec-2009
Author: ["Christopher Ross-Gill" "Massimiliano Vessi"]
File: %base58.r
Version: 1.0.2
Home: http://www.ross-gill.com/
Purpose: {
To Encode Integers as Base58.
Used by some URL shortening services. }
Example: [ browse join http://flic.kr/p/ to-base58 #2740009121 ]
]

url_short: func [ long [string!] ] [
replace/all long "0" "-"
count: 0
b: copy []
c: copy []
foreach item long [
either item <> #"-" [
append b item
count: count + 1
if count = 8 [
b: to-integer rejoin b
append c to-base58 b
count: 0
b: copy []
]
] [
if b <> [] [
b: to-integer rejoin b
append c to-base58 b
]
append c "-"
count: 0
b: copy []
]
]
if b <> [] [
b: to-integer rejoin b
append c to-base58 b
]

return rejoin c
]

to-base58: use [ch out][
ch: "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"

func [id [number! issue!]][
id: load form id
out: copy ""
while [id > 0][
insert out ch/(round id // 58 + 1)
id: to-integer id / 58
]
out
]
]



load-base58: use [out ch os][
ch: "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
os: [
1 58 3364 195112 11316496 656356768 38068692544
2207984167552 128063081718016 7427658739644928
]

func [id [string! issue!]][
out: 0
foreach dg id [insert id: [] -1 + index? find/case ch dg]
forall id [out: os/(index? id) * id/1 + out]
return out
]
]

url_long: func [shorturl [string!]] [
count: 0
b: copy []
c: copy []
foreach item shorturl [
either item <> #"-" [
append b item
count: count + 1
if count = 5 [
b: rejoin b
append c load-base58 b
count: 0
b: copy []
]
][
if b <> [] [
b: rejoin b
append c load-base58 b
]
append c "0"
count: 0
b: copy []
]
]
if b <> [] [
b: rejoin b
append c load-base58 b
]
return rejoin c
]


It can be used this way, to avoid erros on number interpretation it must be writed as a string:

url_short "51333179229494856182760054086"
== "5x6yP32C5M4aMW--W-2u"
>> url_long "5x6yP32C5M4aMW--W-2u"
== "51333179229494856182760054086"

1 comment:

  1. Post was corrected, code rewritten to avoid mistakes with zeros.

    ReplyDelete