Compression algorithm

» Forum > Calculator Programming > TI-BASIC

»
» View previous topic :: View next topic

_player1537
Guru-in-Training (Posts: 2983)

Compression algorithm
15 Jan 2010 08:52:03 pm

I am wanting to make a program that lets you compress strings to a smaller size. I have already written a portion of the code, mainly just the part that checks which string will get the most words on it. I will try to post my code later, when I get back to my house. If anyone has any questions or comments they will be much apreciated. Why am I making this you might ask (which you will Razz

), mainly I got bored and thought that this would be a good thing if I were able to make a text compressor, when finished I hope it will be able to accept a whole story (or at least 5-10 paragraphs) of writing. The numbers might be exagerated, I don't know. Either way thats my reason. Obe nire thing, how would I be able to send an exceptionally large string to the calculator ram (25k+ bytes) and then view only a portion of it.

tifreak8x
Guru (Posts: 11009)

15 Jan 2010 09:27:19 pm

sadly, you cannot exceed the RAM available with a string and have it readable.

However, you can have it in a program that is left in archive, and use Celtic2 to read from it.

_player1537
Guru-in-Training (Posts: 2983)

16 Jan 2010 11:14:33 am

thanks, when I use celtic2 does it leave all the program data in string form like if I had a program that said
Code:

program:hello


:disp"HELLO WORLD"

would a string show up that had all of that, and if so what would it do to the quotes? on a side note, I have finished a program that converts decimal to ternary, and decimal to binary Razz

EmperorWiggy
Advanced Newbie (Posts: 59)

16 Jan 2010 11:31:26 am

It would return the string

Code:




Disp "HELLO WORLD"

And I would look into using Celtic 3, it might be useful for a compression algorithm.

Will_W
Expert (Posts: 529)

16 Jan 2010 12:19:33 pm

you could compress it before you sent it and have a program to read a segment of the compressed string

rthprog
Expert (Posts: 726)

16 Jan 2010 05:50:01 pm

why not just limit the character set and change the base to compress? Would let you use all calc commands as characters too!

_player1537
Guru-in-Training (Posts: 2983)

16 Jan 2010 07:23:14 pm

rthprog wrote:

why not just limit the character set and change the base to compress? Would let you use all calc commands as characters too!

sorry I don't understand what you mean, could you clarify?

KermMartian
Site Admin (Posts: 64060)

16 Jan 2010 09:47:50 pm

_player1537 wrote:

rthprog wrote:

why not just limit the character set and change the base to compress? Would let you use all calc commands as characters too!

sorry I don't understand what you mean, could you clarify?

If you only allow A-Z, space, 0-9, period, and comma, for example, then you have 26+1+10+1+1=39 characters to encode. The alpha keypad plus menus and catalog of the TI-83/+/SE/84+/SE series allows you to type many times more than that. If, for example, there are 120 unique tokens (caveat: random number chosen for the sake of example) available (including things like If, Xmin, i, etc, basically anything you can put into a string), you'd be able to encode, perhaps, the 39 single characters allowed in the first 39 symbols, then use the remaining 81 to encode the most common combinations of symbols. Say we have the following sentence:

Code:

A SPOONFUL OF SUGAR SOMETIMES SOOTHES SORE THROATS.

The sequence "[space]S" is repeated five times. If we prepend a dictionary containing "Then[space]S+" to the string, to indicate that the token "Then" should be expanded to "[space]S", and then the + indicates a delimiter (ie, end of dictionary), then our string could be compressed as (with dictionary prepended):

Code:

Then S+AThenPOONFUL OFThenUGARThenOMETIMESThenOOTHESThenORE THROATS

We saved five bytes in the string, and our dictionary is four bytes, so our final string saved a single byte. Needless to say (maybe not needless to say?), the longer the sample text, and the more repetitions, the better the compression gets. Take the longer sentence, stolen from the Cemetech front page:

Code:

AFTER THE SURPRISING SUCCESS OF DHCPGEN AS BOTH A TOOL FOR USE IN MY SCHOOLS EE DEPARTMENT AND AS A TOOL AVAILABLE HERE, IVE BEEN WORKING ON A NEW MULTISERVER MANAGEMENT PROGRAM CALLED SYSMON.

Although a program would be able to do a more thorough job of this than me, a cursory examination shows me that "[space]A" appears no less than six times, and "MENT[space]" appears twice. I will define cos( as "[space]A" and Float as "MENT[space]", then I will have the following dictionary:

Code:

cos( A-FloatMENT +

where - separates dictionary entries and + ends the dictionary. The encoded string:

Code:

AFTER THE SURPRISING SUCCESS OF DHCPGENcos(S BOTHcos( TOOL FOR USE IN MY SCHOOLS EE DEPARTMENTcos(NDcos(Scos( TOOLcos(VAILABLE HERE, IVE BEEN WORKING ONcos( NEW MULTISERVER MANAGEMENT PROGRAM CALLED SYSMON.

We save seven bytes with the cos( replacement, and the dictionary entry takes four bytes, so that compression saves us three bytes. However, you will see that the Float replacement was almost unnecessary: it saves eight bytes, but the longer replacement string means the dictionary entry takes seven bytes, so it only saves one byte overall. The resultant string, including its dictionary, is thus four bytes shorter than the original.

Does this make sense?

Weregoose
Cemetech Expert (Posts: 573)

17 Jan 2010 09:19:09 am

I was working with Markov chains and a dictionary of 264060 English words a short while ago, and one of the by-products was a list of the 676 possible A-to-Z letter pairings, ordered by their frequency. (Unfortunately, I only allowed up to one "hit" per word, so the data might be slightly skewed for the purposes of a compression algorithm.)

Note that the calculator holds both one- and two-byte tokens. Inherently, the latter would not be suitable for compressing anything less than three bytes. 240 of the single-byte tokens can be written without ASM (and setting aside the double quote, as it can't normally be read from a string). Therefore, a bijection of 240 pairings is ideal. Here are those that ranked the highest:

ES IN ER TI AT TE ON IS RE NG EN ST ED AL RI LI RA LE AN NE SE AR IC OR RO NT IT CO IO IE SS LA DE NS CA RS NI TR TA DI AS HE TO UN ME CH LL LO OL EL ET OU SI MI MA PE IL VE LY NA AC US OM IA EA TH HI UR HO ND TS OS EC NO CE PH UL PR HA NC GE OP SH AB PO PA OT CI MO OG EM BL AM PI ID AP IZ UT CT SC SA BI AD GI SU OC SP BE IM BA OO SO CR CK AG UM EE RT GR KE EP IV SM IR FI BO ZE AI MP GA IG VI TU PL DO CU TT OD LU DA IP RU GS RM OV HY MS LS DS TY QU RY KI BR IF RR RD WA FO AU OW FE GL CL RC EX OI VA BU UP DR MB UB UC GO EO OB FL PS EG IB RN UI MU UA EI PP FA FU EF MM WE PT NN LT WI GU PU WO KS SL FF OA EV DU UE YS HR AV GH RP SN RG TL RB UD EU YP ZI NU ZA FR EB AK SY YL AY EW AE DL NF HU GN GG OE RL NK HT UG LD

This all assumes a rather simplified set of items to replace. If "RING" happens to be more prevalent than "LD" (and it is), then it still hasn't been accounted for. The solution would be to go through all permutations of any number of letters to find the choicest combinations, and those strings having just four characters in length would be in a field of 456976 possibilities. Happy hunting!

_player1537
Guru-in-Training (Posts: 2983)

17 Jan 2010 10:13:56 am

Thank you both, that was very helpful. My original program was based on using a dictionary based on the string itself and not a combination of every string. I will get working on this soon. As for the size of the string and such, would it be easy (it seems like it would) to decompress one section of the string?

Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

» Go to Registration page

Compression algorithm

Reply to topic

»
» View previous topic :: View next topic

Page 1 of 1

» All times are UTC - 5 Hours

Jump to:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum