I wasn't sure where this thread belonged. I didn't see any subforums for more compsci-oriented topics.

Anyway, I've been reading up on data compression and I stumbled across this nifty little thing called delta code. Essentially, it encodes integers so that the decoder can work out each one's width on the fly, so you don't have to stick to predefined data widths. For large integers, the overhead is surprisingly small.

The way this guide spells it out, delta code looks like this:


Code:

Delta Code           Integer  Bits
----------------------------------
1                          1     1
010x                     2-3     4
011xx                    4-7     5
00100xxx                8-15     8
00101xxxx              16-31     9
00110xxxxx             32-63    10
00111xxxxxx           64-127    11
0001000xxxxxxx       128-255    14
0001001xxxxxxxx      256-511    15
0001010xxxxxxxxx    512-1023    16
0001011xxxxxxxxxx  1024-2047    17
...
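
Here's a rough Python sketch of how the encode/decode could work (my own illustration, not from the guide -- the function names are made up, and it uses '0'/'1' strings instead of packed bits for clarity):

Code:

def delta_encode(n: int) -> str:
    """Elias-delta-encode a positive integer as a '0'/'1' string."""
    assert n >= 1
    k = n.bit_length() - 1              # n = 2**k plus its k low bits
    length = k + 1                      # number of significant bits of n
    # Gamma-code the length: leading zeros, then the length in binary.
    gamma = "0" * (length.bit_length() - 1) + bin(length)[2:]
    return gamma + bin(n)[3:]           # bin(n)[3:] drops '0b' and the implied leading 1

def delta_decode(bits: str) -> int:
    """Decode one delta code from the front of a bit string."""
    zeros = 0
    while bits[zeros] == "0":           # count the leading zeros
        zeros += 1
    length = int(bits[zeros:2 * zeros + 1], 2)     # the gamma-coded length field
    k = length - 1
    body = bits[2 * zeros + 1:2 * zeros + 1 + k]   # the k x-bits
    return (1 << k) | (int(body, 2) if k else 0)

for n in (1, 2, 7, 15, 100, 1000):
    code = delta_encode(n)
    assert delta_decode(code) == n
    print(n, code, len(code))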


Well, I was thinking about this for a little while, and I think I figured out how to shorten roughly half of all possible codes by 1 bit. Doesn't seem like a lot, but it could add up with larger delta-coded integer structures. Someone -has- to have figured this out already, though. And it seems like anyone smart enough to invent delta code would have spotted it.

See the sort of jagged edge where the bit length increases in the table? I figured if you shrink the incrementing field (the part from the first 1 to the last bit before the x's) and just increase the number of leading 0's more often, you get a code that's both easier to decode (I think) and 1 bit smaller for a large portion of possible codes.


Code:

Delta Code           Integer  Bits
----------------------------------
1                          1     1
010x                     2-3     4
011xx                    4-7     5
0010xxx                 8-15     7*
0011xxxx               16-31     8*
00010xxxxx             32-63    10
00011xxxxxx           64-127    11
000010xxxxxxx        128-255    13*
000011xxxxxxxx       256-511    14*
0000010xxxxxxxxx    512-1023    16
0000011xxxxxxxxxx  1024-2047    17
...
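
And a matching sketch for the modified version (again just my own illustration with made-up names -- the leading-zero count picks a pair of widths, and the one bit after the first 1 picks which of the pair you're in):

Code:

def modified_encode(n: int) -> str:
    """Encode with the modified scheme from the table above."""
    assert n >= 1
    k = n.bit_length() - 1              # number of x bits
    if k == 0:
        return "1"                      # the integer 1 keeps its 1-bit code
    zeros = (k + 1) // 2                # widths come in pairs per zero count
    half = k - (2 * zeros - 1)          # 0 or 1: which width of the pair
    return "0" * zeros + "1" + str(half) + bin(n)[3:]

def modified_decode(bits: str) -> int:
    zeros = 0
    while bits[zeros] == "0":           # count the leading zeros
        zeros += 1
    if zeros == 0:
        return 1
    half = int(bits[zeros + 1])
    k = 2 * zeros - 1 + half
    return (1 << k) | int(bits[zeros + 2:zeros + 2 + k], 2)

for n in (1, 2, 7, 15, 31, 100, 1000):
    assert modified_decode(modified_encode(n)) == n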


You can see that every other pair of widths (the starred rows, excluding the 1-bit code for 1) comes out 1 bit shorter. Half the codes saving a bit each, with codes averaging around a byte, means a structure of, say, 1024 bytes of delta-encoded integers could shrink by roughly 64 bytes with this modified method. Not much, but not nothing. Notice how the modified set has a steeper slope in width but eliminates those "jagged edges." I hope that makes sense.
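
If you want to check the "about half" claim for yourself, here's a quick comparison of code lengths by payload width (std_len and mod_len are my own shorthand; the formulas are just read off the two tables, with k being the number of x's):

Code:

from math import floor, log2

def std_len(k):     # usual delta code: gamma-coded length field + k x-bits
    return 2 * floor(log2(k + 1)) + 1 + k

def mod_len(k):     # modified code: (k+1)//2 zeros + '1' + pick-bit + k x-bits
    return 1 if k == 0 else (k + 1) // 2 + 2 + k

for k in range(16):
    print(k, std_len(k), mod_len(k), mod_len(k) - std_len(k))

That makes it easy to see exactly which widths gain, tie, or lose.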

Anyway, just thought I'd share that with you guys. Tell me what you think. (:

---

Just realized I might be off, and instead of every other group of two, it might be more like every 2^n-th group of two. Maybe not. Not enough time to figure it out right now. Night.
---

The slight advantage of your modified approach for small integers comes at the expense of increasingly longer codes as you get past those you listed:

Code:
Code                        Length
0001010xxxxxxxxx                16
0001011xxxxxxxxxx               17
0001100xxxxxxxxxxx              18
0001101xxxxxxxxxxxx             19
0001110xxxxxxxxxxxxx            20
0001111xxxxxxxxxxxxxx           21
000010000xxxxxxxxxxxxxxx        24

The same range with your modification:

Code:
0000010xxxxxxxxx                16
0000011xxxxxxxxxx               17
00000010xxxxxxxxxxx             19
00000011xxxxxxxxxxxx            20
000000010xxxxxxxxxxxxx          22
000000011xxxxxxxxxxxxxx         23
0000000010xxxxxxxxxxxxxxx       25

The small benefit is gone by code length 16, and past 17 your codes come out strictly longer, as far as I can tell.

The interval between increases in the leading-zero length is constant (every two widths) with your modification, while it doubles each step in the usual encoding. Thus, your coding is only slightly more efficient when all the deltas are small, and it falls further and further behind as the deltas grow.
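
A quick way to see that interval behavior concretely (my own sketch; the zero-count formulas are just read off the tables, with k as the payload width):

Code:

# Usual delta: the zero run is floor(log2(k+1)), stepping up at k = 1, 3, 7, 15, ...
# Modified:    the zero run is ceil(k/2), stepping up every second width.
for k in range(1, 17):
    usual = (k + 1).bit_length() - 1
    modified = (k + 1) // 2
    print(k, usual, modified)

So, if my formulas are right, the penalty for a k-bit payload grows on the order of k/2 - 2*log2(k) bits.
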
---

Ah, so it's better for encoding smaller integers. That makes sense. I thought something was fishy since the original doesn't seem to 'waste' any bits. This is still useful to know.
---

Progbeard wrote:
Ah, so it's better for encoding smaller integers. That makes sense. I thought something was fishy since the original doesn't seem to 'waste' any bits. This is still useful to know.

That's definitely good to know, and I think it's an interesting thing to keep in mind for future compression attempts.
  