There's this routine from z80 heaven that calculates l mod 3:

Quote:

Code:
;Outputs:
;     HL is preserved
;     A is the remainder
;     destroys DE,BC
;     z flag if divisible by 3, else nz
     ld bc,030Fh
;Now we need to add the upper and lower nibble in a
     ld a,l
     and c
     ld e,a
     ld a,l
     rlca
     rlca
     rlca
     rlca
     and c

     add a,e
     sub c
     jr nc,$+3
     add a,c
;add the lower half nibbles    ;at this point, we have l_mod_15

     ld d,a
     sra d
     sra d
     and b
     add a,d
     sub b
     ret nc
     add a,b
     ret
;at most 117 cycles, at least 108, 28 bytes


It always takes at least 108 cycles, and it's hard for me to believe that a routine that has only three possible outputs can take that long. Since I'm new to assembly, I've been unable to save a single cycle off of this without a LUT. Is there a significantly faster mod-3 routine for the z80 in, say, less than 64 bytes?
Just a warning, i suck at math, so i'm not sure how much of a help i can be. I often get around multiplication routines by doing a series of additions, i wonder if you couldn't use a series of subtractions to do something similar here? Subtracting by multiples of threes, something like:

Code:
      sub 144
    jr nc,$+4
      add a,144
   sub 72
    jr nc,$+4
      add a,72
   sub 36
    jr nc,$+4
      add a,36
   sub 18
    jr nc,$+4
      add a,18
   sub 9
    jr nc,$+4
      add a,9
   sub 6
    jr nc,$+4
      add a,6
   sub 3
    jr nc,$+4
      add a,3
Though counting the clocks that's a bit slower (133-147 cycles), maybe someone else can pull some z80 magic to help you out.

EDIT: Perhaps you could combine my method with the LUT method:

Code:
calculateMod3:
   sub 144
    jr nc,$+4
      add a,144   ;19/21
   sub 72
    jr nc,$+4
      add a,72   ;38-42
   sub 36
    jr nc,$+4
      add a,36   ;57-63
   
   ld l,a         ;61
   ld h,0         ;68
   ld de,mod3LUT   ;78
   add hl,de      ;89
   ld a,(hl)      ;96-102
   ret
   
mod3LUT:
.db 0,1,2   ;3
.db 0,1,2   ;6
.db 0,1,2   ;9
.db 0,1,2   ;12
.db 0,1,2   ;15
.db 0,1,2   ;18
.db 0,1,2   ;21
.db 0,1,2   ;24
.db 0,1,2   ;27
.db 0,1,2   ;30
.db 0,1,2   ;33
.db 0,1,2   ;36
I found a solution, 22 bytes and ~87cc average. This relies on the fact that X%3 = (X//2 - X%2)%3. First I add 256 using scf to ensure that sbc never sets the carry flag, then I do six iterations of rra \ sbc a,0 to reduce the result to a number between 2 and 7 inclusive. That is one less than the correct value, so I start by incrementing A to compensate. I still don't think this is optimal.


Code:
;a = a mod 3
;b destroyed
A_mod_3:
   ld b,0         ;7   07
   inc a         ;4   11
   ret z         ;5  16    ;if A was 255, A mod 3 = 0
   scf            ;4   20     ;otherwise, inc and calculate mod below
   rra \ sbc a,b   ;8   28
   rra \ sbc a,b   ;8   36
   rra \ sbc a,b   ;8   44
   rra \ sbc a,b   ;8   52
   rra \ sbc a,b   ;8   60
   rra \ sbc a,5   ;11   71
   ret nc         ;~3   74
   add a,3         ;~3   77
   ret            ;10   87


If we eliminate the "ret z", it's 5cc faster, but doesn't work for A=255.

EDIT: I'm pretty sure that fewer than 7+51+10=68 cycles is impossible without a LUT, because it seems that six iterations of *something* are required to get the numbers within a range, and each iteration should take at least 8 cycles (because I can't find any single instructions that can do an iteration) and one iteration probably requires something in a register, too, which must be loaded. However, it may be possible to load something other than zero into b, possibly saving somewhere else.

Also, I think "inc a \ ret z \ scf" can be "sub $FF \ ret z", which still sets carry and saves one cycle. But I haven't tested yet.
I'm unsure about this, but maybe you could do something with the parity flag? Modulo 3 is counting the 1's at bit 7,5,3,1 - counting the 1's at bit 6,4,2,0 - so I thought you COULD do something with the parity flag, but I really doubt it Very Happy
My (untested) solution would be something like this:

ld b,a
rrca
rrca
rrca
rrca
and 00001111b
loopA:
sub 3
jp nc,loopA
add a,3
ld c,a
ld a,b
and 00001111b
loopB:
sub 3
jp nc,loopB
add a,3
add a,c
sub 3
ret nc
add 3
ret

This is probably far from the best method, ile calculate how many cycles it takes...
I have another more elegant mehod, but i dont think its as fast (or small)
Edit: The code above is quiet slow. I blame JP for taking 10 cycles Razz
Popcount (Hamming weight) is also possible by shifting the bits one at a time, and I can't seem to make the clever masking algorithms for it work in fewer cycles. However, with popcount, a different register must be used for the shifting (a value cannot accumulate in a itself), meaning there's extra cost from "srl" instead of "rra". The below is untested, and probably also not optimal if it even works.


Code:
;Hamming weight of L
;Output in A
;Destroys L,B
;102 tstates including "ret"
xor a
ld b,a
srl l     ;bit 7 of L now in bit 6, bit 0 in carry
adc a,b
srl l
adc a,b
srl l
adc a,b
srl l
adc a,b
srl l
adc a,b
srl l
adc a,b
srl l
adc a,l    ; add both bits 7 and 6 at once
ret
The two bit (base-4) digit sum used in the first routine (from z80 heaven) can be optimized:

The masking of nibbles and half nibbles (two bit digits) can be removed, and replaced with "and 3" at routine end.
Removing this masking also simplifies the overflow handling, since the carry flag (CF) is now set when the digits overflow, and CF must be added back to the byte to complete the digit sum.
An increment, "ret z", and then a "dec a" before the routine returns, changes the two bit digit sum to 0 when it is 3.

The routine works for all 0-255 (the z80heaven routine overflows for 255), is 19 bytes, and 65cc for nonzero numbers divisible by 3, else 80cc, for an average 75cc.

Code:
;Inputs:
;  A  unsigned integer
;Outputs:
;  A = A mod 3
;  Z flag is set if divisible by 3
;Destroys:
;  C
; 19 bytes,  ~75cc (65 or 80)
bMod3:
   ld c,a                     ;add nibbles
   rrca / rrca / rrca / rrca
   add a,c
   adc a,0                    ;n mod 15 (+1) in both nibbles
   ld c,a                     ;add half nibbles
   rrca / rrca
   add a,c
   adc a,1
   ret z
   and 3
   dec a
   ret


This method is even better for larger sized numbers, since you halve the size for each iteration.
The 16-bit version add 4 bytes (23cc), and only 5 bytes (34cc) is added for the 32-bit version.

A mod 15 is calculated at the nibble addition. The 32-bit mod 15 routine is 18 bytes, and 68cc for nonzero numbers divisible by 15, else 83cc, for an average 82cc.

Code:
;Inputs:
;  DE:HL  32-bit unsigned integer
;Outputs:
;  A = DEHL mod 15
;  Z flag is set if divisible by 15
;Destroys:
;  HL
; 18 bytes,  ~82cc (68 or 83)
dwMod15:
   add hl,de                  ;add words, n mod 65535 (+1)
   ld a,h                     ;add bytes
   adc a,l
   adc a,0                    ;n mod 255 (+1)
   ld l,a                     ;add nibbles
   rrca / rrca / rrca / rrca
   add a,l
   adc a,1                    ;n mod 15 (+1) in both nibbles
   ret z
   and 0Fh
   dec a
   ret


A 32-bit mod 5 (20 bytes), and a 32-bit mod 10 (25 bytes) can be based on this mod 15 routine, but that is probably off-topic.

Note: These routines are tested in x86 assembly and translated to z80. I hope the translation is correct.
That is fantastic. I can't find any issues with it, thanks!
It's good to know that the routines work in a z80 environment. Smile
I have posted the modulo 10 routines mentioned in a new thread.
  
Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

» Go to Registration page
Page 1 of 1
» All times are UTC - 5 Hours
 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

 

Advertisement