Ok, so here's my problem. This is the critical point of development of StarCalc. If I find a way to fit it in, then it might work all (if not, there's no point of continuing the whole)(at like 3 frames/s if it works), considering the timings I gave are the worst possible in each condition (the average would then be "X compare only" timing). The ways of improving the speed would be to :
- reduce number of searches per frame (now 25/frame, meaning all units searched in 8 frames)
- unroll the routine (that's no fun on memory, but can save a max of 196500 T states/frame over all timings)
- optimizing

You guys come in for the 3rd point. Any clock you can get saves 25*600 clocks overall. So 15000 T/clock removed. You have a relatively low amount of RAM (max ~1k), but still, and all the registers you may think of, even IX, IY and shadows and SP and R (except I, for interrupt speed regulation). You may change inputs and data structure.


Routine all in RAM

Data structure: Type (below 64=ground, upper=air),internal(7)
xcoordinate, ycoordinate, xtarget, ytarget, internal(4 bytes)

Timings: (400 units and 200 buildings, let's say all marines+turrets, so 600 searches)
-not in type search range
101+1200
780 600 T states (7.6 FPS)
-x compare only
101+2200
1 380 600 T states (4.3 FPS)
-x and y compare
101+3650
2 225 600 T states (2.6 FPS) (and that's only with target search)


Code:
;Inputs:
;  -(DE)    =xcoordinate byte of the unit that's searching a target
;  -C reg8  =range of search+2
;  -A reg8  =24 for all, 48 for air, 56 for ground
;  -(Start) =location to start search, because we search 25/frame only

;Range explanation:
;
;X * * * * Y * * *
;for Y to be in range of X, normal range (without the +2 offset) shall be 5, so input=7


Target_search:
        ld (loop+6),a;13
        ld HL,(Start)  ;16
        ld b,25        ;7
        push de        ;11
        exx            ;4
        pop de        ;10
        inc de        ;6
        exx            ;4
        ld (SaveSP),SP;20
        ld SP,8        ;10;intialize=101T
loop:   
        ld a,(HL)        ;7
        add hl,sp        ;11
        or a              ;4
        jr z,next_unit  ;12/7
        cp 64            ;7
        jr next_unit  ;12/7  ;smc effectued here for condition
        ld a,(de)        ;7  ;7        ;7        ;7
        sub a,(hl)      ;7  ;7-7=0(nc);7-3=4(nc);7-11=-4 /252 (carry)
        jr nc,$+2+2      ;12/7;jump      ;jump      ;not jump
        neg              ;8  ;          ;          ;+4
        cp C            ;4  ;cp 5 (r+2);cp 5      ;cp 5
        jr nc,next_unit+1;12/7;carry    ;carry    ;carry

        inc hl          ;6
        exx              ;4
        ld a,(DE)        ;7
        exx              ;4
        sub a,(hl)      ;7
        jr nc,$+2+2      ;12/7
        neg              ;8
        dec hl          ;6
        cp C            ;4
        call c,target_found;17/10
       
next_unit:
        add HL,SP        ;11
;repeat 25 times


Thanks in advance for your help!
whats your current datastructure?

whats the max number of units you'll need to fit and in how much ram? finally, how much ram does a unit currently take up for state storage?

finally, I'm feeling lazy, and hate reading other people's ASM code, so please explain how your current algorithm works (and is it a box search or a radius search? I'm assuming box atm)
Before I dive into optimization (I'd also like to hear what elfprince13 requested), I'd like to point out some flaws in this that you may not be aware of:
Fallen Ghost wrote:
You guys come in for the 3rd point. Any clock you can get saves 25*600 clocks overall. So 15000 T/clock removed. You have a relatively low amount of RAM (max ~1k), but still, and all the registers you may think of, even IX, IY and shadows and SP and R (except I, for interrupt speed regulation). You may change inputs and data structure.


1) SP - it looks like you've got the whole saving SP thing down, but a reminder to optimizers: if you use SP, you can't push and pop anymore.
2) R - you can't set R. You can only read it, and it's a semirandom number
3) Shadows - you say shadows can be used, yet you say that I must remain intact, suggesting that you do indeed have an interrupt running. I assume it's the grayscale interrupts, if you're using GS? You can only use one or the other, shadows or interrupts. If there's an interrupt running, you must DI before you can use the shadows.

I'm fairly confident you already knew all that and relied on the optimizers to know it as well; just in case, for them or you, here it is. Smile
KermMartian wrote:
2) R - you can't set R. You can only read it, and it's a semirandom number

<ot>semirandom or pseudorandom? if semirandom how's it determined?</ot>
R stands for 'refresh' - it counts clock cycles passeed for RAM that needs refreshing every N clock cycles. Therefore, semirandom (it looks random, but follows a pattern) as opposed to pseudorandom (is supposed to look random, does look random, but can be predicted given the algorithm).
ohhai, I see he did post the datastructures.
my recommendation would be adding 2 bytes to each unit for left and right child and storing it in a binary search tree organized by (x,y) (so when adding a new node only check y if the x's are equal). another way of saying this is that set it up so that if you used treesort on an unsorted listed of units with the following coordinates

(1,2)
(3,5)
(4,9)
(3,7)

you would get
(1,2)
(3,5)
(3,7)
(4,9)


this is probably the most efficient way of doing it, and will cut down dramatically on the number of units you need to process while searching.
Thanks alot guys for the answers! Smile

Oh, Kerm, what I meant by "you can use all the registers you may think of" is that this code is the main time-eater, so the rest will be built on whatever else I have. And interrupts are used for timing regulation only, no GS.

And note that I don't currently use this code, so I haven't noticed the call thing, but I'll change it.

ASM in 28, Z80 Instruction Set wrote:
LD { I | R },A
Operation Stores the value of A into register I or R.
Op Code 11101101 : 0100[reg]111 Register Bit Field
I 0
R 1

T States 9


@elfprince: each unit is 16 bytes, for a total of 400 units and 200 buildings that may or may not search other enemies. For a total of 9600 bytes.

My code is disposed so that each unit type has it's own routine (jump tables), so it runs through each unit and if needed, will search a target if necessary.

How could one implement a binary search tree for 600 entities that goes faster than 3 Mil clocks (1.5 for target search and 1.5 for all splash) and that also keeps track of what unit is the enemy? (well, xtarget and ytarget become the 16 bit relative pointer to the xcoord and y coord, to follow/kill the unit and deal damage)
Oh god. And now I think about splash attack (same exact problem, but add it to this one)... Your idea looks like great help in my opinion.

It would most likely look like this:

300*4 bytes (4x because of double-buffering and ally/enemy), aka 4x1280 bytes aligned areas (displacing TI-OS code to obtain space could be a good idea, though taking ~22k of RAM)

Code coming up, I done storing and halfway searching, the only prob now is sorting. Does anyone has a routine to store things out? Elf already talked of it on Omni, but can't figure out where.
it really shouldn't take that many clock cycles to traverse a binary search tree with 600 objects. have you ever implemented one before?



and what needs to be sorted other than the units? by nature of how a BST works, inserting objects into it will place them in the tree so that an in-order traversal returns them in sorted order.


if you do need to implement some other sorting algorithm, I suggest MergeSort.
Regarding ld r,a - yes, but the CPU changes R on its own at every cycle. R doesn't stay the same.
KermMartian wrote:
Regarding ld r,a - yes, but the CPU changes R on its own at every cycle. R doesn't stay the same.


Well, I though maybe one of you guys might find some utility in storing to it to check cycles and use conditionals or whatever, even if I don't see how

elfprince13 wrote:
it really shouldn't take that many clock cycles to traverse a binary search tree with 600 objects. have you ever implemented one before?

and what needs to be sorted other than the units? by nature of how a BST works, inserting objects into it will place them in the tree so that an in-order traversal returns them in sorted order.

if you do need to implement some other sorting algorithm, I suggest MergeSort.


I have never implemented a binary search tree before, but I'll look in your mergesort file on TiCalc. After looking at it, I'm no sure about how it works (the beginning of it, the end I understand data movement).

"Shouldn't take many clocks to traverse a BST with 600 elements". By searching, do you mean loop and check until x is over the lower bound, then check y and if x i over the upper bounds?

Should I have something like a small data section that holds place of every x number by jumps of 10 (like store where 0 is, then where 10 is, then 20 and so on) to facilitate search?

What need to be sorted? 2xthe 300 4 byte entries in the table

[Edit]Oh great, the best possible worst time is nlog(n), so n*2.5
[Edit2]Sigma made a heapsort algorithm for the Z80, using no other memory, I'm modding it to see the time it will take to sort.
[Edit3]Found out I can kill off 526 200 clocks off (1/12 of a second) in my first routine by changing data structure and unrolling it all (I mean, totally, the 25 iterations)
Also found out I can kill out 165000 if I can figure out a way of not having to check if there is a unit or not
KermMartian wrote:
Regarding ld r,a - yes, but the CPU changes R on its own at every cycle. R doesn't stay the same.

The CPU increments R every refresh, but it never changes bit 7, which could be used for some sort of flag.
The Tari wrote:
KermMartian wrote:
Regarding ld r,a - yes, but the CPU changes R on its own at every cycle. R doesn't stay the same.

The CPU increments R every refresh, but it never changes bit 7, which could be used for some sort of flag.
Ah, true enough, bit 7 could indeed be used as a flag in this (or actually any) situation. I need to remember that one...
  
Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

» Go to Registration page
Page 1 of 1
» All times are UTC - 5 Hours
 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

 

Advertisement