FastClr routine: a very fast way to clear the screen!

Started by unregistered, June 11, 2016, 09:08:29 AM

unregistered

Hello there!!

While on the CodeWalr.us chat, PT_ and I thought about a way to clear the screen of a TI83PCE/TI84+CE as fast as possible (in 8bpp mode)!

Here's the result:

FastClr:
        ld      de,$555555      ; will write byte 85 (= blue color)
        or      a
        sbc     hl,hl
        ld      b,217
        di
        add     hl,sp           ; saves SP in HL
        ld      sp,vram+76818   ; PUSH writes downward; 217*118*3 = 76818, so 18 extra bytes get written
ClrLp:  .fill 118,$d5           ;       = 118 * "PUSH DE"
        djnz    ClrLp           ; repeat 217 times
        ld      sp,hl           ; restore SP
        ei


16+4+8+8+4+4+16+217*(118*10+13)-5+4+4=258944 States !!!  ;D
(the classic LDIR takes about 537600 states)
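
The tally above can be re-added mechanically. A minimal sketch in Python (not part of the routine itself), assuming the per-instruction state counts used in this post: 10 per PUSH DE, 13 per DJNZ with 5 fewer on the final pass, and 7 states per byte for LDIR (that last figure is inferred from the quoted 537600 total, not measured):

```python
# State counts exactly as tallied in the post (the thread's figures, not measurements).
SETUP = 16 + 4 + 8 + 8 + 4 + 4 + 16   # the seven instructions before ClrLp
PUSH = 10                             # one PUSH DE
DJNZ_TAKEN = 13                       # DJNZ when it jumps back
DJNZ_LAST_SAVING = 5                  # the final DJNZ falls through and is cheaper
TEARDOWN = 4 + 4                      # ld sp,hl and ei

def push_clear_states(loops, pushes_per_loop, tail_pushes=0):
    """Total states for a looped PUSH clear, following the post's formula."""
    body = loops * (pushes_per_loop * PUSH + DJNZ_TAKEN) - DJNZ_LAST_SAVING
    return SETUP + body + tail_pushes * PUSH + TEARDOWN

first_version = push_clear_states(217, 118)
ldir_estimate = 76800 * 7             # assumed 7 states per byte copied

print(first_version)                  # 258944
print(ldir_estimate)                  # 537600
print(round(ldir_estimate / first_version, 2))
```

On these assumptions the PUSH version needs a little under half the states of the LDIR version.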

Imagine this routine relocated to the faster memory area at $e30800: it would be faster still!


** EDIT **


A little faster !

FastClr:
        ld      de,$555555      ; will write byte 85 (= blue color)
        or      a
        sbc     hl,hl
        ld      b,213
        di
        add     hl,sp           ; saves SP in HL
        ld      sp,vram+76800   ; PUSH decrements SP, so start at the end of the 8bpp physical screen
ClrLp:  .fill 120,$d5           ;       = 120 * "PUSH DE"
        djnz    ClrLp           ; repeat 213 times
        .fill 40,$d5            ; 40 more "PUSH DE" to reach the start of the screen
        ld      sp,hl           ; restore SP
        ei


16+4+8+8+4+4+16+213*(120*10+13)-5+40*10+4+4 = 258832 States =D
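
The point of going from 217 x 118 pushes to 213 x 120 plus 40 is that the fill now lands exactly on the screen size. A quick check in Python, assuming each PUSH DE stores 3 bytes (24-bit registers) and a 320x240 screen at one byte per pixel:

```python
BYTES_PER_PUSH = 3            # assumed: PUSH DE stores a 24-bit register
SCREEN_BYTES = 320 * 240      # 8bpp screen, one byte per pixel

def bytes_written(loops, pushes_per_loop, tail_pushes=0):
    """Bytes stored by the looped pushes plus any pushes after the loop."""
    return (loops * pushes_per_loop + tail_pushes) * BYTES_PER_PUSH

first = bytes_written(217, 118)        # original routine
edited = bytes_written(213, 120, 40)   # routine after the edit

print(first - SCREEN_BYTES)    # 18 bytes of overshoot
print(edited - SCREEN_BYTES)   # 0, an exact fit
```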

TheMachine02

Indeed, using push/pop is the fastest way possible, but it is also very large. This trick was already used on the z80, for filling, clearing, or anything else. The drawback is that interrupts are disabled, but that isn't a huge issue. Actually, the fastest way ever would require 25600 bytes  :P (but it is already good like this, with a relatively small footprint at ~170 bytes, vs. less than 10 for ldir).
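
Both size figures can be roughly reproduced. A sketch in Python, assuming 3 bytes per PUSH DE and the instruction sizes listed below (my reading of the 24-bit encodings, not something stated in the thread), applied to the edited routine without a trailing ret:

```python
SCREEN_BYTES = 320 * 240
BYTES_PER_PUSH = 3

# Fully unrolled: one single-byte PUSH DE opcode per 3 screen bytes.
unrolled_size = SCREEN_BYTES // BYTES_PER_PUSH

# (instruction, assumed size in bytes, occurrences) for the edited FastClr.
looped_routine = [
    ("ld de,imm24", 4, 1),
    ("or a",        1, 1),
    ("sbc hl,hl",   2, 1),
    ("ld b,n",      2, 1),
    ("di",          1, 1),
    ("add hl,sp",   1, 1),
    ("ld sp,imm24", 4, 1),
    ("push de",     1, 120 + 40),
    ("djnz",        2, 1),
    ("ld sp,hl",    1, 1),
    ("ei",          1, 1),
]
looped_size = sum(size * count for _, size, count in looped_routine)

print(unrolled_size)   # 25600
print(looped_size)     # 179
```

Under these assumptions the looped routine comes out in the same ballpark as the ~170 bytes mentioned, against 25600 bytes for the fully unrolled one.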

Dream of Omnimaga

Hm, I am curious whether this would be a viable replacement for the clear screen routine in Sprites and the C libraries. Better speed is always welcome, but would this increase the libs' size? Nice work regardless :)
  • Calculators owned: TI-82 Advanced Edition Python TI-84+ TI-84+CSE TI-84+CE TI-84+CEP TI-86 TI-89T cfx-9940GT fx-7400G+ fx 1.0+ fx-9750G+ fx-9860G fx-CG10 HP 49g+ HP 39g+ HP 39gs (bricked) HP 39gII HP Prime G1 HP Prime G2 Sharp EL-9600C
  • Consoles, mobile devices and vintage computers owned: Huawei P30 Lite, Moto G 5G, Nintendo 64 (broken), Playstation, Wii U

Adriweb

Probably not, since it disables interrupts, and lib functions are interrupt-safe.
However, for programmers using ASM directly in their project and already manually handling interrupts, well... :)

(BTW grosged, Runer said push is 10 states, not 12)
  • Calculators owned: TI-Nspire CX CAS, TI-Nspire CX, TI-Nspire CAS (x3), TI-Nspire (x2), TI-Nspire CM-C CAS, TI-Nspire CAS+, TI-80, TI-82 Stats.fr, TI-82 Plus, TI-83 Plus, TI-83 Plus.fr USB, TI-84+, TI-84+ Pocket SE, TI-84+ C Silver Edition, TI-84 Plus CE, TI-89 Titanium, TI-86, TI-Voyage 200, TI-Collège Plus, TI-Collège Plus Solaire, 3 HP, some Casios
Co-founder & co-administrator of TI-Planet and Inspired-Lua

ben_g

Quote from: TheMachine02 on June 11, 2016, 02:45:32 PM
Actually, the fastest way ever would require 25600 bytes  :P

Do you mean something like this?

ClrVeryFast:
  ld hl, 0
  ld (plotsscreen), hl
  ld (plotsscreen+2), hl
  ld (plotsscreen+4), hl
  ld (plotsscreen+6), hl
  ld (plotsscreen+8), hl
  ld (plotsscreen+10), hl
  ...
  ld (plotsscreen+764), hl
  ld (plotsscreen+766), hl
  ret

Dream of Omnimaga

Wait, are loops actually this much slower in ASM too? O.O I thought that was just a TI-BASIC-specific flaw O.O

novenary

Loop unrolling is a common trick to gain speed at the cost of size since you spend less time decrementing, comparing and jumping.

ben_g

Loops are not that slow in ASM, but loops cause overhead in every language. The speed difference may not even be noticeable, and in this case it's definitely not worth the additional memory requirements, but it is technically faster.
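
To put a number on that overhead for the FastClr routine from the first post, here is a rough estimate in Python. It reuses the state counts quoted in the thread (10 per PUSH DE, 13 per DJNZ, 5 fewer on the last pass), so it is only as good as those figures:

```python
LOOPS = 213
DJNZ_TAKEN = 13
DJNZ_LAST_SAVING = 5
TOTAL_STATES = 258832          # total quoted for the edited FastClr

# States spent purely on looping rather than on writing pixels.
loop_overhead = LOOPS * DJNZ_TAKEN - DJNZ_LAST_SAVING
overhead_share = loop_overhead / TOTAL_STATES

print(loop_overhead)                     # 2764
print(round(100 * overhead_share, 2))    # percentage of the whole routine
```

So fully unrolling would save only about 1% of the run time while growing the code from under 200 bytes to 25600.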

aetios

Well, it doesn't have to jump every time and calculate which loop it is on. Instead everything is hardcoded.
ceci n'est pas une signature

Dream of Omnimaga

Ah I see. I just thought it was TI sucking <_<

This is why the 83+ version of GalagACE used 12 Output commands to draw 12 ships instead of two For loops and 1 Output command.

unregistered

Ah yes, PUSH takes only 10 states, not 12! (I've checked)
Thanks, Adriweb... and Runer ;)

I also corrected "PUSH IX/IY", which takes 14 states (not 16).

tr1p1ea

Has this actually been timed on calc? The ez80 'sort of' has some pipelining features that could introduce some benefits for certain instruction combinations.

unregistered

This morning I manually timed both methods: "LDIR" and "PUSH".
I used http://online-stopwatch.chronme.com/ and my TI83PCE (freshly "RAM cleared", unplugged).

Here are the two programs, each clearing the screen 10,000 times!

First, the classic method "LDIR"...

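        ld      a,$27
        ld      ($e30018),a     ; LCD control register: switch to 8bpp mode

        ld      bc,10000        ; number of screen clears
BigLp:  push    bc
;----------------------------------------------------------------
;       di                      ; optional: the timing is the same with or without interrupts
        ld      hl,$d40000      ; start of VRAM
        ld      de,$d40001
        ld      (hl),85         ; seed the first byte (85 = blue)
        ld      bc,76799        ; remaining bytes of the 8bpp screen
        ldir                    ; copy the seed byte forward across the screen
;       ei                      ; optional, pairs with the di above
;-----------------------------------------------------------------
        pop     bc
        dec     bc
        ld      a,b
        or      c
        jp      nz,BigLp

        ld      a,$2d
        ld      ($e30018),a     ; restore the default 16bpp mode
        ret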


which takes 1 minute and 59 seconds (with or without interrupts!)



Then, the method "PUSH" ...

        ld      a,$27
        ld      ($e30018),a     ; LCD control register: switch to 8bpp mode

        ld      bc,10000        ; number of screen clears
BigLp:  push    bc
;-----------------------------------------------------------------------
        ld      de,$555555      ; will write byte 85 (= blue color)
        or      a
        sbc     hl,hl
        ld      b,213
        di
        add     hl,sp           ; saves SP in HL
        ld      sp,vram+76800   ; begin at end of 8bpp mode physical screen
ClrLp:  .fill 120,$d5           ;       = 120 * "PUSH DE"
        djnz    ClrLp           ; repeat 213 times
        .fill 40,$d5            ; 40 * "PUSH DE"
        ld      sp,hl           ; restore SP
        ei
;------------------------------------------------------------------------
        pop     bc
        dec     bc
        ld      a,b
        or      c
        jp      nz,BigLp

        ld      a,$2d
        ld      ($e30018),a     ; restore the default 16bpp mode
        ret


which takes ... 58 seconds !!!   ;D

And if we relocate the main routine to $e30800, the time drops to 51 seconds!
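
For comparison with the state counts from the first post, the stopwatch results can be reduced to per-clear times and a speed-up ratio. A small Python sketch using only the figures quoted in this thread (hand timing, so accurate to a second or so at best):

```python
RUNS = 10_000

# Stopwatch totals in seconds, as reported above.
ldir_s = 1 * 60 + 59       # LDIR method
push_s = 58                # PUSH method
relocated_s = 51           # PUSH method running from $e30800

# Average time per screen clear, in milliseconds.
per_clear_ms = {
    "ldir": 1000 * ldir_s / RUNS,
    "push": 1000 * push_s / RUNS,
    "push relocated": 1000 * relocated_s / RUNS,
}

measured_speedup = ldir_s / push_s
predicted_speedup = 537600 / 258832    # state counts quoted in the first post

for name, ms in per_clear_ms.items():
    print(f"{name}: {ms:.1f} ms per clear")
print(round(measured_speedup, 2), round(predicted_speedup, 2))
```

The measured ratio (about 2.05) sits close to the one predicted from the state counts (about 2.08), which suggests the counts are a fair model of the real hardware for these two routines.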
