
Topic: FastClr routine: a very fast way to clear the screen!!!


grosged
« on: June 11, 2016, 09:08:29 am »
Hello there!!

While on the CodeWalr.us chat, PT_ and I came up with a way to clear the screen of a TI-83 Premium CE / TI-84 Plus CE as fast as possible (in 8bpp mode)!

Here's the result:

Code: [Select]
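vram    .equ    $d40000         ; start of VRAM (assumed name; usually provided by the CE include file)
FastClr:
        ld      de,$555555      ; will write byte 85 (= blue color)
        or      a               ; clear carry...
        sbc     hl,hl           ; ...so HL = 0
        ld      b,217
        di                      ; SP points into VRAM below, so no interrupt may push onto it
        add     hl,sp           ; saves SP in HL
        ld      sp,vram+76818   ; 217*118 PUSHes * 3 bytes = 76818: 18 bytes past the 76800-byte screen
ClrLp:  .fill 118,$d5           ;       = 118 * "PUSH DE"
        djnz    ClrLp           ; during 217 times
        ld      sp,hl           ; restore SP
        ei
        ret                     ; (not included in the state count below)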

16+4+8+8+4+4+16+217*(118*10+13)-5+4+4 = 258944 states!!!  ;D
(the classic LDIR takes about 537600 states)
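As a cross-check, the post's arithmetic can be reproduced with a small Python sketch. It uses only the per-instruction figures given in the post, so it is not a cycle-accurate eZ80 model (real timings can differ, as the on-calc measurements later in the thread show); the function names are hypothetical.

```python
# Sketch of the state-count model used in the post (not a cycle-accurate eZ80 model).
# Per-instruction figures are the post's: PUSH DE = 10, taken DJNZ = 13, and the last
# (not-taken) DJNZ is counted 5 states cheaper. Each PUSH DE writes 3 bytes in ADL mode.

SCREEN_BYTES_8BPP = 320 * 240                 # 76800 bytes: one byte per pixel in 8bpp mode
BYTES_PER_PUSH = 3                            # 24-bit DE pushed in ADL mode
SETUP_STATES = 16 + 4 + 8 + 8 + 4 + 4 + 16    # the seven setup terms, as listed in the post
TEARDOWN_STATES = 4 + 4                       # ld sp,hl + ei
PUSH_STATES = 10
DJNZ_TAKEN_STATES = 13
DJNZ_LAST_ADJUST = -5                         # the final DJNZ falls through


def fastclr_states(pushes_per_loop: int, loops: int, tail_pushes: int = 0) -> int:
    """Total states for `loops` passes of `pushes_per_loop` PUSH DE, plus trailing pushes."""
    loop = loops * (pushes_per_loop * PUSH_STATES + DJNZ_TAKEN_STATES) + DJNZ_LAST_ADJUST
    return SETUP_STATES + loop + tail_pushes * PUSH_STATES + TEARDOWN_STATES


def bytes_written(pushes_per_loop: int, loops: int, tail_pushes: int = 0) -> int:
    """Bytes stored below the starting SP."""
    return (pushes_per_loop * loops + tail_pushes) * BYTES_PER_PUSH


LDIR_ESTIMATE = 537600                        # "about 537600 states", from the post

v1_states = fastclr_states(118, 217)
v1_overshoot = bytes_written(118, 217) - SCREEN_BYTES_8BPP
print(v1_states, v1_overshoot, round(LDIR_ESTIMATE / v1_states, 2))
```

The same helper covers the edited version as `fastclr_states(120, 213, 40)`, whose 25600 pushes fill exactly 76800 bytes with no overshoot.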

Imagine this routine relocated to the faster memory area at $e30800!!! (even faster!!)


** EDIT **


A little faster!

Code: [Select]
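vram    .equ    $d40000         ; start of VRAM (assumed name; usually provided by the CE include file)
FastClr:
        ld      de,$555555      ; will write byte 85 (= blue color)
        or      a               ; clear carry...
        sbc     hl,hl           ; ...so HL = 0
        ld      b,213
        di                      ; SP points into VRAM below, so no interrupt may push onto it
        add     hl,sp           ; saves SP in HL
        ld      sp,vram+76800   ; as a PUSH is decreasing SP, begin at end of 8bpp mode physical screen
ClrLp:  .fill 120,$d5           ;       = 120 * "PUSH DE"
        djnz    ClrLp           ; during 213 times
        .fill 40,$d5            ; 40 * "PUSH DE": (213*120 + 40) * 3 = 76800 bytes exactly
        ld      sp,hl           ; restore SP
        ei
        ret                     ; (not included in the state count below)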

16+4+8+8+4+4+16+213*(120*10+13)-5+40*10+4+4 = 258832 States =D
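Both totals follow one pattern. Writing n for the PUSH DE per loop pass, L for the number of passes, t for the trailing pushes and P for the total pushes (this notation is mine, not the post's), and keeping the post's per-instruction figures:

```latex
\begin{aligned}
T &= \underbrace{60}_{\text{setup}} + L\,(10n + 13) - 5 + 10t + \underbrace{8}_{\text{ld sp,hl; ei}} \\
  &= 63 + 10P + 13L, \qquad P = nL + t \\
T_1 &= 63 + 10 \cdot 25606 + 13 \cdot 217 = 258944 \\
T_2 &= 63 + 10 \cdot 25600 + 13 \cdot 213 = 258832
\end{aligned}
```

So the edit saves 10·6 + 13·4 = 112 states: six fewer PUSHes (no 18-byte overshoot) and four fewer DJNZ passes.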
« Last Edit: June 11, 2016, 08:16:07 pm by grosged »



TheMachine02
Indeed, using push/pop is the fastest way possible, but it is also very large. This trick was already used on the Z80, for filling, clearing or anything else. The drawback is that interrupts have to be disabled, but that isn't a huge issue. Actually, the fastest way ever would require 25600 bytes  :P (but it is already good like this: a relatively small footprint at ~170 bytes, vs less than 10 for LDIR).
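The size-vs-speed trade-off can be made concrete with the thread's own counting model. This sketch (hypothetical helper names) models only the PUSH/DJNZ part of the routine, leaving setup and teardown out. For a loop body of n PUSH DE, it computes the passes and trailing pushes needed to reach 25600 pushes, the bytes of PUSH/DJNZ code, and the states. Two limits are built in: DJNZ's 8-bit displacement caps the body at 126 one-byte PUSHes, and the 8-bit B counter allows at most 256 passes.

```python
# Sketch: size/speed trade-off of unrolling the PUSH DE loop, using the thread's
# counting model (PUSH DE = 10 states, taken DJNZ = 13, last DJNZ 5 states cheaper).
# Only the PUSH/DJNZ part is modelled; the fixed setup/teardown is left out.

TOTAL_PUSHES = 320 * 240 // 3   # 25600 PUSH DE (3 bytes each) fill the 8bpp screen
PUSH_STATES, DJNZ_TAKEN, DJNZ_LAST_DELTA = 10, 13, -5
PUSH_BYTES, DJNZ_BYTES = 1, 2   # opcode $D5; DJNZ is opcode + 8-bit displacement


def loop_shape(per_loop: int):
    """Return (loops, tail, code_bytes, states) for `per_loop` PUSH DE per pass."""
    # DJNZ jumps at most 128 bytes back from the instruction after it.
    if per_loop * PUSH_BYTES + DJNZ_BYTES > 128:
        raise ValueError("loop body too long for DJNZ")
    loops, tail = divmod(TOTAL_PUSHES, per_loop)
    if loops > 256:             # B is 8-bit (B = 0 gives 256 passes)
        raise ValueError("too many passes for an 8-bit B counter")
    code_bytes = (per_loop + tail) * PUSH_BYTES + DJNZ_BYTES
    states = TOTAL_PUSHES * PUSH_STATES + loops * DJNZ_TAKEN + DJNZ_LAST_DELTA
    return loops, tail, code_bytes, states


def fully_unrolled():
    """No loop at all: one PUSH DE opcode per 3 bytes of screen."""
    return TOTAL_PUSHES * PUSH_BYTES, TOTAL_PUSHES * PUSH_STATES


for n in (120, 126):
    print(n, loop_shape(n))
print("full", fully_unrolled())
```

Under this model the fully unrolled version saves only 2764 states (about 1%) over the 120-push loop, while taking 25600 bytes instead of 162. The 126-push row is simply what the model yields; it was not timed in the thread.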

DJ Omnimaga
Hm, I wonder if this would be a viable replacement for the clear-screen routine in Sprites and the C libraries? Better speed is always good, but would it increase the libraries' size? Nice work regardless :)
  • Calculators owned: TI-73, TI-80 (broken), TI-81, TI-82, TI-83, TI-83+ (broken), TI-83+ (broken), TI-83+SE (broken), TI-84+, TI-84+CSE, TI-84+CE, TI-85, TI-86, TI-89T, TI-92, TI-Nspire, TI-Nspire CX, HP 39gII, HP Prime, Casio fx-7000G, fx-7400G+, fx-7700GE, fx-9750G+, fx-9750GII, fx-9860G, cfx-9850G, FX-1.0+, fx-CG10, fx-CP400
  • Consoles, mobile devices and vintage computers owned: Samsung i5510, Nexus 5, Atari 2600, Lynx, SMS, Game Gear, Genesis, Dreamcast, NES, SNES, N64, GCN, Wii, Wii U, GBA, DS, 3DS, PS2, PS3, PS4, PSP, PSVita, XBox 360, XBOne

Bandcamp|Reverbnation|Facebook|Youtube|Twitter
Retired Omnimaga admin (2001-11) and editor (2012-14)

Adriweb
Probably not, since it disables interrupts, and lib functions are interrupt-safe.
However, for programmers using ASM directly in their project and already manually handling interrupts, well... :)

(BTW grosged, Runer said push is 10 states, not 12)
  • Calculators owned: TI-Nspire CX CAS, TI-Nspire CX, TI-Nspire CAS (x3), TI-Nspire (x2), TI-Nspire CM-C CAS, TI-Nspire CAS+, TI-80, TI-82 Stats.fr, TI-82 Plus, TI-83 Plus, TI-83 Plus.fr USB, TI-84+, TI-84+ Pocket SE, TI-84+ C Silver Edition, TI-84 Plus CE, TI-89 Titanium, TI-86, TI-Voyage 200, TI-Collège Plus, TI-Collège Plus Solaire, 3 HP, some Casios
Co-founder & co-administrator of TI-Planet and Inspired-Lua

DJ Omnimaga
Ah right, that could be an issue then >.<

ben_g
Quote from: TheMachine02
Actually, the fastest way ever would require 25600 bytes  :P

Do you mean something like this?
Code: [Select]
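ClrVeryFast:                    ; TI-83+ style: clears the 768-byte plotsscreen buffer
  ld hl, 0
  ld (plotsscreen), hl          ; each store writes 2 bytes, with no loop overhead
  ld (plotsscreen+2), hl
  ld (plotsscreen+4), hl
  ld (plotsscreen+6), hl
  ld (plotsscreen+8), hl
  ld (plotsscreen+10), hl
  ...
  ld (plotsscreen+764), hl
  ld (plotsscreen+766), hl      ; last 2 bytes of the buffer
  ret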
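Unrolled code like this is normally generated rather than typed by hand. Here is a hypothetical Python generator for the listing above. It assumes the TI-83+ family's 768-byte plotsscreen buffer (96x64 pixels at 1 bit each) and the standard Z80 figures of 3 bytes and 16 T-states per `ld (nn),hl`.

```python
# Hypothetical helper: generate the fully unrolled clear routine shown above instead of
# typing 384 stores by hand. Assumes a 768-byte plotsscreen buffer (TI-83+ family,
# 96x64 monochrome) and Z80 timings: ld (nn),hl = 3 bytes / 16 T-states.

BUFFER_BYTES = 768
STEP = 2                        # HL stores two bytes at a time on the Z80
STORE_BYTES, STORE_TSTATES = 3, 16


def unrolled_clear(label: str = "ClrVeryFast", buf: str = "plotsscreen") -> list[str]:
    """Return the assembly listing as a list of source lines."""
    lines = [f"{label}:", "  ld hl, 0"]
    for offset in range(0, BUFFER_BYTES, STEP):
        target = buf if offset == 0 else f"{buf}+{offset}"
        lines.append(f"  ld ({target}), hl")
    lines.append("  ret")
    return lines


listing = unrolled_clear()
stores = [line for line in listing if line.startswith("  ld (")]
print(len(stores), "stores,", len(stores) * STORE_BYTES, "bytes,",
      len(stores) * STORE_TSTATES, "T-states")
print("\n".join(listing[:4]))
```

The store count (384), the code size and the T-states all scale linearly with the buffer size, which is the size cost of removing the loop entirely.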

DJ Omnimaga
Wait, are loops actually this much slower in ASM too? O.O I thought that was just a TI-BASIC-specific flaw O.O

Streetwalrus
Loop unrolling is a common trick to gain speed at the cost of size since you spend less time decrementing, comparing and jumping.
  • Calculators owned: TI-80, HP 40G, TI-84 Plus rev G (yay 128k RAM), TI-83 Plus Silver Edition (broken LCD), TI-82 Stats.fr (black), TI-Nspire CX rev C (yay Nlaunchy), TI-83+ SE ViewScreen

ben_g
Loops are not that slow in ASM, but loops cause overhead in every language. The speed difference may not even be noticeable, and in this case it's definitely not worth the additional memory, but it is technically faster.

aeTIos
Well, it doesn't have to jump every time and keep track of which iteration it is on. Instead, everything is hardcoded.
this is not a signature

DJ Omnimaga
Ah I see. I just thought it was TI sucking <_<

This is why the 83+ version of GalagACE used 12 Output commands to draw 12 ships instead of two For loops and 1 Output command.

grosged
Ah yes, PUSH takes only 10 states, not 12!! (I've checked.)
Thanks, Adriweb... and Runer ;)

I also corrected "PUSH IX/IY", which takes 14 states (not 16).
« Last Edit: June 11, 2016, 08:24:04 pm by grosged »

tr1p1ea
Has this actually been timed on-calc? The eZ80 'sort of' has some pipelining features that could benefit certain instruction combinations.

grosged
This morning, I manually measured both methods: "LDIR" and "PUSH".
I used http://online-stopwatch.chronme.com/ and my TI-83 Premium CE (freshly RAM-cleared, unplugged).

Here are the two programs; each one clears the screen 10,000 times!

First, the classic "LDIR" method...

Code: [Select]
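        ld      a,$27           ; LCD control register: switch to 8bpp mode
        ld      ($e30018),a

        ld      bc,10000        ; clear the screen 10,000 times
BigLp:  push    bc
;----------------------------------------------------------------
;       di                      ; optional: timed both with and without
        ld      hl,$d40000      ; start of VRAM
        ld      de,$d40001
        ld      (hl),85         ; first byte = color 85
        ld      bc,76799        ; copy it over the remaining 76799 bytes
        ldir
;       ei
;-----------------------------------------------------------------
        pop     bc
        dec     bc
        ld      a,b             ; BC < 65536 here, so testing B and C is enough
        or      c
        jp      nz,BigLp

        ld      a,$2d           ; back to the usual 16bpp mode
        ld      ($e30018),a
        ret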

which takes (with or without interrupts!)  1 minute and 59 seconds



Then, the method "PUSH" ...

Code: [Select]
vram    .equ    $d40000         ; start of VRAM (same address as in the LDIR test)

        ld      a,$27           ; LCD control register: switch to 8bpp mode
        ld      ($e30018),a

        ld      bc,10000        ; clear the screen 10,000 times
BigLp:  push    bc
;-----------------------------------------------------------------------
        ld      de,$555555      ; will write byte 85 (= blue color)
        or      a
        sbc     hl,hl
        ld      b,213
        di
        add     hl,sp           ; saves SP in HL
        ld      sp,vram+76800   ; begin at end of 8bpp mode physical screen
ClrLp:  .fill 120,$d5           ;       = 120 * "PUSH DE"
        djnz    ClrLp           ; during 213 times
        .fill 40,$d5            ; 40 * "PUSH DE"
        ld      sp,hl           ; restore SP
        ei
;------------------------------------------------------------------------
        pop     bc
        dec     bc
        ld      a,b             ; BC < 65536 here, so testing B and C is enough
        or      c
        jp      nz,BigLp

        ld      a,$2d           ; back to the usual 16bpp mode
        ld      ($e30018),a
        ret

which takes... 58 seconds!!!   ;D

And if we relocate the main routine to $e30800, the time drops to 51 seconds!!!
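The stopwatch totals convert into per-clear figures as follows. Only the three totals and the two state counts quoted earlier in the thread come from the posts; the rest is arithmetic. The timing was done by hand, so the decimals should be read loosely.

```python
# Sketch: per-clear figures from the stopwatch totals reported in the post.
# Only the three totals come from the post (10,000 clears each, timed by hand, so
# treat them as approximate); the rest is arithmetic. The loop around each clear
# (push bc / pop bc / dec bc / test) is included in these totals.

CLEARS = 10_000
MEASURED_S = {"LDIR": 119, "PUSH": 58, "PUSH at $E30800": 51}
MODEL_LDIR, MODEL_PUSH = 537600, 258832   # state counts quoted earlier in the thread


def per_clear_ms(total_s: float) -> float:
    """Average time of one clear, in milliseconds."""
    return total_s * 1000 / CLEARS


for name, total in MEASURED_S.items():
    ms = per_clear_ms(total)
    speedup = MEASURED_S["LDIR"] / total
    print(f"{name:<16} {ms:5.1f} ms/clear  {1000 / ms:6.1f} clears/s  x{speedup:.2f} vs LDIR")

print(f"model ratio LDIR/PUSH: x{MODEL_LDIR / MODEL_PUSH:.2f}")
```

The measured LDIR/PUSH ratio (about 2.05) lands close to the ratio of the two state counts (about 2.08).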
« Last Edit: June 12, 2016, 08:24:51 am by grosged »

aeTIos
Wow, that's some impressive gain. Good job ;D

 

