
[gLib][3d][z80][ez80] gLib a fast 3D asm/axiom library

Started by TheMachine02, January 19, 2015, 05:10:01 PM


Ti64CLi++

@DJ Omnimaga That's right. A user changed my nickname. But if I want to log in, I have to use neuronix :(
  • Calculators owned: TI-Nspire CX CAS, TI-Nspire CX, TI-Nspire, TI-Nspire CAS TouchPad, TI-Nspire CAS, TI-Nspire, TI-Voyage 200, TI-92 Plus, TI-89 Titanium, TI-89, TI-83 Premium CE, TI-84 Plus CE, TI-82 Advanced, TI-84 Pocket.fr, TI-84 Plus Silver Edition, TI-84 Plus, TI-83 Plus Silver Edition, TI-83 Plus.fr USB, TI-83 Plus.fr, TI-83 Plus, TI-83 Plus, TI-83, TI-82 Stats.fr, TI-76.fr, TI-36X Pro, TI-Collège Plus Solaire, TI-Collège Plus, TI-30X Pro MultiView, TI-30XS MultiView, TI-30XB MultiView, TI-30 XB MultiView
Administrator of Tout 82
On TI Planet since August 2014, writer since August 2015. Give me an Internet: it's free and doesn't take much time. Administrator of Life Game World

Dream of Omnimaga

@TheMachine02 Wow that transparency effect looks amazing! O.O

@Ti64CLi++ Would you like your login name to be changed as well?
  • Calculators owned: TI-82 Advanced Edition Python TI-84+ TI-84+CSE TI-84+CE TI-84+CEP TI-86 TI-89T cfx-9940GT fx-7400G+ fx 1.0+ fx-9750G+ fx-9860G fx-CG10 HP 49g+ HP 39g+ HP 39gs (bricked) HP 39gII HP Prime G1 HP Prime G2 Sharp EL-9600C
  • Consoles, mobile devices and vintage computers owned: Huawei P30 Lite, Moto G 5G, Nintendo 64 (broken), Playstation, Wii U

TheMachine02

-Activity report-

Soooooo, let's start with alpha-blended triangles:





It does look good, but it is very slow because it needs a linear interpolation per pixel (452 TStates! 422 TStates using the MB register) (if someone wants to do better, well, I won't say no :P ). Useful for some light effects or transparency like windows or reflections, but don't use it too much in your scene. Pretty sure it can be optimized way further, I'll look into that. Luckily, depth sorting has a good side effect here :D
Another application: ghhhooooossstttt >:D
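
To make the "linear interpolation per pixel" concrete, here is a minimal C sketch of an alpha blend on an 8-bit pixel. It is not the library's eZ80 routine. The RGB332 layout (RRRGGGBB), the 0..256 alpha range and the names `lerp_channel` and `blend_rgb332` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed pixel layout (not confirmed by the post): RGB332, bits RRRGGGBB. */

/* Linear interpolation of one channel: dst + (src - dst) * alpha / 256. */
static uint8_t lerp_channel(uint8_t dst, uint8_t src, uint16_t alpha)
{
    assert(alpha <= 256);               /* 0 keeps dst, 256 gives src */
    return (uint8_t)(dst + (((int)src - (int)dst) * (int)alpha) / 256);
}

/* Blend a source pixel over a destination pixel, channel by channel. */
uint8_t blend_rgb332(uint8_t dst, uint8_t src, uint16_t alpha)
{
    uint8_t r = lerp_channel(dst >> 5,       src >> 5,       alpha);
    uint8_t g = lerp_channel((dst >> 2) & 7, (src >> 2) & 7, alpha);
    uint8_t b = lerp_channel(dst & 3,        src & 3,        alpha);
    return (uint8_t)((r << 5) | (g << 2) | b);
}
```

Unpacking three channels, interpolating each and repacking is what makes this expensive per pixel, which fits the advice above to use blending sparingly.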



The second thing is an ICopyTexImage2D command:


It is a must for billboards. It isn't very optimized yet, though, and can be made better.

Some new models:
Suzanne:



Walrii:



New colors (converter fix):



Last but not least, edge antialiasing. It looks good, although it is costly (the implementation forces the engine to recalculate a lot of things; an antialiased triangle routine would be better) (it is kind of a post-process right now, which also explains the odd pixels).
The screenshots are 'theoretically' x64 AA, but 8-bit color doesn't have enough precision to show the full filtering potential. Filling is almost 4.5x slower. Not sure a limited x2 or x4 AA would be faster, but it is worth a try.





The triangle rasterisation routine now also implements a left filling convention, using the pixel center to decide whether a pixel is on or off, which turns out to give much higher raster quality at the cost of a slower triangle setup (about 200 TStates). With the other optimisations made, the speed is almost the same as before (for average-size triangles; the models shown have a lot of reeeeeaallllly small triangles, which put higher pressure on setup), and the inner loop is faster. Screenshots:
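
For readers wondering what a pixel-center, left-inclusive fill convention amounts to, here is a minimal C sketch for one scanline span. It is an illustration under assumptions, not the library's code: the 8 fractional bits and the names `first_pixel_at_or_after` and `span_pixels` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 8                      /* assumed fixed-point precision */
#define ONE  (1 << FRAC_BITS)
#define HALF (1 << (FRAC_BITS - 1))

/* First pixel x whose center (x + 0.5) is at or right of the edge,
 * i.e. ceil(edge - 0.5), with the edge given in fixed point. */
int first_pixel_at_or_after(int32_t edge_fx)
{
    assert(edge_fx >= 0);                /* sketch only handles on-screen edges */
    return (int)((edge_fx - HALF + ONE - 1) >> FRAC_BITS);
}

/* A pixel is lit when left <= center < right: the left edge is inclusive,
 * the right edge exclusive. Returns the pixel count, stores the first x. */
int span_pixels(int32_t left_fx, int32_t right_fx, int *x0)
{
    int start = first_pixel_at_or_after(left_fx);
    int end   = first_pixel_at_or_after(right_fx);   /* exclusive */
    *x0 = start;
    return end > start ? end - start : 0;
}
```

With this rule two triangles sharing an edge never draw the same pixel twice and never leave a gap, since a pixel center sitting exactly on the shared edge goes to the span on its right.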





Next big step in quality: textures :P (kinda crazy hard to get them working, but I'll figure it out one day).

I started to improve internal memory management ahead of a future release, as well as a .h file for C linking. Currently the library uses a 512-byte buffer, 512 bytes of data included in the program, and a few variables. There is also a triangle buffer, 6 bytes per triangle, and a vertex buffer, which will most likely get some refactoring.

I am targeting an alpha release at the end of the summer, with all major routines coded. Maybe I'll do a game too if I have time.

As for the maximum the engine can handle, it is more a question of RAM buffers: for now the vertex buffer and the triangle buffer are fixed RAM areas. The 2500vx/3500tri model uses about 41000 bytes of RAM. I intend to lower the RAM cost, although I am not sure I'll manage it. At least the models will be compressed (LZ77 most likely) to lower the memory cost.

Planned maximums are:
IMAX_TRIANGLE=4096
IMAX_VERTEX=2048
Which should be enough for a game. Anyway, perf would be crappy with more.
For the z80 version, the maximums are:
gMAX_TRIANGLE=256
gMAX_VERTEX=256
That would already take about 5000 bytes of RAM and I don't want to take all the RAM left to the user :P (furthermore, transforming 256 vertices takes almost 1/6 of CPU time at 6MHz)
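
The RAM figures above can be checked with a bit of arithmetic. The C sketch below uses the 6 bytes per triangle stated in the post. The 8 bytes per vertex is only an assumption, inferred from the "about 41000 bytes" quoted for the 2500vx/3500tri model, and `buffer_bytes` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define TRIANGLE_BYTES     6u   /* stated in the post */
#define VERTEX_BYTES_GUESS 8u   /* assumption: inferred from the 41000-byte figure */

/* Total size of the triangle buffer plus the vertex buffer, in bytes. */
uint32_t buffer_bytes(uint32_t triangles, uint32_t vertices, uint32_t vertex_bytes)
{
    assert(vertex_bytes > 0);
    return triangles * TRIANGLE_BYTES + vertices * vertex_bytes;
}
```

Under that guess, `buffer_bytes(3500, 2500, VERTEX_BYTES_GUESS)` gives 41000 bytes, and the planned maximums (4096 triangles, 2048 vertices) give 40960 bytes, of which the triangle buffer alone is 24576. The z80 figures are not covered here because the post does not give the z80 buffer layout.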

Clipping is starting to make its way into the library. Culling example:



   The engine currently computes outcodes for the planes, but they aren't used yet to do *actual* clipping. Anyway, clipping is tricky because the pipeline needs to handle the generation of new pieces of geometry; and since the current pipeline treats the whole scene as one batch to sort and render (so multiple sources are possible, just output to a common batch), clipping should theoretically happen before depth sorting. However, I fear that might be too hard for the engine to keep up with. At 4096 triangles max, if all of them need clipping, imagine how much RAM it would use and how much data it would need to track. The second method is to do clipping just before the actual render, after depth sorting. The drawback is that culled triangles will be sorted along with the in-frustum triangles, although I could do a quick check when computing the depth key of a triangle to see if it is behind the z=0 plane. Of course triangles behind the other planes will still be sorted, but it will still remove a good part of the unnecessary geometry. The other problem with this method is that some polygons will have an odd z-average depth. I could add a constant z-depth (i.e., 8388608) so that negative z values become positive (and are handled flawlessly by the sorting algorithm). (The allowed depth range would be -8388608,8388608).
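
The biased depth key idea can be sketched in a few lines of C. This is an illustration, not the engine's routine: it assumes the key is stored in 24 bits (so the upper bound of the range has to be exclusive) and that "behind the z=0 plane" means z < 0. The names `depth_key` and `behind_z0` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH_BIAS 8388608      /* 2^23, the constant quoted in the post */

/* Map a signed average depth to an unsigned key that sorts in the same
 * order, so the sorting algorithm never has to handle negative values. */
uint32_t depth_key(int32_t z_avg)
{
    assert(z_avg >= -DEPTH_BIAS && z_avg < DEPTH_BIAS);   /* 24-bit assumption */
    return (uint32_t)(z_avg + DEPTH_BIAS);
}

/* Quick reject while building the key: nonzero when all three vertices
 * are behind the z = 0 plane (assumed sign convention: z < 0 is behind). */
int behind_z0(int32_t z0, int32_t z1, int32_t z2)
{
    return z0 < 0 && z1 < 0 && z2 < 0;
}
```

A triangle for which `behind_z0` is true can be dropped before it reaches the sort, while a triangle straddling the plane keeps a valid key even if its average depth is negative.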

   Anyway, the good part of all this is that instead of computing a plane code on the fly, it is fused into the projection. It only adds about 50 TStates and automatically stops the projection and switches to clipping if the point is out (an out point doesn't have to be projected). The only catch is that if the point is out on the X axis only, the switch needs to be delayed to the second part of the projection algorithm, but that is only about 200 TStates (which is still faster than a separate outcode computation routine). You can't imagine how much of a headache this pipeline gives me <_< . The gLib pipeline was much simpler due to the limited geometry handling and limited features. Maybe an upgrade to this new in-formation pipeline could do great things there too. I must say that the new vertex id storage is quite interesting, since it stores the vbo address directly, instead of a relative (0,1,2,3..) id. Switching to the data address is quite fast and is only necessary once, whereas the vbo address is always used. It removes all the addressing work, which was a huge bottleneck of the previous pipeline. Well, I want to run the (low poly) chocobo model on the z80 (at 6MHz of course) so there is quite some work left :P
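
As a rough picture of an outcode fused into the projection, here is a minimal C sketch. It is not the engine's algorithm: it assumes a 90 degree frustum on both axes, a 320x240 screen and 24-bit view-space coordinates, and it runs every plane test up front instead of delaying the X test as described above. All names (`project_vertex`, the `OUT_*` flags, `vec3`, `projected`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    OUT_NEAR   = 1 << 0,
    OUT_LEFT   = 1 << 1,
    OUT_RIGHT  = 1 << 2,
    OUT_TOP    = 1 << 3,
    OUT_BOTTOM = 1 << 4
};

#define SCREEN_W 320            /* assumed screen size */
#define SCREEN_H 240

typedef struct { int32_t x, y, z; } vec3;                   /* view space */
typedef struct { int16_t sx, sy; uint8_t outcode; } projected;

/* Classify the vertex against the frustum first; divide only if it is in. */
projected project_vertex(vec3 v)
{
    projected p = { 0, 0, 0 };

    assert(v.z < 8388608);      /* keeps x * 160 inside int32_t */

    if (v.z <= 0) {             /* behind the eye: skip everything else */
        p.outcode = OUT_NEAR;
        return p;
    }
    if (v.x <  -v.z) p.outcode |= OUT_LEFT;
    if (v.x >=  v.z) p.outcode |= OUT_RIGHT;
    if (v.y <  -v.z) p.outcode |= OUT_TOP;
    if (v.y >=  v.z) p.outcode |= OUT_BOTTOM;
    if (p.outcode != 0)
        return p;               /* out point: no projection needed */

    p.sx = (int16_t)(SCREEN_W / 2 + v.x * (SCREEN_W / 2) / v.z);
    p.sy = (int16_t)(SCREEN_H / 2 + v.y * (SCREEN_H / 2) / v.z);
    return p;
}
```

The point of fusing the two steps is visible in the early returns: a vertex flagged as out never pays for the divisions, and the comparisons reuse values the projection needs anyway.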

Anyyyyyywayyyy, let's stop ranting.

Todo list:
-pipeline&clipping
-texture
-high level commands (bone, model...)
-port of sorting and triangle filling to z80 (in axiom form, for version 4.0 beta) (porting the projection algorithm used in the ez80 version to z80 blew away my already fast projection routine (smaller, faster, more precise), putting a cube at over 54 fps, ~50 fps average.) (let's go for 60)
-start writing the docs
For later versions:
-proper material support (?)
-proper lighting support

-EOF-

Dream of Omnimaga

Nice to see you again. I like the progress and hopefully you might be able to make the anti-aliasing faster, because it looks friggin amazing. Also that :walrii: is fast :D

Would the 2D image scaling thing be used for textures and 2D sprites like in Doom?

TheMachine02

Quote from: DJ Omnimaga on August 31, 2016, 01:43:04 AM
Would the 2D image scaling thing be used for textures and 2D sprites like in Doom ?

Yep, it is the objective.

As for AA, it is quite hard to get it running at a good speed. I am currently trying an x2-only AA, which still increases quality but is fast since it only uses 50% blending (and there isn't a lot to track).
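
To show why a 50% blend is so much cheaper than a general alpha blend, here is a minimal C sketch that averages two packed 8-bit pixels without unpacking the channels. It is not the library's code. The RGB332 layout (RRRGGGBB) and the names `blend_half_rgb332` and `aa_edge_pixel` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed pixel layout: RGB332 (RRRGGGBB). 0xDA clears the lowest bit of
 * each channel so the shift below cannot leak into the neighbouring channel. */
#define RGB332_HALF_MASK 0xDAu

/* Per-channel floor((a + b) / 2), using a + b = 2 * (a & b) + (a ^ b). */
uint8_t blend_half_rgb332(uint8_t a, uint8_t b)
{
    return (uint8_t)((a & b) + (((a ^ b) & RGB332_HALF_MASK) >> 1));
}

/* x2 AA on an edge pixel: coverage counted in halves (0, 1 or 2). */
uint8_t aa_edge_pixel(uint8_t tri_color, uint8_t background, unsigned coverage)
{
    assert(coverage <= 2);
    if (coverage == 0) return background;
    if (coverage == 2) return tri_color;
    return blend_half_rgb332(tri_color, background);
}
```

The whole blend is an AND, an XOR, a mask, a shift and an add, with no multiply and no per-channel unpacking.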

Snektron

Looks pretty impressive. I hope it gets used in actual games. Though they might need a lot more triangles and stuff.
  • Calculators owned: TI-84+
Legends say if you spam more than DJ Omnimaga, you will become a walrus...


ben_g

Quote from: TheMachine02 on September 01, 2016, 08:34:47 AM
Quote from: DJ Omnimaga on August 31, 2016, 01:43:04 AM
Would the 2D image scaling thing be used for textures and 2D sprites like in Doom ?

Yep, it is the objective.

As for AA, it is quite hard to get it running at a good speed. I am currently trying an x2-only AA, which still increases quality but is fast since it only uses 50% blending (and there isn't a lot to track).

Maybe you could look into FXAA which is much faster than MSAA (at least on computers, it may be bad on calculators because they lack shading hardware). But since calculators generally handle low-poly environments, an AA triangle drawing routine would probably be best for both quality and speed.

For games, I think that it's best to drop AA and instead use that processing power for more triangles and/or textures. AA makes renders look very pretty, but I think games are better off with more detailed levels rather than with AA'd rendering. AA is still a nice feature for stuff like model viewers though, since framerate and responsiveness are not that important there.

JosJuice

Wow, that GIF with the anti-aliased chocobo... For a second, I almost forgot that it was running on a calculator O.O
  • Calculators owned: TI-84+ SE, Casio fx-CG10

Dream of Omnimaga

Yeah I know the feeling JosJuice :P

As for anti-aliasing it isn't necessary, but it can be handy for screenshots. Maybe you could make it so we can render one anti-aliased frame by pressing a specific key when AA is disabled?

Ti64CLi++


Dream of Omnimaga


novenary

#326
Quote from: DJ Omnimaga on September 07, 2016, 04:24:49 PM
Ok. @Streetwaljuju@poke.
Care to uh, brief me up ? :P

Edit: nvm, got it. Your password is gonna be reset, you'll receive an email about it.

Dream of Omnimaga

Yeah I hate how SMF resets the password. I think there's a way to do it without resetting it but I don't know if it's safe. Also above I tried posting mentions without the @ :P

TheMachine02

Quote from: ben_g on September 02, 2016, 12:24:41 PM
Maybe you could look into FXAA which is much faster than MSAA (at least on computers, it may be bad on calculators because they lack shading hardware). But since calculators generally handle low-poly environments, an AA triangle drawing routine would probably be best for both quality and speed.

Well, the current AA method is not even MSAA, but some sort of analytical AA. As for FXAA, first it is a depth-aware filter (and I don't have a z-buffer), and it needs to scan every pixel (76800), so it is at least 2M TStates. And that is a lot :P

Anyway, the next thing is definitely textures.
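
The full-screen cost estimate is simple arithmetic, sketched here in C. The 76800-pixel count and the TStates figures come from the posts in this thread; the helper names `fullscreen_tstates` and `tstates_per_pixel_budget` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_PIXELS 76800u    /* figure from the post; matches 320 x 240 */

/* Cost of touching every pixel once at a given per-pixel cost. */
uint32_t fullscreen_tstates(uint32_t tstates_per_pixel)
{
    assert(tstates_per_pixel <= UINT32_MAX / SCREEN_PIXELS);
    return SCREEN_PIXELS * tstates_per_pixel;
}

/* Per-pixel budget implied by a total full-screen cost (rounded down). */
uint32_t tstates_per_pixel_budget(uint32_t total_tstates)
{
    return total_tstates / SCREEN_PIXELS;
}
```

A 2M TStates pass works out to roughly 26 TStates per pixel, which is very little room for a filter. For comparison, the 452 TStates per pixel quoted earlier for alpha blending would come to 34713600 TStates if it ever covered the whole screen.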

Dream of Omnimaga

Will textures be fast enough to be useful, though?  O.O

Good luck :)
