[gLib][3d][z80][ez80] gLib a fast 3D asm/axiom library

Started by TheMachine02, January 19, 2015, 05:10:01 PM

Snektron

Quote from: TheMachine02 on January 20, 2015, 10:16:16 AM
Matrix calculation is also done with integers, 7-bit integers to be more precise, in [-64,64]; precision is OK, as well as speed:



Stress test with 256 points: 6 fps at 6 MHz (so for matrix rotation: 2304 fastmul per frame, plus 256 div and 512 mul).
This test is without using fast math  :)
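The 2304 figure in the quote matches 9 multiplies per point (one 3x3 matrix) times 256 points. Here is a minimal Python sketch of that kind of integer rotation. It assumes the matrix entries are stored as `round(value * 64)`, which is one way to land in [-64,64]; the names `SCALE`, `make_rotation_y`, `rotate` and `multiplies_per_frame` are made up for illustration and are not gLib routines.

```python
import math

SCALE = 64  # assumption: entries are stored as round(value * 64), so they fit in [-64, 64]

def make_rotation_y(angle_rad):
    """Build a 3x3 rotation matrix about the Y axis with integer entries in [-64, 64]."""
    c = round(math.cos(angle_rad) * SCALE)
    s = round(math.sin(angle_rad) * SCALE)
    return [[c, 0, s],
            [0, SCALE, 0],
            [-s, 0, c]]

def rotate(matrix, vertex, counter):
    """Rotate one integer vertex; counter[0] accumulates the number of multiplies."""
    out = []
    for row in matrix:
        acc = 0
        for m, v in zip(row, vertex):
            acc += m * v          # one integer multiply per matrix entry
            counter[0] += 1
        out.append(acc // SCALE)  # drop the fixed-point scale (floor division)
    return tuple(out)

def multiplies_per_frame(num_points):
    """Count the multiplies needed to rotate num_points vertices once."""
    counter = [0]
    matrix = make_rotation_y(math.pi / 6)
    for i in range(num_points):
        rotate(matrix, (i, -i, 2 * i), counter)
    return counter[0]
```

`multiplies_per_frame(256)` counts 9 multiplies for each of the 256 points. The 256 div and 512 mul in the quote would fit one divide and two multiplies per point for the projection, but the thread does not show that code, so that part is only a guess.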

While we're on the point-cloud off topic:

Cheaty isometric 3d in Axe, 1024 points ^^ (it's probably faster in asm).
  • Calculators owned: TI-84+
Legends say if you spam more than DJ Omnimaga, you will become a walrus...


Dream of Omnimaga

Woah, that looks cool actually! It reminds me of water drop effects. It's slow but is it 6 MHz like the other screenshot? O.O
  • Calculators owned: TI-82 Advanced Edition Python TI-84+ TI-84+CSE TI-84+CE TI-84+CEP TI-86 TI-89T cfx-9940GT fx-7400G+ fx 1.0+ fx-9750G+ fx-9860G fx-CG10 HP 49g+ HP 39g+ HP 39gs (bricked) HP 39gII HP Prime G1 HP Prime G2 Sharp EL-9600C
  • Consoles, mobile devices and vintage computers owned: Huawei P30 Lite, Moto G 5G, Nintendo 64 (broken), Playstation, Wii U

Snektron

Yeah, but it's not much faster at full speed.
In ASM it could be much faster, since you only need a few registers. I'll look into that in the near future :)

TheMachine02

So, I was writing my primitive pipeline when I had a LOT of ideas.
First, I've got an idea for a new vertex-processing pipeline; however, I simply don't know what performance to expect from it, or whether it is more efficient than my current pipeline.
I analyzed my pipeline and found that my current caching system (aka VBO) is simply ... slow  :P

A new caching system is worth designing, and I got a really interesting idea, but one which uses a lot of RAM: instead of a VBO cache whose size is 16*NB_VERTEX+2 and which is unaligned, I could use a fixed-size, aligned cache of 9*256 bytes (2304 bytes).
With this, I can access the cache as an interleaved array and save roughly 15% of the previous cache fetch cycles.
BUT the main drawback is the fixed size (the VBO has a varying size), and this also makes my pseudo-VBO totally useless: it needs a code rewrite.

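As an example, the expected code for the fetch:

_gVertexFetch:
ld h, gCache/256    ; h = high byte of the aligned cache; l is assumed to already hold the vertex index
ld a, (hl)          ; plane 0: clip code
ld (gClipCode), a
inc h               ; next plane: same index, 256 bytes further
ld e, (hl)          ; X low byte
inc h
ld d, (hl)          ; X high byte
inc h
ld (gPositionX), de
ld e, (hl)          ; Y low byte
inc h
ld d, (hl)          ; Y high byte
inc h
ld (gPositionY), de
ld a, (hl)          ; Z low byte
inc h
ld h, (hl)          ; Z high byte (hl is overwritten from here on)
ld l, a
ld (gPositionZ), hl

at 153 cycles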
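For readers who don't write Z80, here is a rough Python model of the layout that fetch routine implies: each attribute byte lives in its own 256-byte plane, the low address byte is the vertex index, and stepping the high byte (`inc h`) moves to the next plane. The base address `0x9000`, the `PLANE_*` names and the helper functions are hypothetical; the plane order is read off the routine, and the thread does not say what the two remaining planes hold. The cycle tally uses the documented Z80 T-states and ignores any hardware-specific wait states.

```python
PAGE = 256            # one plane holds one byte per vertex
NUM_PLANES = 9        # 9 * 256 = 2304 bytes, as stated in the post
CACHE_BASE = 0x9000   # hypothetical page-aligned base address (low byte is 0)

# Plane order assumed from the fetch routine: clip code, then X, Y, Z as low/high bytes.
(PLANE_CLIP, PLANE_X_LO, PLANE_X_HI,
 PLANE_Y_LO, PLANE_Y_HI, PLANE_Z_LO, PLANE_Z_HI) = range(7)

def plane_address(index, plane):
    """High byte selects the plane, low byte is the vertex index."""
    assert 0 <= index < PAGE and 0 <= plane < NUM_PLANES
    return (((CACHE_BASE >> 8) + plane) << 8) | index

def store_vertex(memory, index, clip, x, y, z):
    """Write one vertex into the planes (coordinates as 16-bit words)."""
    memory[plane_address(index, PLANE_CLIP)] = clip & 0xFF
    for lo_plane, value in ((PLANE_X_LO, x), (PLANE_Y_LO, y), (PLANE_Z_LO, z)):
        memory[plane_address(index, lo_plane)] = value & 0xFF
        memory[plane_address(index, lo_plane + 1)] = (value >> 8) & 0xFF

def fetch_vertex(memory, index):
    """Mirror of the fetch: start at plane 0 and bump the high byte, like `inc h`."""
    h, l = CACHE_BASE >> 8, index
    values = []
    for _ in range(7):
        values.append(memory[(h << 8) | l])
        h += 1
    clip, xl, xh, yl, yh, zl, zh = values
    # Coordinates come back as unsigned 16-bit words.
    return clip, xl | (xh << 8), yl | (yh << 8), zl | (zh << 8)

# Documented Z80 T-states for the instruction forms used in the routine.
T_STATES = {"ld h,n": 7, "ld r,(hl)": 7, "ld (nn),a": 13, "inc h": 4,
            "ld (nn),de": 20, "ld (nn),hl": 16, "ld l,a": 4}
ROUTINE = (["ld h,n", "ld r,(hl)", "ld (nn),a", "inc h"]
           + ["ld r,(hl)", "inc h", "ld r,(hl)", "inc h", "ld (nn),de"] * 2
           + ["ld r,(hl)", "inc h", "ld r,(hl)", "ld l,a", "ld (nn),hl"])

def routine_cycles():
    """Sum the T-states of the fetch routine, instruction by instruction."""
    return sum(T_STATES[op] for op in ROUTINE)
```

With `memory = bytearray(0x10000)`, storing a vertex and fetching it back returns the same values, and `routine_cycles()` agrees with the 153 cycles quoted above.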

What do you think of all of this? Should I try this pipeline, which may totally change the usage syntax of the core? (I know I haven't written the tutorial yet, so it's not really too bad, but bleah.) And should the cache system be implemented this way?

Dream of Omnimaga

I unfortunately don't know assembly so I am unsure what you mean by pipeline. Could you enlighten me about the benefits and drawbacks this would have in your program?

Snektron

Ah, I think I get what he's trying to say: instead of calculating the address of a vertex by adding up the
size of every vertex before it, have a fixed size for every element. (It's very hard for some of us to understand,
since we don't do any ASM and/or 3D pipeline writing :P)

TheMachine02

 :P

A pipeline is the command flow of the program; it just tells which actions to do, in order, like:

-rotate vertex -> project vertex -> render point.
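As a rough illustration of that flow (not gLib's actual code), the three stages can be sketched in Python as three functions applied in order to each vertex. The 96x64 buffer matches the TI-83+/84+ monochrome screen; `FOCAL`, `CAMERA_Z` and all function names are hypothetical, and floats are used here only to keep the sketch short.

```python
import math

WIDTH, HEIGHT = 96, 64   # TI-83+/84+ monochrome screen size
FOCAL = 64               # hypothetical focal length for the perspective divide
CAMERA_Z = 128           # hypothetical offset that puts the model in front of the camera

def rotate_y(vertex, angle):
    """Stage 1: rotate the vertex about the Y axis."""
    x, y, z = vertex
    c, s = math.cos(angle), math.sin(angle)
    return (x * c + z * s, y, -x * s + z * c)

def project(vertex):
    """Stage 2: perspective projection to screen coordinates (None if behind the camera)."""
    x, y, z = vertex
    depth = z + CAMERA_Z
    if depth <= 0:
        return None
    sx = WIDTH // 2 + round(x * FOCAL / depth)
    sy = HEIGHT // 2 - round(y * FOCAL / depth)
    return sx, sy

def render_point(screen, point):
    """Stage 3: set the pixel if the projected point lands on the screen."""
    if point is None:
        return
    sx, sy = point
    if 0 <= sx < WIDTH and 0 <= sy < HEIGHT:
        screen[sy][sx] = 1

def run_pipeline(vertices, angle):
    """Push every vertex through rotate -> project -> render, in that order."""
    screen = [[0] * WIDTH for _ in range(HEIGHT)]
    for vertex in vertices:
        render_point(screen, project(rotate_y(vertex, angle)))
    return screen
```

Changing the pipeline means changing which stages exist and what they pass to each other, which is why a new cache format can ripple into the calling syntax.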

The current system I use has an indexed cache storing the rotated vertices (so they aren't re-rotated if needed again), accessing data this way:
Index_vertex*16+BaseAddress. This gives the address of the vertex; BaseAddress points to the very first element.
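A small Python model of that lookup, for comparison with the new cache. The thread does not show the old routine, so the four doublings (what `add hl,hl` four times would do on a Z80) are an assumption about how the multiply by 16 might be done, and `BASE_ADDRESS` is a made-up, deliberately unaligned value.

```python
VERTEX_SIZE = 16        # bytes per cached vertex, from the index*16 formula
BASE_ADDRESS = 0x9872   # hypothetical, unaligned address of the first element

def old_vertex_address(index):
    """index * 16 + base, written as four 16-bit doublings followed by a 16-bit add."""
    hl = index
    for _ in range(4):
        hl = (hl + hl) & 0xFFFF       # each doubling multiplies by 2
    return (hl + BASE_ADDRESS) & 0xFFFF
```

Every fetch pays for the shifts and the 16-bit add before the first byte is read, which is the cost the aligned cache avoids by using the index directly as the low address byte.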

The new cache I designed doesn't access data this way, but accesses it way faster. However, there are drawbacks, and I list them below:
*OLD:
-indexing is slow (--)
-flexible size, defined by the number of vertices the user inputs (++)
-theoretically unlimited number of vertices (++)

*NEW:
-indexing is WAY faster (speed boost over the previous routine: 14% for a vertex, 70% for x,y coordinates) (++)
-fixed size of 7*256+512 bytes (-, as even a simple program will need a lot of free RAM to run)
-theoretically 256 vertices per cache, but multiple caches can be used (+, not really practical)




Dream of Omnimaga

Ah ok thanks for clarifying. I think RAM should not be an issue now that most people use ZStart and Doors CS7, unless it requires as much RAM as Gemini or something. For the cache size I guess it depends how large maps can be (eg a game where every map takes 1 second to go through might not be as fun)

TheMachine02

Well, the fixed cache can hold at most 256 vertices, which allows pretty complex maps, and it is smaller than a VBO holding 256 vertices.
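To put numbers on that size comparison, here is a quick Python check using only the two formulas given earlier in the thread (16*NB_VERTEX+2 for the VBO, 9*256 for the fixed cache). The constant and function names are made up, and the purpose of the extra 2 bytes is not stated in the thread.

```python
VBO_BYTES_PER_VERTEX = 16
VBO_EXTRA_BYTES = 2          # the "+2" in the VBO size formula (purpose not stated)
FIXED_CACHE_BYTES = 9 * 256  # same total as 7*256+512

def vbo_bytes(num_vertices):
    """Size of the old variable-size VBO cache for a given vertex count."""
    return VBO_BYTES_PER_VERTEX * num_vertices + VBO_EXTRA_BYTES

def break_even_vertices():
    """Smallest vertex count at which the VBO is larger than the fixed cache."""
    n = 0
    while vbo_bytes(n) <= FIXED_CACHE_BYTES:
        n += 1
    return n
```

At 256 vertices the VBO needs 4098 bytes against 2304 for the fixed cache; the fixed cache only wins on size from 144 vertices upward, so small models pay a RAM premium for the faster indexing.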

Dream of Omnimaga

Aah ok. How many vertices would a cube contain, for example?

TheMachine02

A cube contains 8 vertices. Btw, 256 is the limit of the nostromo's world if I remember well.
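A cube is also a handy case for seeing why rotated vertices get cached by index: its 8 vertices are shared by 12 edges, so drawing edge by edge without a cache would transform each vertex three times. The Python sketch below only counts transforms; the function names and the `half` size are hypothetical.

```python
from itertools import product

def cube_vertices(half=32):
    """The 8 corners of an axis-aligned cube centred on the origin."""
    return list(product((-half, half), repeat=3))

def cube_edges(vertices):
    """Index pairs of corners that differ in exactly one coordinate."""
    edges = []
    for i in range(len(vertices)):
        for j in range(i + 1, len(vertices)):
            if sum(a != b for a, b in zip(vertices[i], vertices[j])) == 1:
                edges.append((i, j))
    return edges

def transform_count(edges, use_cache):
    """Number of vertex transforms needed to draw all edges, with or without a per-index cache."""
    cached = set()
    count = 0
    for edge in edges:
        for index in edge:
            if use_cache and index in cached:
                continue          # already rotated: reuse the cached result
            count += 1
            cached.add(index)
    return count
```

For the cube this gives 24 transforms without a cache and 8 with one, so a 256-entry cache covers 32 such cubes if no vertices are shared between them.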

Dream of Omnimaga

Thanks. It seems like it could definitely handle medium to large indoor maps. :) Maybe someone could make a 3D clone of Illusiat 3? :P

TheMachine02

Soo.. I was implementing new functions (testing my new pipeline), and I got stuck on the names  <_<

Like, should I name the functions:
-gGenVertexArray() and gVertexAttrib(), which are really long (and with the large font don't really look good on the screen),
or something shorter?
-gGenVxArray() and gVxAttrib(), but I feel that those two lose the clarity of the other two...
Basically:
-gGenVertexArray() creates a vertex array ... :p or returns the address of an existing one.
-gVertexAttrib() adds an attribute (color, normal...) to the current vertex array.

Anyone got any ideas on that?  :P

Snektron

Maybe replace "Vertex" with "Vtx" instead of "Vx"?