
[gLib][3d][z80][ez80] gLib a fast 3D asm/axiom library

Started by TheMachine02, January 19, 2015, 05:10:01 PM


TheMachine02

Well, texturing should be about twice as fast as alpha-screenshot. I'm planning on about 200 TStates per pixel (and want less, of course!) versus alpha's 450 TStates per pixel.

TheMachine02

#331
    So I decided to split the library into two parts:
    One part will handle all the basic 3D needs; the plan is a lightweight 'essential' library with a new name: Iris3D.
    The second part, which will still be named gLib, will try to provide a render pipeline (that is, much easier programming for the end user), as well as many utilities like bone/animation rendering and model handling.
    Of course, gLib's actual rendering will be built on the closer-to-the-metal Iris3D functions.

    Planned features of Iris3D library :

    • triangle flat filling
       low cost setup
       left filling rule
       about 9000 triangles/s
       fill rate of about 6MPxl/s
    • advanced maths functions
       matrix
       quaternions
       vectors
       vm multiply (~30000/s)
       2d projection (~50000/s)
    • depth sort
       constant speed cost by triangle
    • Framebuffer
       doublebuffering
       vsync
       320x240 8bpp, maybe 160x120 16bpp
       8bpp : R3B2G3 palette
    • Texture
       power of two size
       maybe compression (1/2 ratio)
    • clipping
       3D clipping against the frustum
       more advanced functions

As you can see, not everything has been implemented yet, but once it is, there will be a solid base for 3D on the CE.


EDIT: some progress on texture mapping:



Texture gradients are hardcoded for now and a lot of things aren't properly implemented, but it is well on its way.
It needs 4 divisions for setup though, so about 2000 ticks. The inner loop takes 126 to 136 TStates, which gives a fill rate of about 350 KPxl/s.

TheMachine02

Sooooo this is technically a triple post, but anyway  :P



Texturing now works (and there are no hardcoded values; this is the full routine!). There is quite a lot of optimisation left to do, but that is less important right now. The maximum texture size is 127x255, which allows quite a lot (a 127x255 texture is 32385 bytes). I will now focus on clipping and on handling model seams correctly (that is, when one vertex in a model has several different texture coordinates). Texturing isn't finished either: I want to implement BC1-style texture compression (1/2 ratio) with decompression during texel fetch.

As for speed, the inner texture loop is exactly 114 TStates, +10 TStates when the texture coordinate changes. That is a theoretical 421 KPxl/s (about 421,000 pixels per second). Of course, this isn't reachable in practice, but it gives a good indication of what is possible.
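For anyone checking the numbers: the fill-rate figures in these posts are simply the CPU clock divided by the T-states spent per pixel. A minimal sketch in C, assuming the 48 MHz eZ80 clock of the TI-84 Plus CE (the clock speed is not stated in this thread; `CLOCK_HZ` and `fill_rate` are names made up for illustration):

```c
/* Assumption: 48 MHz eZ80 clock (TI-84 Plus CE); not stated in the thread. */
#define CLOCK_HZ 48000000UL

/* Theoretical pixels per second for an inner loop that costs
   `tstates_per_pixel` clock cycles per pixel (setup cost ignored). */
static unsigned long fill_rate(unsigned long tstates_per_pixel) {
    return CLOCK_HZ / tstates_per_pixel;
}

/* fill_rate(114) -> 421052 pixels/s (about 421 KPxl/s)
   fill_rate(136) -> 352941 pixels/s (about 350 KPxl/s) */
```

These are upper bounds only, since per-triangle setup and per-scanline overhead are left out.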

catastropher

Nice work! I can't wait to see this once it's done! I have two questions:

  • Is this doing affine texture mapping or perspective correct? X3D does affine, but at one point I had it doing the perspective correction every 16 pixels (just like Descent did)
  • Is this only for drawing triangles or can it handle other polygons as well? You can use the polyline property of convex polygons to split a polygon into two polylines and then draw each scanline by walking down the edges. This saves a lot of time because e.g. to draw a hexagon you only need one poly instead of 4.
A lot of people think you have to sort all the points in the polygon or something, but I just use this method that I came up with (which runs in linear time):

[spoiler]
typedef struct X3D_PolyVertex {
    X3D_Vex2D v2d;
    int16 intensity;
    int32 u, v;
    int32 z;
} X3D_PolyVertex;

typedef struct X3D_PolyLine {
    uint16 total_v;
    X3D_PolyVertex** v;
} X3D_PolyLine;

_Bool x3d_polyline_split(X3D_PolyVertex* v, uint16 total_v, X3D_PolyLine* left, X3D_PolyLine* right) {
    // Force the polygon to be clockwise
    x3d_polyvertex_make_clockwise(v, total_v);
   
    int16 top_left = 0;
    int16 top_right = 0;
    int32 max_y = v[0].v2d.y;
   
    int16 i;
    // Find the top left point, the top right point, and the maximum y value
    for(i = 1; i < total_v; ++i) {
        if(v[i].v2d.y < v[top_left].v2d.y) {
            top_left = i;
            top_right = i;
        }
        else if(v[i].v2d.y == v[top_left].v2d.y) {
            if(v[i].v2d.x < v[top_left].v2d.x)    top_left = i;
            if(v[i].v2d.x > v[top_right].v2d.x)   top_right = i;
        }
       
        max_y = X3D_MAX(max_y, v[i].v2d.y);
    }
   
    left->total_v = 0;
    right->total_v = 0;
   
    // Grab the points for the left polyline
    do {
        left->v[left->total_v] = v + top_left;
        top_left = (top_left + 1 < total_v ? top_left + 1 : 0);
    } while(left->v[left->total_v++]->v2d.y != max_y);
   
    // Grab the points for the right polyline
    do {
        right->v[right->total_v] = v + top_right;
        top_right = (top_right != 0 ? top_right - 1 : total_v - 1);
    } while(right->v[right->total_v++]->v2d.y != max_y);
   
    return left->total_v > 1 && right->total_v > 1;
}
[/spoiler]
  • Calculators owned: TI-83+, TI-83+ SE, TI-84+ SE, TI-Nspire CX, TI-92+, TI-89 Titanium
Creator of X3D, a 3D portal rendering game engine for Nspire, 68k, and PC

TheMachine02

The routine definitely doesn't do any perspective correction, for the sake of speed. The poor ez80 can't divide on its own, and a division costs about 500 TStates. That would mean about 30 TStates more per pixel, and that is without counting the fact that it would destroy many registers that the routine would then need to restore.
It only draws triangles, as drawing polygons might be quite hard to get right in pure ez80 asm. I'll look into your method, which looks quite interesting.
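For readers unfamiliar with the "perspective correction every 16 pixels" approach mentioned earlier in the thread, here is a minimal floating-point sketch in C. It is not code from X3D or gLib: `SPAN`, `scanline_u` and its parameters are names made up for illustration, and only the u coordinate is handled. If one division of about 500 TStates is paid per 16-pixel span, the amortized overhead is roughly 31 TStates per pixel, which is in line with the estimate above.

```c
#define SPAN 16  /* pixels between true perspective divides (assumed value) */

/* Writes the texture coordinate u for n pixels of one scanline into out_u.
   (u0, z0) apply at pixel 0 and (u1, z1) at pixel n (exclusive right edge).
   Requires n > 0, z0 > 0 and z1 > 0. */
static void scanline_u(float u0, float z0, float u1, float z1, int n, float *out_u) {
    /* u/z and 1/z vary linearly across the screen, unlike u itself. */
    float uz = u0 / z0, iz = 1.0f / z0;
    float duz = (u1 / z1 - uz) / (float)n;
    float diz = (1.0f / z1 - iz) / (float)n;
    float ua = u0;  /* exact u at the start of the current span */

    for (int x = 0; x < n; x += SPAN) {
        int len = (n - x < SPAN) ? (n - x) : SPAN;
        uz += duz * (float)len;
        iz += diz * (float)len;
        float ub = uz / iz;                   /* one perspective divide per span */
        float step = (ub - ua) / (float)len;  /* affine step inside the span */
        for (int i = 0; i < len; ++i)
            out_u[x + i] = ua + step * (float)i;
        ua = ub;
    }
}
```

In a fixed-point version the per-span step would normally be a shift when the span is a full 16 pixels, leaving the perspective divide as the only real division.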

Dream of Omnimaga

Ideally, you should avoid splitting your lib too much, because people will get confused about what Iris3D is and what gLib does, as has happened with various other software. I'm glad you got textures working, by the way. I can't wait for animated eye-candy. :)
  • Calculators owned: TI-82 Advanced Edition Python TI-84+ TI-84+CSE TI-84+CE TI-84+CEP TI-86 TI-89T cfx-9940GT fx-7400G+ fx 1.0+ fx-9750G+ fx-9860G fx-CG10 HP 49g+ HP 39g+ HP 39gs (bricked) HP 39gII HP Prime G1 HP Prime G2 Sharp EL-9600C
  • Consoles, mobile devices and vintage computers owned: Huawei P30 Lite, Moto G 5G, Nintendo 64 (broken), Playstation, Wii U

TheMachine02

So, there is progress :



Quite a lot of textured polys drawn, if you ask me. The UV texture coordinates are kind of messed up in some models due to the lack of texture repeat. Anyway, it looks good. I will try to speed this up to the speed of light of course, but I don't promise anything. (Furthermore, the model was made to have two textures on it, and I haven't implemented that kind of texture switch yet.)

As for the split, well, dunno. I could set things up as a modular environment; in that case the library would simply be renamed Iris3D on the ez80 platform (but still gLib on b&w). That way, the developer directly chooses which functions to use, and only the used functions are included in the final program.

catastropher

Nice work so far! :D I'm sure you know this, but I'm just throwing it out there. If you have a texture dimension that is a power of two (2^n), bitwise ANDing the u and v values with (2^n - 1) will give you texture wrapping. So your loop would look something like this (this isn't real code, I just wrote it as an example):

[spoiler]
typedef struct {
    int x;
    fp16x16 u, v;   // 16.16 fixed point
    fp16x16 z;
} ScanlineValue;

typedef struct {
    unsigned char size;
    unsigned char* texels;
} Texture;

void draw_scanline(ScanlineValue* left, ScanlineValue* right, Texture* tex, int y, short* zbuf) {
    int dx = (right->x - left->x != 0 ? right->x - left->x : 1);
   
    fp16x16 du = (right->u - left->u) / dx;
    fp16x16 dv = (right->v - left->v) / dx;
    fp16x16 dz = (right->z - left->z) / dx;
   
    fp16x16 u = left->u;
    fp16x16 v = left->v;
    fp16x16 z = left->z;
   
    for(int x = left->x; x <= right->x; ++x) {
        int zz = z >> 16;
       
        if(zz < zbuf[y * SCREEN_WIDTH + x]) {
            int uu = (u >> 16) & (tex->size - 1);
            int vv = (v >> 16) & (tex->size - 1);
           
            screen[y * SCREEN_WIDTH + x] = tex->texels[vv * tex->size + uu];
            zbuf[y * SCREEN_WIDTH + x] = zz;
        }
       
        u += du;
        v += dv;
        z += dz;
    }
}

[/spoiler]

TheMachine02

#338
Well, the issue isn't that I don't know how to do wrapping, it's that doing so would be way too slow  :P
I don't recalculate the real texture address in the inner pixel loop, I merely offset it. As such, it is quite a bit faster (but there is no wrapping). Actually, the only triangles which would pose an issue are those starting at one side of the texture and finishing at the other... but there aren't many triangles doing that.

EDIT: I might have fixed the texture coordinate bug in my converter. This is part of the model of Ashe from FF12 (about 1300 triangles).


Snektron

  • Calculators owned: TI-84+
Legends say if you spam more than DJ Omnimaga, you will become a walrus...


TheMachine02

Well, pixel shaders won't be implemented anytime soon, because a call is really slow  :P Anyway, I could do vertex/geometry shaders.

Dream of Omnimaga

I am impressed at the speed those textured models run at. O.O

TheMachine02

#342
Quote from: DJ Omnimaga on September 27, 2016, 04:07:16 PM
I am impressed at the speed those textured models run at. O.O

Yeah, the ez80 still packs some punch. I wish they'd put 1-wait-state RAM everywhere though >_>. I do hope to make it run faster by creating LUTs for some of the setup operations, and with general optimisation (and there is a lot to be found in this routine!). Also keep in mind there is no perspective correction, so textures can sometimes appear completely off.

Anyway, chocobo are cute <3


EDIT : dark shiva from ff10 isn't bad either. This is about 2500 triangles  O.O (much less displayed though, with bfc)

catastropher

When I was writing the texture mapper for X3D, I figured out a way to avoid doing the texels[v * tex_w + u] calculation, or any sort of complex logic to update an internal pointer. I just pre-multiply v by the width of the texture and apply a bitmask to v so that only whole multiples of tex_w are counted (since v is interpolated, you could end up with e.g. v = 2.5 * tex_w, which would give you two and a half scanlines, which is wrong):

// Texture size of 64 -> 6 bits
const int TEX_BITS = 6;

// Fixed point precision for u and v
const int FRAC_BITS = 10;

fp du = ((right->u - left->u) << FRAC_BITS) / dx;
fp u = left->u << FRAC_BITS;

// Premultiply v by the width of the texture (make sure the type of v is big enough!)
fp dv = ((right->v - left->v) << (FRAC_BITS + TEX_BITS)) / dx;
fp v = left->v << (FRAC_BITS + TEX_BITS);

...

// Mask because only whole multiples of tex_w are meaningful (i.e. whole scanlines of texels)
const int v_mask = ((1 << TEX_BITS) - 1) << (FRAC_BITS + TEX_BITS);

for(int i = left->x; i <= right->x; ++i) {
    int index = ((v & v_mask) + u) >> FRAC_BITS;
   
    *screen_ptr = texels[index];
   
    ++screen_ptr;
    u += du;
    v += dv;
    ...
}


Using a more complex mask, this can also be used for super fast texture wrapping. Let me know if you're interested and I can give you more details!
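The details were not posted in this part of the thread, so the following is only a guess at what such a wrapping mask could look like, not catastropher's actual X3D code. It assumes the same 64x64 texture and 10 fractional bits as the snippet above, and masks u as well as v so that both coordinates wrap; `wrapped_index` is a made-up name.

```c
#include <stdint.h>

enum { TEX_BITS = 6, FRAC_BITS = 10 };  /* 64x64 texture, 10 fractional bits */

/* u has FRAC_BITS fractional bits. v is premultiplied by the texture width,
   so its integer part starts at bit FRAC_BITS + TEX_BITS.
   Returns the texel index with both coordinates wrapped to 0..63. */
static uint32_t wrapped_index(uint32_t u, uint32_t v) {
    const uint32_t u_mask = ((1u << TEX_BITS) - 1) << FRAC_BITS;
    const uint32_t v_mask = ((1u << TEX_BITS) - 1) << (FRAC_BITS + TEX_BITS);

    /* Keep only the low TEX_BITS integer bits of each coordinate; the two
       fields do not overlap, so they can be combined with OR. */
    return ((v & v_mask) | (u & u_mask)) >> FRAC_BITS;
}
```

For example, column 65 and row 66 wrap to column 1 and row 2, giving index 2 * 64 + 1 = 129.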

TheMachine02

#344
Well, I need to see how I can convert this method to asm  :P I currently use a slightly different method: I add int(du)+tex_size*int(dv) and update a fractional part with frac(du)*65536+frac(dv). When the register containing the fractional part produces a carry, I offset the texture coordinate by one, and if a bit is rolled into the mid register (i.e. the High register), it is offset by tex_size.
I use this method because the ez80 is quite limited in registers, and it means I don't have to reload the gradients into registers on each loop iteration.
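A rough C sketch of this kind of carry-based stepping, for readers who want to see the idea outside of asm. It is not the actual gLib routine: the two fractions are kept in separate 16-bit accumulators instead of being packed into one register, gradients are assumed non-negative, and `walk_texels` and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Records the texel offset visited at each of `count` pixels of a scanline.
   u, v, du, dv are 16.16 fixed point and assumed non-negative here. */
static void walk_texels(uint32_t u, uint32_t v, uint32_t du, uint32_t dv,
                        uint32_t tex_size, size_t count, uint32_t *offsets) {
    /* Integer parts of both gradients folded into one constant step. */
    uint32_t step = (du >> 16) + tex_size * (dv >> 16);
    uint16_t du_frac = (uint16_t)du, dv_frac = (uint16_t)dv;

    uint32_t offset = (u >> 16) + tex_size * (v >> 16);
    uint16_t u_frac = (uint16_t)u, v_frac = (uint16_t)v;

    for (size_t i = 0; i < count; ++i) {
        offsets[i] = offset;
        offset += step;

        uint16_t nu = (uint16_t)(u_frac + du_frac);
        if (nu < u_frac) offset += 1;         /* carry out of the u fraction */
        u_frac = nu;

        uint16_t nv = (uint16_t)(v_frac + dv_frac);
        if (nv < v_frac) offset += tex_size;  /* carry out of the v fraction */
        v_frac = nv;
    }
}
```

With du = 1.5, dv = 0.5 and a 64-wide texture starting at (0, 0), the first four offsets are 0, 1, 67 and 68, matching a direct `(u >> 16) + tex_size * (v >> 16)` evaluation at each pixel without any multiply in the loop.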

EDIT :



