Quote from: Vogtinator on July 11, 2017, 06:17:22 PM
Looks impressive!
Now just textures, shadows, anti-aliasing and shaders are missing
Thanks!
At least half of those are on the way! Guess which ones?
Quote from: Vogtinator on July 11, 2017, 06:17:22 PM
Out of curiosity, did you try it on hardware as well? I noticed that there are some unexpected bottlenecks there, especially memory speed (simple blit is already much slower).
I did indeed try it on the actual hardware, using several different screen update methods. But first, something I noticed in the nSDL repo: nsp_palette is a global variable:
static Uint16 nsp_palette[256] = {0};
I also noticed that the code to convert the screen to 16 bit color looks like this:
/* 8 bpp SW, 16 bpp HW */
NSP_DRAW_LOOP(
    for ( j = 0, k = 0; k < row_bytes; j += 2, ++k )
        *(Uint16 *)(dst_addr + j) = nsp_palette[src_addr[k]];
);
Every iteration of that loop reads from nsp_palette, which is a global. Is the compiler smart enough to keep the address of nsp_palette in a register? If not, it may reload the absolute address of nsp_palette on every iteration (or at least every row). In that case, copying the address into a local variable might help.
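Something along these lines is what I mean (just a sketch to show the idea; I haven't checked how NSP_DRAW_LOOP actually expands, so take it as an untested suggestion):

/* 8 bpp SW, 16 bpp HW */
/* Copy the palette address into a local so the compiler can keep it in a
   register for the whole blit instead of reloading the global every pixel. */
const Uint16 *palette = nsp_palette;
NSP_DRAW_LOOP(
    for ( j = 0, k = 0; k < row_bytes; j += 2, ++k )
        *(Uint16 *)(dst_addr + j) = palette[src_addr[k]];
);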
Anyway, I tested a few different methods for copying an 8-bit buffer to the screen. The first two assume the calculator is in 16-bit color mode; the last one assumes 8-bit mode. For the test, I disabled all rendering in my engine (except the frame rate display). Here's version 1, which is similar to the nSDL code:
unsigned int sdlColors[256]; // Color palette
unsigned char* pixel = screen->pixels;
unsigned char* pixelEnd = screen->pixels + 320 * 240;
unsigned short* pixelDest = REAL_SCREEN_BASE_ADDRESS;
do
{
    *pixelDest++ = sdlColors[*pixel++];
} while(pixel < pixelEnd);
Using this method, I got 27 FPS (without rendering anything, just updating the screen). However, by changing the code to pack two 16-bit pixels into a single 32-bit write, I got 41 FPS:
unsigned int sdlColors[256]; // Color palette
unsigned char* pixel = screen->pixels;
unsigned char* pixelEnd = screen->pixels + 320 * 240;
unsigned int* pixelDest = REAL_SCREEN_BASE_ADDRESS;
do
{
    *pixelDest++ = (sdlColors[pixel[0]]) + (sdlColors[pixel[1]] << 16);
    pixel += 2;
} while(pixel < pixelEnd);
Even better, if I set the calculator to be in 8-bit mode and just use a direct memcpy, I get 66 FPS:
memcpy(REAL_SCREEN_BASE_ADDRESS, screen->pixels, 320 * 240);
The only problem with this method is that I can't change the palette colors, and it probably wouldn't work on a 240×320 screen. Speaking of which, I may need to set up X3D to render sideways for those calcs XD

Internally, I store all of my textures as 8-bit pixels because 1) that's what Quake did and 2) I can fit twice as many pixels into cache (which is a pathetic 4 KB).

As for current rendering speed, an average scene renders at ~15 FPS with method 2 and ~25 FPS with method 3. Of course, I haven't put much effort into optimizing it yet.
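For reference, one way to do the 8-bit mode switch for the memcpy test is to flip the pixel-depth bits in the LCD controller's control register and restore them on exit. This is only a rough sketch of what I believe is involved; the register address (0xC0000018) and bit layout are what I gathered from the Hackspire docs on the CX's PL111 controller, so treat them as assumptions to verify rather than a known-good recipe:

/* Assumed: PL111 LCD controller at 0xC0000000 on the CX, control register at
 * offset 0x18, pixel depth in bits [3:1] (0b011 = 8 bpp, 0b110 = 16 bpp 5:6:5). */
#define LCD_CONTROL (*(volatile unsigned *)0xC0000018)

static unsigned saved_lcd_control;

void lcd_enter_8bpp(void)
{
    saved_lcd_control = LCD_CONTROL;                            /* remember the OS setting */
    LCD_CONTROL = (saved_lcd_control & ~(7u << 1)) | (3u << 1); /* select 8 bpp */
}

void lcd_restore_mode(void)
{
    LCD_CONTROL = saved_lcd_control;                            /* put it back before exiting */
}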
Quote from: Vogtinator on July 11, 2017, 06:17:22 PM
I could add a new bool lcd_set_prop(enum type, union value)
syscall for some more flexibility though and extend the definition of SCR_320x240_8 to mean 8 bit with palette. This would of course require a yet unreleased version of ndless_resources.
What do you think?
That would be awesome! Would it also be possible to change the palette colors in hardware? And could we read the current palette back, so we can restore it when the program exits?
Thank you so much! :3