Yes, source is included. It's even annotated to some extent, but overall this one is rather tricky to understand. Basically, what happens here is this: there are some specially crafted samples that create the waveforms. Each sample is 256 bytes, with each byte representing an output state (on or off). Groups of 4 bytes are combined and interpreted as a volume level by the player. Consequently, there are 5 OUT commands per channel (4 levels + silence). So you can think of the samples as a sort of PCM format, but instead of packing 2⁸ volume levels into one byte or 2¹⁶ into a word, you pack 5 volume levels into 4 bytes. This is done so the player can read the sample data as fast as possible.
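To make the sample format a bit more concrete, here is a rough Python sketch of how such a 256-byte sample could be built and read back. This is not the engine's actual code. It assumes that the volume level of a 4-byte group is simply the number of "on" bytes in it (0 = silence, 4 = loudest), and that "on" is stored as 1 and "off" as 0. The names `encode_levels` and `decode_levels` are made up for illustration, so check the annotated source for how the player really combines the bytes.

```python
SAMPLE_SIZE = 256   # bytes per sample, as described above
GROUP_SIZE = 4      # bytes combined into one volume level
ON, OFF = 1, 0      # assumed byte values for the two output states


def encode_levels(levels):
    """Turn 64 volume levels (0..4) into a 256-byte sample.

    Assumption: level n is stored as n ON bytes followed by (4 - n) OFF bytes.
    """
    if len(levels) != SAMPLE_SIZE // GROUP_SIZE:
        raise ValueError("expected 64 volume levels")
    sample = bytearray()
    for level in levels:
        if not 0 <= level <= GROUP_SIZE:
            raise ValueError("volume level must be between 0 and 4")
        sample += bytes([ON] * level + [OFF] * (GROUP_SIZE - level))
    return bytes(sample)


def decode_levels(sample):
    """Read a 256-byte sample back as 64 volume levels (0..4).

    Assumption: the level of a group is the count of non-zero bytes in it.
    """
    if len(sample) != SAMPLE_SIZE:
        raise ValueError("expected a 256-byte sample")
    levels = []
    for start in range(0, SAMPLE_SIZE, GROUP_SIZE):
        group = sample[start:start + GROUP_SIZE]
        levels.append(sum(1 for b in group if b != OFF))
    return levels


# A full-volume square wave: first half loudest, second half silent.
square = encode_levels([4] * 32 + [0] * 32)
```

The point of the wasteful-looking layout is that the player never has to unpack or shift anything: every byte it reads already is an output state.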
Mind you, the engine was optimized for a 3.5 MHz machine and is now running at 6 MHz, so there is room for improvement, especially if you're thinking about targeting 15 MHz.