============================================================================= Anomie's S-DSP Doc - with updates/fixes/clarifications/etc by jwdonal $Revision: 1212 $ $Date: 2015-09-28 00:41:22 -0700 (Mon, 28 Sep 2015) $ ============================================================================= The S-DSP is the actual sound generator for the SNES. It shares 64K of RAM with the SPC700, and can be poked at via SPC700 registers $00F2 and $00F3. It has an input clock running nominally at 24576000 Hz, and supplies the SPC700 1024000 Hz clock and the 8192000(?) Hz clock to the expansion port. Note that this clock has been indirectly observed to vary, with rates of anywhere from 24592800 Hz to 24645600 Hz. All SPC700 RAM access goes through the S-DSP, and the registers and IPL ROM may well be located there as well. The S-DSP takes two memory accesses in between each SPC700 memory access; this means none of the S-DSP external operations described below can occur at the same time as a SPC700 operation. The S-DSP's internal clock rate is 3.072MHz. Given a 32kHz sample rate this means that the S-DSP has exactly 96 clock cycles to generate each new stereo sample. Credit to libopenspc, via SNEeSe and snes9x, for much of the below information. I've since re-verified and re-interpreted much of it. Also, many thanks to blargg for most of the timing data and several more algorithms, and just incredible amounts of research. A note on terminology: "Clip" in this document refers to bit truncation, while "clamp" refers to range restriction. In C, these could be done as (note that these equations assume that 'v' is a 32-bit signed int): v = v & 0x7FFF; /* clip to 15 bits unsigned */ v = (v<0) ? 0 : ((v>0x7FFF) ? 
0x7FFF : v); /* clamp to 15 bits unsigned */ SOUND GENERATION ================ The S-DSP can mix and output up to 8 voices to produce stereo sound (later on, sound generated by a device on the cart or the expansion port device may also be mixed in, but the S-DSP has no knowledge of or control over this). Output is nominally at 32000 Hz, realistically once every 768 clock cycles (32 SPC700 cycles). At a high level, each voice generates a stereo sample: * BRR data is decoded (15-bit mono sample). * Interpolation is performed over 4 BRR samples to determine the output sample, or the noise sample is selected (15-bit mono). * Apply the volume envelope (15-bit mono sample). * Apply the VxVOL registers (16-bit stereo sample). In addition, the left and right echo buffers together generate a 15-bit stereo sample which is passed through the left and right channel FIR filters resulting in a 16-bit stereo sample. These 9 samples are used in two ways: 1. Stereo Main Output * Mix all voices in order, clamping to 16 bits after each addition (16-bit stereo sample). * Adjust the main output by the MVOL registers to get the main sample (16-bit stereo sample). * Adjust the FIR sample by the EVOL registers (16-bit stereo sample). * Mix the EVOL-adjusted FIR sample into the main output, and clamp to 16 bits (16-bit stereo sample). * Output to the DAC (16-bit stereo sample) unless FLG bit 6 (MUTE) applies. 2. Stereo Echo Output * Mix all voices selected in EON in order, clamping to 16 bits after each addition (16-bit stereo sample). * Adjust the FIR sample by EFB (16-bit stereo sample). * Mix the EFB-adjusted FIR sample back into the echo output, and clamp to 16 bits (16-bit stereo sample). * Write to the echo buffer (15-bit stereo sample, left-aligned in 16 bits) unless FLG bit 5 (ECENx) applies. In all cases, convert from 15- to 16-bits by adding a 0 bit on the low end, and from 16- to 15-bits by dropping the low bit. 
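The mixing and bit-width rules above can be sketched in C as follows. This is a minimal illustration for a single channel, not a description of the real hardware data path; the helper names (clamp16, mix_voices, to16, to15) are hypothetical and are not taken from any emulator.

```c
#include <stdint.h>

/* Clamp (range-restrict) a wide intermediate value to 16 bits signed. */
static int32_t clamp16(int32_t v)
{
    if (v < -32768) return -32768;
    if (v > 32767) return 32767;
    return v;
}

/* Mix the voices selected by 'enable' (bit x = voice x) in order,
 * clamping after each addition. Pass 0xFF for the main output and the
 * EON value for the echo output. */
static int32_t mix_voices(const int16_t voice_out[8], uint8_t enable)
{
    int32_t sum = 0;
    for (int x = 0; x < 8; x++) {
        if (enable & (1u << x))
            sum = clamp16(sum + voice_out[x]);
    }
    return sum;
}

/* 15-bit -> 16-bit: append a 0 bit at the low end. */
static int32_t to16(int32_t s15) { return s15 * 2; }

/* 16-bit -> 15-bit: drop the low bit. Written as a floor division so it
 * does not depend on how the compiler right-shifts negative values. */
static int32_t to15(int32_t s16) { return (s16 - (s16 & 1)) / 2; }
```

Because the clamp happens after every addition, the result depends on voice order: mixing 30000, 30000 and -30000 gives 2767 (the running sum saturates at 32767 before the third voice is added), not 30000.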
More specifically, the registers and memory are accessed as follows. Note that most register values are read once per sample output and cached internally for use as needed. Note also that the S-DSP may perform some of the "if necessary" operations unconditionally but only make use of the result "if necessary". For example, in voice processing step S2 it may load the sample pointer unconditionally, but this has no effect unless there was a loop or KON. Each voice carries out the following operations: S1. Load VxSRCN register, if necessary. S2. Load the sample pointer (using previously loaded DIR and VxSRCN) if necessary. Load VxPITCHL register. Load VxADSR1 register. S3. a. Load VxPITCHH register. Apply pitch modulation if applicable. b. Load the BRR header byte (every time), and the first of the two BRR bytes that will be decoded. c. If applicable, replace the current sample with the noise sample. Apply the volume envelope. - This is the value used for modulating the next voice's pitch, if applicable. Check FLG bit 7 (NOT previously loaded). Check BRR header 'e' and 'l' bits to determine if the voice ends. Handle KOFF and KON using previously loaded values. If KON, ENDX.x will be cleared in step S7. Load VxGAIN or VxADSR2 register depending on ADSR1.7. Update the volume envelope, using previously loaded values. S4. Load and apply VxVOLL register. If a new group of BRR samples is required, load the second BRR byte and decode the group of 4 BRR samples. This is definitely not done when not necessary. If necessary, adjust the BRR pointer to the next block, or flag the loop address for loading next step S2 and set ENDX.x in step S7. Note that this setting of ENDX.x will not override the clearing due to KON in step S3c, if both occur during the same sample. Increment interpolation sample position as specified by pitch values. At any point from now until we next get to S3c, the next sample may be calculated using the interpolation position and BRR buffer contents. S5. 
Load and apply VxVOLR register.
     The new ENDX.x value is prepared, and can be overwritten. Reads will not
     see it yet.
 S6. The new VxOUTX value is prepared, and can be overwritten. Reads will not
     see it yet.
 S7. The new ENDX.x value may now be read.
     The new VxENVX value is prepared, and can be overwritten. Reads will not
     see it yet.
 S8. The new VxOUTX value may now be read.
 S9. The new VxENVX value may now be read.

The full sample generation loop is as follows. Note how the above voice process
is interleaved for the 8 voices. The choice of which cycle to call "cycle 0" is
semi-arbitrary. I've included the standard timing of the SPC700 timer ticks,
but note that frobbing the SPC700 TEST register can change this
synchronization.

  0. Voice steps: V0:S5 V1:S2
     Tick the SPC700 Stage 1 timers, always for T2 and every 4 samples for T0
     and T1.
  1. Voice steps: V0:S6 V1:S3
  2. Voice steps: V0:S7 V1:S4 V3:S1
  3. Voice steps: V0:S8 V1:S5 V2:S2
  4. Voice steps: V0:S9 V1:S6 V2:S3
  5. Voice steps: V1:S7 V2:S4 V4:S1
  6. Voice steps: V1:S8 V2:S5 V3:S2
  7. Voice steps: V1:S9 V2:S6 V3:S3
  8. Voice steps: V2:S7 V3:S4 V5:S1
  9. Voice steps: V2:S8 V3:S5 V4:S2
 10. Voice steps: V2:S9 V3:S6 V4:S3
 11. Voice steps: V3:S7 V4:S4 V6:S1
 12. Voice steps: V3:S8 V4:S5 V5:S2
 13. Voice steps: V3:S9 V4:S6 V5:S3
 14. Voice steps: V4:S7 V5:S4 V7:S1
 15. Voice steps: V4:S8 V5:S5 V6:S2
 16. Voice steps: V4:S9 V5:S6 V6:S3
     Tick the SPC700 Stage 1 timer for T2.
 17. Voice steps: V0:S1 V5:S7 V6:S4
 18. Voice steps: V5:S8 V6:S5 V7:S2
 19. Voice steps: V5:S9 V6:S6 V7:S3
 20. Voice steps: V1:S1 V6:S7 V7:S4
 21. Voice steps: V0:S2 V6:S8 V7:S5
 22. Voice steps: V0:S3a V6:S9 V7:S6
     Apply ESA using the previously loaded value along with the previously
     calculated echo offset to calculate new echo pointer.
     Load left channel sample from the echo buffer.
     Load FFC0.
 23. Voice steps: V7:S7
     Load right channel sample from the echo buffer.
     Load FFC1 and FFC2.
 24. Voice steps: V7:S8
     Load FFC3, FFC4, and FFC5.
 25.
Voice steps: V0:S3b V7:S9 Load FFC6 and FFC7. 26. Load and apply MVOLL. Load and apply EVOLL. Output the left sample to the DAC. Load and apply EFB. 27. Load and apply MVOLR. Load and apply EVOLR. Output the right sample to the DAC. Load PMON 28. Load NON, EON, and DIR. Load FLG bit 5 (ECENx) for application to the left channel. 29. Update global counter. Write left channel sample to the echo buffer, if allowed by ECENx. Load EDL - if the current echo offset is 0, apply EDL. Load ESA for future use. Load FLG bit 5 (ECENx) again for application to the right channel. ** Clear internal KON bits for any channels keyed on in the previous 2 samples. 30. Voice steps: V0:S3c Write right channel sample to the echo buffer, if allowed by ECENx. Increment the echo offset, and set to 0 if it exceeds the buffer length. Load FLG bits 0-4 and update noise sample if necessary. ** Load KOFF and internal KON. 31. Voice steps: V0:S4 V2:S1 ** These two steps (KON and KOFF related) are performed every other sample. Note that the internal KON bits are not cleared until 63 cycles after they are loaded. You could also consider the above loop to run from 0-63, with everything except these two steps repeated at T+32. Unless the SPC700 TEST register is frobbed, it is always the case that the KON/KOFF poll happens either 30 & 94 or 62 & 126 cycles after the SPC700 timer T0 and T1 tick. On power on, 62 & 126 seems to be chosen more frequently but 30 & 94 can still be chosen sometimes. On reset, either can be chosen. COUNTERS ======== The S-DSP has a global counter, which is examined by the noise sample generator and the volume envelope adjustments. The global counter counts from 0x77FF to zero, decrementing by one each sample. Note that the counter is initialized to zero (not 0x77FF) on reset. 
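A minimal sketch of the counter described so far, together with the modulus test spelled out in the paragraphs that follow. It assumes the counter is decremented once per sample and wraps from 0 back to 0x77FF. The function names are hypothetical; the two tables are the ones given in this document, with 0 standing in for the "Inf" and "n/a" entries of rate 0.

```c
#include <stdint.h>

/* Number of samples per counter event (entry 0 means "never"). */
static const uint16_t counter_rates[32] = {
       0, 2048, 1536, 1280, 1024,  768,  640,  512,
     384,  320,  256,  192,  160,  128,   96,   80,
      64,   48,   40,   32,   24,   20,   16,   12,
      10,    8,    6,    5,    4,    3,    2,    1
};

/* Counter offset from zero for each rate (entry 0 is unused). */
static const uint16_t counter_offsets[32] = {
       0,    0, 1040,  536,    0, 1040,  536,    0,
    1040,  536,    0, 1040,  536,    0, 1040,  536,
       0, 1040,  536,    0, 1040,  536,    0, 1040,
     536,    0, 1040,  536,    0, 1040,    0,    0
};

/* Advance the global counter by one sample: 0x77FF down to 0, then wrap. */
static uint16_t counter_step(uint16_t counter)
{
    return counter == 0 ? 0x77FF : (uint16_t)(counter - 1);
}

/* Nonzero when an action running at rate R (0..31) fires for this
 * counter value. Rate 0 never fires. */
static int counter_event(uint16_t counter, unsigned rate)
{
    if (rate == 0)
        return 0;
    return ((unsigned)counter + counter_offsets[rate]) % counter_rates[rate] == 0;
}

/* Count how often a rate fires over one full counter period
 * (0x7800 samples), starting from the reset value of 0. */
static unsigned events_per_period(unsigned rate)
{
    uint16_t counter = 0;
    unsigned events = 0;
    for (unsigned i = 0; i < 0x7800; i++) {
        events += (unsigned)counter_event(counter, rate);
        counter = counter_step(counter);
    }
    return events;
}
```

The period of 0x7800 = 30720 samples is a common multiple of every entry in counter_rates, so each rate fires a whole number of times per period: for example 15 times for rate 1 (every 2048 samples) and 30720 times for rate 31 (every sample).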
The noise and envelope adjustments use the following tables to determine when to perform their actions: // Number of samples per counter event counter_rates[32] = { Inf, 2048, 1536, 1280, 1024, 768, 640, 512, 384, 320, 256, 192, 160, 128, 96, 80, 64, 48, 40, 32, 24, 20, 16, 12, 10, 8, 6, 5, 4, 3, 2, 1 } // Counter offset from zero (i.e. not all counters are aligned at zero for all rates) counter_offsets[32] = { n/a, 0, 1040, 536, 0, 1040, 536, 0, 1040, 536, 0, 1040, 536, 0, 1040, 536, 0, 1040, 536, 0, 1040, 536, 0, 1040, 536, 0, 1040, 536, 0, 1040, 0, 0 } When (Counter + counter_offsets[R]) % counter_rates[R] is zero (where R is the current rate) the action is performed. This approach covers more than just the overall rate, but also the relative synchronization when switching between different rates (i.e. the first cycle will be shorter than the rest depending on when the rate change occurs). It's quite certain that Nintendo did not implement divide-with-remainder logic in the S-DSP given both the era the chip was designed and the space limitations. With that said, the above equation does work for all cases but is more geared towards a software emulator. For an HDL implementation, however, it is not very practical (although it can be done). It has been demonstrated in FPGA hardware that exactly identical behavior (including offsets and relative synchronizations when switching rates) can be generated using only a series of inter-dependent clocks whose logic resource utilization is >95% smaller than the equivalent modulus-based implementation. Another option (created by Mednafen) which does not require the use of modulus/division and is also more FPGA/HDL friendly is described below. In this implementation the counter is still initialized to zero on reset, but is not controlled by a simple decrement once per sample. 
Instead the following function is used:

  void run_glbl_cntr (void) {
      if(!(Counter & 0x7))  Counter ^= 0x5;
      if(!(Counter & 0x18)) Counter ^= 0x18;
      Counter -= 0x29;
  }

The Counter variable still decreases on every call, but unlike the
modulus-based method described earlier it does not step through a consecutive
series of numbers: bits 0-2 act as a divide-by-5 counter, bits 3-4 as a
divide-by-3 counter, and the remaining upper bits as a plain binary counter.
The noise and envelope adjustments use the following tables to determine when
to perform their actions:

  // Selects how many bits of the /1 counter to use (to give /1, /2, /4, /8,
  // etc.), and to optionally select the bits of the /5 or /3 divider (to
  // optionally give rates like /5 or /10 or /20 etc., or /3 or /6 or /12 etc.).
  uint16 counter_masks[32] = {
      0x0000, 0xFFE0, 0x3FF8, 0x1FE7, 0x7FE0, 0x1FF8, 0x0FE7, 0x3FE0,
      0x0FF8, 0x07E7, 0x1FE0, 0x07F8, 0x03E7, 0x0FE0, 0x03F8, 0x01E7,
      0x07E0, 0x01F8, 0x00E7, 0x03E0, 0x00F8, 0x0067, 0x01E0, 0x0078,
      0x0027, 0x00E0, 0x0038, 0x0007, 0x0060, 0x0018, 0x0020, 0x0000
  };

  // Adjusts for relative timing offsets and handles R=0 case. Could also be
  // thought of as counter_compare.
  uint16 counter_xors[32] = {
      0xFFFF, 0x0000, 0x3E08, 0x1D04, 0x0000, 0x1E08, 0x0D04, 0x0000,
      0x0E08, 0x0504, 0x0000, 0x0608, 0x0104, 0x0000, 0x0208, 0x0104,
      0x0000, 0x0008, 0x0004, 0x0000, 0x0008, 0x0004, 0x0000, 0x0008,
      0x0004, 0x0000, 0x0008, 0x0004, 0x0000, 0x0008, 0x0000, 0x0000
  };

When (Counter & counter_masks[R]) ^ counter_xors[R] is zero (where R is the
current rate) the action is performed. Just like the modulus-based method, this
approach covers both the overall rate and the relative synchronization when
switching between different rates.


VOLUME CONTROL & ECHO
=====================

In all cases, volume samples are adjusted in a simple linear fashion:
Sout = (Sin * vol) >> vol_shift. "vol_shift" is chosen to give vol an effective
range of -1<=vol<1. Thus, if vol is unsigned then vol_shift is the number of
bits in vol, while if vol is signed then vol_shift is one less (e.g. 8-bit
signed has a vol_shift of 7).
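As a small worked example of the Sout = (Sin * vol) >> vol_shift rule, the sketch below scales a sample by an 11-bit unsigned envelope (vol_shift = 11) and by an 8-bit signed volume (vol_shift = 7). The function names are hypothetical, and the shift is written as a floor division so that it behaves like an arithmetic right shift on any C compiler; whether the hardware rounds in exactly this way for every path is an assumption based on the ">>" notation used in this document.

```c
#include <stdint.h>

/* Floor division by 2^shift: the same result as an arithmetic right
 * shift, without relying on implementation-defined behaviour for
 * negative values. */
static int32_t asr(int32_t v, unsigned shift)
{
    int32_t d = (int32_t)1 << shift;
    int32_t q = v / d;
    if ((v % d) < 0)
        q -= 1;
    return q;
}

/* The envelope is 11 bits unsigned (0..0x7FF), so vol_shift = 11. */
static int32_t apply_envelope(int32_t sample, uint16_t env)
{
    return asr(sample * (int32_t)env, 11);
}

/* 8-bit two's complement volumes (per-voice, master, echo, feedback)
 * use vol_shift = 7. */
static int32_t apply_vol8(int32_t sample, int8_t vol)
{
    return asr(sample * (int32_t)vol, 7);
}
```

With a sample of 1000, a volume of -128 yields exactly -1000 (a gain of -1), while the largest positive volume, 127, yields 992, which is why the effective range is -1<=vol<1 rather than symmetric.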
In all cases, mixed values are clamped to 16 bits. There are several layers to S-DSP volume control. First, the sample is adjusted by the volume envelope (11 bits unsigned). Then each sample is adjusted by the per-voice volume (8-bit two's complement) separately for the left and right channels (which may invert the phase of the signal). After all voices are mixed the volume is adjusted by the master volume (8-bit two's complement) separately for the left and right channels. And finally, the whole thing can be muted by the FLG register. Echo splits off the main audio path after the per-voice volume, before all enabled voices are mixed together. The echo buffer (specified by ESA and EDL) sample pointed to by the current echo offset is fed into the FIR filter, and that output is adjusted by the echo volume (8-bit two's complement) and mixed back into the main output (after master volume adjustment). Then (if echo write is enabled in FLG) the FIR output is adjusted by the echo feedback volume (8-bit two's complement) and mixed with all voices enabled in EON, and output into the end of the echo ring buffer. So note that if echo write is disabled, the "echo ring buffer" becomes a static sample buffer up to 0.96 seconds long. BRR DECODING ============ The input samples to the S-DSP are compressed via a method known as "bit rate reduction", compressing 16 16-bit samples into 9 byte blocks. The block format is: ssssffle 00001111 22223333 44445555 .... EEEEFFFF ssss = shift ff = filter l = loop (really "don't end") e = end (really "loop") 0000 = (D) data for sample #0 in this block, signed 2's complement ... FFFF = (D) data for sample #15 in this block, signed 2's complement While the pre-BRR samples were supposedly 16-bit, the BRR decoder seems to lose the low bit. This can be seen below, in that the input RD loses a bit at the low end. The bit is 'recovered' after the VxVOLL/VxVOLR volume adjustment. The 'shift' value scales the sample data D. 
Values 0-12 work normally, 16-bit RD=(D<<shift)>>1. Values 13-15 force RD to
either 0x0000 or 0xF800 depending on the sign of the input D (i.e. they give
the same values as 0 or F do with shift=12).

Each voice has a 12-sample ring buffer for decoding BRR data, divided into 3
groups of 4 samples. BRR data is always decoded in a group of 4 samples. There
are two 'active' groups, and one reserve group. When the interpolation index
passes 0x4000, the ring is turned and a new group of BRR data is decoded into
the new reserve group.

There are 4 possible 'filters' to use in decoding the blocks. Some filters use
previous samples in decoding; this history carries over between groups and
blocks and is kept separately for each voice.

  Filter 0 (Direct):
    S(x) = RD
  Filter 1 (15/16):
    S(x) = RD + S(x-1) + ((-S(x-1))>>4)
  Filter 2 (61/32-15/16):
    S(x) = RD + (S(x-1)<<1) + ((-((S(x-1)<<1)+S(x-1)))>>5)
              - S(x-2) + (S(x-2)>>4)
  Filter 3 (115/64-13/16):
    S(x) = RD + (S(x-1)<<1) + ((-(S(x-1)+(S(x-1)<<2)+(S(x-1)<<3)))>>6)
              - S(x-2) + (((S(x-2)<<1) + S(x-2))>>4)

The calculations above are performed in some higher number of bits, clamped to
16 bits at the end and then clipped to 15 bits. This 15-bit value is the value
output and the value used as S(x-1) or S(x-2) as needed for future filter
iterations. Certain games do seem to depend on these exact formulas; trying to
simplify them will break some sound effects.

If the very first block in a sample uses a filter other than Direct, the
previous samples are taken from the *physical* end of the BRR ring buffer since
the buffer index is reset to 0 on KON.

Note that BRR decoding never stops for a voice: KOFF and FLG bit 7 don't affect
it at all (they just set the envelope), and it always loops after reaching a
block with 'e' set ('l' clear again just sets the envelope). KON is the only
thing that actually affects a BRR decode in progress, and that simply restarts
it from the beginning.

Now, as for the remaining two bits.
If 'e' is set for the block, the bit in ENDX is set when the block is complete and the next block will be that pointed to by the loop pointer for this sample (see DIR and VxSRCN). Also, as soon as a header is loaded with 'e' set and 'l' clear, the voice goes into the Release state and the envelope goes to 0 immediately. Due to the 12-sample buffer, the 'e' and 'l' bits of the final block can be seen before the final few samples of the penultimate block are output if the pitch rate is slow enough. The samples in the final block will never be output. When a voice is keyed on, there are 5 '0x0000' samples output before the first sample encoded by the BRR data. These are used to preload the BRR ring buffer: #0 = After the final pre-KON sample is prepared, the envelope is set to 0 and enters the Attack state, and is not updated for the next several samples. The interpolation index is reset to 0, and is not updated for the next several samples. The final pre-KON BRR decode also occurs here (which can matter if the first block of the new BRR data uses a non-Direct filter). #1 = The first '0x0000' sample. At step S2, the start address is read. No BRR decoding or header checks, envelope updating, or interpolation index updating is performed. #2 = At step S4, first BRR group is decoded. No envelope or interpolation index updating. #3 = At step S4, second BRR group is decoded. No envelope or interpolation index updating. #4 = At step S4, third BRR group is decoded. No envelope or interpolation index updating. #5 = Envelope updating begins. The sample output is still '0x0000', because of the order in which voice operations are performed. The interpolation position is still 0. #6 = Finally, we see the first data sample. The first interpolation position update is done during step S4. PITCH ADJUSTMENTS ================= The S-DSP has two methods to adjust the 'pitch' of the input sound. 
Each voice has a 14-bit pitch control, and for voices 1-7 this can be further
tweaked by the output sample of the previous voice. The pitch adjustment is
fairly simple:

  pitch = voice[x].PITCH;
  if(PMON&~NON&~1&(1<<x))
      pitch += ((prev_out >> 5) * voice[x].PITCH) >> 10;
  voice[x].interpolation_index += pitch;
  if(voice[x].interpolation_index>0x7FFF)
      voice[x].interpolation_index = 0x7FFF;

In the above, prev_out is a placeholder name for the output sample of voice x-1
after its volume envelope has been applied (see voice processing step S3c).
Remember that voice[x].PITCH is only 14 bits while the 'pitch' variable is
large enough to never wrap. Additionally, note that the pitch calculation is
performed as a SIGNED operation while the interpolation_index calculation is
performed as an UNSIGNED operation.

When determining whether a new BRR group is needed:

  if(voice[x].interpolation_index>=0x4000){
      NextBRRGroup(x);
      voice[x].interpolation_index -= 0x4000;
  }

The samples in the BRR buffer are then interpolated using a 4-point gaussian
interpolation. Note that pitch modulation (PMON) does not function on noise
voices (see NON) or on voice 0.

The exact interpolation table from libopenspc is:

  // Gaussian table by libopenspc
  // Take note of the 'int32' datatype. These 11-bit hex values are all
  // positive and must be treated as signed.
static const int32 gauss_coeffs[512] = { 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x001, 0x001, 0x001, 0x001, 0x001, 0x001, 0x001, 0x001, 0x001, 0x001, 0x001, 0x002, 0x002, 0x002, 0x002, 0x002, 0x002, 0x002, 0x003, 0x003, 0x003, 0x003, 0x003, 0x004, 0x004, 0x004, 0x004, 0x004, 0x005, 0x005, 0x005, 0x005, 0x006, 0x006, 0x006, 0x006, 0x007, 0x007, 0x007, 0x008, 0x008, 0x008, 0x009, 0x009, 0x009, 0x00A, 0x00A, 0x00A, 0x00B, 0x00B, 0x00B, 0x00C, 0x00C, 0x00D, 0x00D, 0x00E, 0x00E, 0x00F, 0x00F, 0x00F, 0x010, 0x010, 0x011, 0x011, 0x012, 0x013, 0x013, 0x014, 0x014, 0x015, 0x015, 0x016, 0x017, 0x017, 0x018, 0x018, 0x019, 0x01A, 0x01B, 0x01B, 0x01C, 0x01D, 0x01D, 0x01E, 0x01F, 0x020, 0x020, 0x021, 0x022, 0x023, 0x024, 0x024, 0x025, 0x026, 0x027, 0x028, 0x029, 0x02A, 0x02B, 0x02C, 0x02D, 0x02E, 0x02F, 0x030, 0x031, 0x032, 0x033, 0x034, 0x035, 0x036, 0x037, 0x038, 0x03A, 0x03B, 0x03C, 0x03D, 0x03E, 0x040, 0x041, 0x042, 0x043, 0x045, 0x046, 0x047, 0x049, 0x04A, 0x04C, 0x04D, 0x04E, 0x050, 0x051, 0x053, 0x054, 0x056, 0x057, 0x059, 0x05A, 0x05C, 0x05E, 0x05F, 0x061, 0x063, 0x064, 0x066, 0x068, 0x06A, 0x06B, 0x06D, 0x06F, 0x071, 0x073, 0x075, 0x076, 0x078, 0x07A, 0x07C, 0x07E, 0x080, 0x082, 0x084, 0x086, 0x089, 0x08B, 0x08D, 0x08F, 0x091, 0x093, 0x096, 0x098, 0x09A, 0x09C, 0x09F, 0x0A1, 0x0A3, 0x0A6, 0x0A8, 0x0AB, 0x0AD, 0x0AF, 0x0B2, 0x0B4, 0x0B7, 0x0BA, 0x0BC, 0x0BF, 0x0C1, 0x0C4, 0x0C7, 0x0C9, 0x0CC, 0x0CF, 0x0D2, 0x0D4, 0x0D7, 0x0DA, 0x0DD, 0x0E0, 0x0E3, 0x0E6, 0x0E9, 0x0EC, 0x0EF, 0x0F2, 0x0F5, 0x0F8, 0x0FB, 0x0FE, 0x101, 0x104, 0x107, 0x10B, 0x10E, 0x111, 0x114, 0x118, 0x11B, 0x11E, 0x122, 0x125, 0x129, 0x12C, 0x130, 0x133, 0x137, 0x13A, 0x13E, 0x141, 0x145, 0x148, 0x14C, 0x150, 0x153, 0x157, 0x15B, 0x15F, 0x162, 0x166, 0x16A, 0x16E, 0x172, 0x176, 0x17A, 0x17D, 0x181, 0x185, 0x189, 0x18D, 0x191, 0x195, 0x19A, 0x19E, 0x1A2, 0x1A6, 0x1AA, 0x1AE, 0x1B2, 0x1B7, 0x1BB, 0x1BF, 0x1C3, 0x1C8, 0x1CC, 0x1D0, 
0x1D5, 0x1D9, 0x1DD, 0x1E2, 0x1E6, 0x1EB, 0x1EF, 0x1F3, 0x1F8, 0x1FC, 0x201, 0x205, 0x20A, 0x20F, 0x213, 0x218, 0x21C, 0x221, 0x226, 0x22A, 0x22F, 0x233, 0x238, 0x23D, 0x241, 0x246, 0x24B, 0x250, 0x254, 0x259, 0x25E, 0x263, 0x267, 0x26C, 0x271, 0x276, 0x27B, 0x280, 0x284, 0x289, 0x28E, 0x293, 0x298, 0x29D, 0x2A2, 0x2A6, 0x2AB, 0x2B0, 0x2B5, 0x2BA, 0x2BF, 0x2C4, 0x2C9, 0x2CE, 0x2D3, 0x2D8, 0x2DC, 0x2E1, 0x2E6, 0x2EB, 0x2F0, 0x2F5, 0x2FA, 0x2FF, 0x304, 0x309, 0x30E, 0x313, 0x318, 0x31D, 0x322, 0x326, 0x32B, 0x330, 0x335, 0x33A, 0x33F, 0x344, 0x349, 0x34E, 0x353, 0x357, 0x35C, 0x361, 0x366, 0x36B, 0x370, 0x374, 0x379, 0x37E, 0x383, 0x388, 0x38C, 0x391, 0x396, 0x39B, 0x39F, 0x3A4, 0x3A9, 0x3AD, 0x3B2, 0x3B7, 0x3BB, 0x3C0, 0x3C5, 0x3C9, 0x3CE, 0x3D2, 0x3D7, 0x3DC, 0x3E0, 0x3E5, 0x3E9, 0x3ED, 0x3F2, 0x3F6, 0x3FB, 0x3FF, 0x403, 0x408, 0x40C, 0x410, 0x415, 0x419, 0x41D, 0x421, 0x425, 0x42A, 0x42E, 0x432, 0x436, 0x43A, 0x43E, 0x442, 0x446, 0x44A, 0x44E, 0x452, 0x455, 0x459, 0x45D, 0x461, 0x465, 0x468, 0x46C, 0x470, 0x473, 0x477, 0x47A, 0x47E, 0x481, 0x485, 0x488, 0x48C, 0x48F, 0x492, 0x496, 0x499, 0x49C, 0x49F, 0x4A2, 0x4A6, 0x4A9, 0x4AC, 0x4AF, 0x4B2, 0x4B5, 0x4B7, 0x4BA, 0x4BD, 0x4C0, 0x4C3, 0x4C5, 0x4C8, 0x4CB, 0x4CD, 0x4D0, 0x4D2, 0x4D5, 0x4D7, 0x4D9, 0x4DC, 0x4DE, 0x4E0, 0x4E3, 0x4E5, 0x4E7, 0x4E9, 0x4EB, 0x4ED, 0x4EF, 0x4F1, 0x4F3, 0x4F5, 0x4F6, 0x4F8, 0x4FA, 0x4FB, 0x4FD, 0x4FF, 0x500, 0x502, 0x503, 0x504, 0x506, 0x507, 0x508, 0x50A, 0x50B, 0x50C, 0x50D, 0x50E, 0x50F, 0x510, 0x511, 0x511, 0x512, 0x513, 0x514, 0x514, 0x515, 0x516, 0x516, 0x517, 0x517, 0x517, 0x518, 0x518, 0x518, 0x518, 0x518, 0x519, 0x519 }; // 4-point gaussian interpolation i = voice[x].interpolation_index >> 12; // 0 <= i <= 7 d = (voice[x].interpolation_index >> 4) & 0xff; // 0 <= d <= 255 outx = ((gauss_coeffs[255-d] * voice[x].BRRdata[i+0]) >> 11); outx += ((gauss_coeffs[511-d] * voice[x].BRRdata[i+1]) >> 11); outx += ((gauss_coeffs[256+d] * voice[x].BRRdata[i+2]) >> 11); // The above 3 wrap at 
// 15 bits signed. The last is added to that, and is
// clamped rather than wrapped.
outx = ((outx & 0x7FFF) ^ 0x4000) - 0x4000;
outx += ((gauss_coeffs[  0+d] * voice[x].BRRdata[i+3]) >> 11);
CLAMP15(outx);


S-DSP REGISTERS
===============

The S-DSP contains a number of registers, which are internally polled at
various points during the 32-cycle sample generation loop and often stored
internally for later use. Thus, most writes do not take effect immediately.

All registers are accessed by the SPC700 setting the address in $00F2, then
reading/writing $00F3. Note that the register addresses use only 7 bits:
$80-$ff are read-only mirrors of $00-$7f. Any unspecified registers/bits are
read/write with no known effect.

On power on, most registers are uninitialized. There does seem to be something
of a pattern, but it's nothing specific and seems to differ between chips. On
reset, most registers retain their previous values. Some notable exceptions:
FLG will always act as if set to 0xE0 after power on or reset, even if the
value read back indicates otherwise. VxENVX and VxOUTX are of course 0, since
all channels are in the Release state due to FLG. And ENDX will be 0 on power
on or reset, but recall that the voices are still running even when keyed off,
so the various bits may have been set by BRR decoding by the time you get to
read it.

First, the 10 per-voice registers. These occupy $00-$09, $10-$19, and so on up
to $70-$79.

$x0 rw VxVOLL - Left volume for Voice x
$x1 rw VxVOLR - Right volume for Voice x
        vvvvvvvv

        These are the volumes of the voice in the left/right stereo channel.
        The value is 2's-complement; negative values invert the phase of the
        signal in the channel.
Volume adjustment is: SL = (int16_t)((S * VL)>>7) SR = (int16_t)((S * VR)>>7) VxVOLL is accessed during voice processing step S4, cycles: V0:31 V1:2 V2:5 V3:8 V4:11 V5:14 V6:17 V7:20 VxVOLR is accessed during voice processing step S5, cycles: V0:0 V1:3 V2:6 V3:9 V4:12 V5:15 V6:18 V7:21 $x2 rw VxPITCHL - Pitch scaler for Voice x low byte $x3 rw VxPITCHH - Pitch scaler for Voice x high byte --pppppp pppppppp This 14-bit number adjusts the pitch of the sounds output for this voice, as the function: Fout = Fin * P / 0x1000 Considering things on the normal 12-note scale, P=0x2000 will increase the pitch by one octave, P=0x3FFF will increase by (just about) two octaves, P=0x0800 will reduce by one octave, P=0x0400 will reduce by two octaves, and so on. Note that even though the high bits of $x3 are not significant, they are still read back as written. VxPITCHL is accessed during voice processing step S2, cycles: V0:21 V1:0 V2:3 V3:6 V4:9 V5:12 V6:15 V7:18 VxPITCHH is accessed during voice processing step S3a, cycles: V0:22 V1:1 V2:4 V3:7 V4:10 V5:13 V6:16 V7:19 $x4 rw VxSRCN - Source number for Voice x nnnnnnnn This selects the "instrument" this voice is to play. The number set here is used as an offset into the table pointed to by DIR. Changing this while the voice is playing will have no immediate effect, but when the voice afterwards loops or is keyed on it will use the new value. VxSRCN is accessed during voice processing step S1, cycles: V0:17 V1:20 V2:31 V3:2 V4:5 V5:8 V6:11 V7:14 $x5 rw VxADSR1 - Attack-Decay-Sustain-Release settings for Voice x (part 1) edddaaaa $x6 rw VxADSR2 - Attack-Decay-Sustain-Release settings for Voice x (part 2) lllrrrrr $x7 rw VxGAIN - Gain settings for Voice x EGGGGGGG or Emmggggg e/E = Envelope adjustment method bits. 
        ddd     = Decay rate: R=d*2+16
        aaaa    = Attack rate: R=a*2+1
        lll     = Sustain level (see note)
        rrrrr   = Sustain rate: R=r
        mm      = Gain mode
        ggggg   = Gain rate: R=g
        GGGGGGG = Direct Gain mode gain setting: E=G*16

        Note: the "lll" bits are the Sustain Level only when bit 'e' is set.
        If 'e' is clear, the top 3 bits of VxGAIN are used instead.

        These three registers give control over the volume envelope. The
        volume envelope is 11 bits unsigned: volume adjustment is
        S = (S * E)>>11 where S is the current sample. Various settings of
        these registers will automatically adjust the envelope after a certain
        number of samples, based on a counter as described above (see
        COUNTERS).

        The volume envelope adjustment has 4 states: Attack, Decay, Sustain,
        and Release. When the voice is keyed off or a BRR end-without-loop
        block is reached, the state is set to Release. When the voice is keyed
        on, the state is set to Attack.

        When the envelope is in the Release state, this overrides all settings
        of these registers. In this case, the counter rate R=31 (i.e. adjust
        every sample), and the adjustment is E-=8.

        The simplest method of envelope control ("Direct Gain") is available
        when VxADSR1 bit 7 and VxGAIN bit 7 are both clear. In this case, the
        volume envelope is simply E=%GGGGGGG0000, and R does not matter.

        The second method ("Gain", usually with one of the 4 names below) is
        available when VxADSR1 bit 7 is clear, but VxGAIN bit 7 is set. In
        this case, we have 4 options, chosen based on the 'm' bits.
          00 = Linear Decrease. R=g, E-=32
          01 = Exp Decrease.    R=g, E-=((E-1)>>8)+1
          10 = Linear Increase. R=g, E+=32
          11 = Bent Increase.   R=g, E+=(E<0x600)?32:8
        In all cases, clamp E to the range 0 to 0x7FF rather than wrapping.

        The most complex method ("ADSR") is used when VxADSR1 bit 7 is 1. You
        can think of this method as loading VxGAIN with different values at
        different times based on the value of the volume envelope. VxGAIN is
        not actually altered, however.
          Attack: If aaaa == %1111, R=31 and E+=1024.
Otherwise, pretend VxGAIN = %110aaaa1. In either case, when E exceeds 0x7FF (before clamping) enter the Decay state. Decay: Pretend VxGAIN = %1011ddd0. When the upper 3 bits of E equal the Sustain Level (see above), enter the Sustain state. Sustain: Pretend VxGAIN = %101rrrrr. CRITICAL NOTE: These updates happen even when ADSR mode is not selected. These registers are actually used to update the envelope every sample. The calculated value is used as follows: 1. If the counter specifies the envelope is to be updated, the envelope is set to the new value, clamped to 11 bits. 2. If the mode is Decay and the Sustain Level is matched, change to the Sustain state. 3. If the mode is Attack and the new value is greater than 0x7FF, change to the Decay state. CRITICAL NOTE: Negative values also trigger this. 4. Save the new value, *pre-clamp*, to determine the increment for GAIN Bent Increase mode's next sample. VxADSR1 is accessed during voice processing step S2, cycles: V0:21 V1:0 V2:3 V3:6 V4:9 V5:12 V6:15 V7:18 VxADSR2 and VxGAIN are accessed during voice processing step S3c, cycles: V0:30 V1:1 V2:4 V3:7 V4:10 V5:13 V6:16 V7:19 $x8 r- VxENVX - Current envelope value for Voice X 0eeeeeee This returns the high 7 bits of the current volume envelope value (IOW, E>>4) for this voice. Note that the high bit will always be 0. Also note that (obviously) there is no way to directly determine the low 4 bits unless you're using Direct Gain. Technically, this register IS writable. But whatever value you write will be overwritten at 32000 Hz. VxENVX is updated during voice processing step S9, cycles: V0:4 V1:7 V2:10 V3:13 V4:16 V5:19 V6:22 V7:25 However, a write by the SMP to this register up to 2 cycles earlier will overwrite the DSP's updated value. $x9 r- VxOUTX - Current sample value for Voice X oooooooo This returns the high byte of the current sample for this voice, after envelope volume adjustment but before VxVOL[LR] is applied. Technically, this register IS writable. 
But whatever value you write will be overwritten at 32000 Hz. VxOUTX is updated during voice processing step S8, cycles: V0:3 V1:6 V2:9 V3:12 V4:15 V5:18 V6:21 V7:24 However, a write by the SMP to this register up to 2 cycles earlier will overwrite the DSP's updated value. Now, the general-purpose registers: $0c rw MVOLL - Left channel master volume $1c rw MVOLR - Right channel master volume vvvvvvvv These are the master volumes of the left/right stereo channel. The value is 2's-complement, negative values invert the phase of the channel. This is the adjustment applied to the mixed 16-bit stereo sample output of all 8 voices. Volume adjustment is: ML = (int16_t)((SL * VL)>>7) MR = (int16_t)((SR * VR)>>7) MVOLL is accessed during cycle 26. MVOLR is accessed during cycle 27. $2c rw EVOLL - Left channel echo volume $3c rw EVOLR - Right channel echo volume vvvvvvvv These are the echo volumes of the left/right stereo channel. The value is 2's-complement, negative values invert the phase of the channel. This is the adjustment applied to the FIR filter 16-bit output before mixing with the main signal (after master volume adjustment). Volume adjustment is: EL = (int16_t)((SL * VL)>>7) ER = (int16_t)((SR * VR)>>7) EVOLL is accessed during cycle 26. EVOLR is accessed during cycle 27. $4c rw KON - Key on for all voices $5c rw KOFF - Key off for all voices 76543210 Each bit of KON/KOFF corresponds to one voice. Setting 1 to the KOFF bit will transition the voice to the Release state. Thus, the envelope will decrease by 8 every sample (regardless of the VxADSR and VxGAIN settings) until it reaches 0, where it will stay until the next KON. Writing 1 to the KON bit will set the envelope to 0, the state to Attack, and will start the channel from the beginning (see DIR and VxSRCN). Note that this happens even if the channel is already playing (which may cause a click/pop), and that there are 5 'empty' samples before envelope updates and BRR decoding actually begin. 
        These registers seem to be polled only at 16000 Hz, when every other
        sample is due to be output. Thus, if you write two values in close
        succession, usually but not always only the second value will have an
        effect:

          ; assume KOFF = 0, but no voices playing
          mov $f2, #$4c   ; KON = 1 then KON = 2
          mov $f3, #$01   ; -> *usually* only voice 2 is keyed on. If both are,
          mov $f3, #$02   ;    voice 1 will be *2* samples ahead rather than one.

        and

          ; assume various voices playing
          mov $f2, #$5c   ; KOFF = $ff then KOFF = 0
          mov $f3, #$ff
          mov $f3, #$00   ; -> *usually* all voices remain playing

        FLG bit 7, however, is polled every sample and polled for each voice.

        These registers and FLG bit 7 interact as follows:
          1. If FLG bit 7 or the KOFF bit for the channel is set, transition
             to the Release state. If FLG bit 7 is set, also set the envelope
             to 0.
          2. If the 'internal' value of KON has the channel's bit set, perform
             the KON actions described above.
          3. Set the 'internal' value of KON to 0.

        This has a number of consequences:
          * KON effectively takes effect 'on write', even though a non-zero
            value can be read back much later. KOFF and FLG.7, on the other
            hand, exert their influence constantly until a new value is
            written.
          * Writing KON while KOFF or FLG.7 is set will not result in any
            samples being output by the channel. The channel is keyed on, but
            it is turned off again 2 samples later. Since there is a 5 sample
            delay after KON before the channel actually begins processing,
            the net effect is no output.
          * However, if KOFF is cleared within 63 SPC700 cycles of the KON
            write above, the channel WILL be keyed on as normal. If KOFF is
            cleared between 64 and 127 SPC700 cycles later, the channel MIGHT
            be keyed on with decreasing probability depending on how many
            cycles before the KON/KOFF poll the KON write occurred.
          * Setting both KOFF and KON for a channel will turn the channel off
            much faster than just KOFF alone, since the KON will set the
            envelope to 0. This can cause a click/pop, though.
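
        The three interaction steps listed above can be sketched in C as a
        single poll over all 8 voices. This is only a rough model under
        stated assumptions: the Voice struct, the EnvState enum, the
        'restart' flag and the function name poll_kon_koff are all
        hypothetical, the 16000 Hz polling schedule and the 5-sample key-on
        delay are not modelled, and FLG bit 7 is only examined at the poll
        even though the hardware checks it every sample.

```c
#include <stdint.h>

/* Hypothetical envelope states; names are illustrative only. */
enum EnvState { ENV_RELEASE, ENV_ATTACK, ENV_DECAY, ENV_SUSTAIN };

/* Hypothetical per-voice state for this sketch. */
typedef struct {
    enum EnvState state;
    int envelope;   /* 11-bit envelope value E */
    int restart;    /* set when the voice must restart from its sample start */
} Voice;

/* One KON/KOFF poll. internal_kon points at the 'internal' copy of KON. */
static void poll_kon_koff(Voice voices[8], uint8_t flg, uint8_t koff,
                          uint8_t *internal_kon)
{
    for (int v = 0; v < 8; v++) {
        uint8_t bit = (uint8_t)(1u << v);

        /* Step 1: FLG bit 7 or the KOFF bit forces the Release state. */
        if ((flg & 0x80) || (koff & bit)) {
            voices[v].state = ENV_RELEASE;
            if (flg & 0x80)
                voices[v].envelope = 0;
        }

        /* Step 2: a pending key-on restarts the voice in Attack. */
        if (*internal_kon & bit) {
            voices[v].envelope = 0;
            voices[v].state = ENV_ATTACK;
            voices[v].restart = 1;
        }
    }

    /* Step 3: the internal KON value is consumed by the poll. */
    *internal_kon = 0;
}
```

        Because step 1 runs before step 2, a voice with both its KOFF and KON
        bits set ends the poll in Attack with a zero envelope, and is forced
        back to Release by the next poll if KOFF is still set. That matches
        the "keyed on, then turned off again" behaviour described in the
        consequences above.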
        KOFF and internal KON are accessed during cycle 30 every other sample.
        Internal KON bits are cleared during cycle 29, just before KON is
        accessed.

$6c rw FLG - Reset, Mute, Echo-Write flags and Noise Clock
        rmennnnn

        r = When set, the S-DSP "soft-resets" itself. Mostly, this seems to
            mean the S-DSP acts as if KOFF=$ff and forces all envelopes to 0;
            echo processing still continues, and any remaining echo data will
            continue to echo and generate samples. You must clear the bit to
            resume normal operation. See KON/KOFF for some details. Note
            though that this bit is checked much more frequently than KOFF.

        m = When set, no sound will be output. Samples will still be decoded,
            echoes processed, and such; just no sounds will be output.

        e = When set, the echo ring buffer (see ESA and EDL) will not be
            written. Echo processing on the buffer will continue as normal,
            just the buffer itself will not be updated and so the echo samples
            will loop forever. In other words, the echo pointer is always
            moving. The only thing that changes is whether or not the writes
            themselves occur.

        nnnnn = Noise frequency. This is used with the global counter to
            determine when to generate a new noise sample. Note that there is
            only one noise source shared by all voices for which noise is
            enabled (see NON).

        On reset, this register seems to have a value resembling $E0, even
        though this may not be read back. At least, 'r' is 'set' so we can't
        key on any samples, 'e' is 'set' so the echo buffer is not being
        updated, and 'm' is 'set' because even whatever static data is in the
        echo buffer gives no sound. 'n' is '0', since the noise sample is
        constant until this is set non-zero.

        FLG bit 'r' is accessed during voice processing step S3c, cycles:
          V0:30 V1:1 V2:4 V3:7 V4:10 V5:13 V6:16 V7:19
        FLG bit 'e' is accessed during cycles 28 and 29.
        FLG bits 'n' are accessed during cycle 30.
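
        The rmennnnn layout above can be unpacked with a few masks. The
        following C fragment is a minimal sketch; the FlgFields struct and the
        decode_flg name are hypothetical and only serve to illustrate the bit
        positions.

```c
#include <stdint.h>

/* Hypothetical container for the decoded FLG fields. */
typedef struct {
    int soft_reset;       /* 'r', bit 7 */
    int mute;             /* 'm', bit 6 */
    int echo_write_off;   /* 'e', bit 5: when set, echo buffer is not written */
    int noise_rate;       /* 'nnnnn', bits 4-0 */
} FlgFields;

/* Split a raw FLG register value into its four fields. */
static FlgFields decode_flg(uint8_t flg)
{
    FlgFields f;
    f.soft_reset     = (flg >> 7) & 1;
    f.mute           = (flg >> 6) & 1;
    f.echo_write_off = (flg >> 5) & 1;
    f.noise_rate     = flg & 0x1F;
    return f;
}
```

        For the apparent reset value $E0, this yields r, m and e all set and a
        noise rate of 0, matching the reset behaviour described above.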

$7c r* ENDX - Voice end flags
        76543210

        When a BRR block with the end flag set is decoded in a voice, the
        corresponding bit is set in this register. When the voice is keyed on
        (successfully or not), the corresponding bit is cleared. Any write to
        this register will clear ALL bits, no matter what value is written.

        Note that the bit is set at the START of decoding the BRR block, not
        at the end. Recall that BRR processing, and therefore the setting of
        bits in this register, continues even for voices in the Release state.

        On power on or reset, all bits are cleared.

        ENDX is updated during voice processing step S7, cycles:
          V0:2 V1:5 V2:8 V3:11 V4:14 V5:17 V6:20 V7:23
        However, a write by the SMP to this register up to 2 cycles earlier
        will overwrite the DSP's updated value.

$0d rw EFB - Echo feedback volume
        vvvvvvvv

        When echo buffer write is enabled, the FIR output will be adjusted by
        this volume and mixed into the buffer. The value is 2's-complement,
        negative values invert the phase of the signal.

        Volume adjustment is:
          E = (int16_t)((E * V)>>7)

        EFB is accessed during cycle 26.

$2d rw PMON - Pitch modulation enable
        7654321-

        Each bit corresponds to the corresponding voice. When the bit is set,
        the VxPITCH value will be adjusted by the output of the voice x-1.
        The exact formula seems to be:
          P = VxPITCH + (((OutX[x-1] >> 5) * VxPITCH) >> 10)

        For the purposes of pitch adjustment, a voice not playing is all
        zeros and thus has no effect on the pitch.

        PMON is accessed during cycle 27.

$3d rw NON - Noise enable
        76543210

        Each bit corresponds to the corresponding voice. When the bit is set,
        the samples produced by BRR decoding will not be used. Instead, the
        output sample will be the current value of the noise generator (see
        FLG).

        The noise generator outputs a 15-bit noise sample. The noise
        generator operation is as follows: On reset, N=0x4000. Each update
        (see FLG), N=(N>>1)|(((N<<14)^(N<<13))&0x4000).
        And the output noise sample at any point is N (after which comes
        volume adjustment, then the left-shift to 'restore' the low bit).

        Note that the noise sample is not affected by VxPITCH or PMON, but
        VxPITCH and PMON still control the speed of BRR decoding and the
        end-without-loop of BRR decoding will still transition to Release
        (and update ENDX).

        NON is accessed during cycle 28.

$4d rw EON - Echo enable
        76543210

        Each bit corresponds to the corresponding voice. When the bit is set
        and echo buffer write is enabled, this voice will be mixed into the
        sample to be written to the echo buffer for later echo processing.

        EON is accessed during cycle 28.

$5d rw DIR - Sample table address
        aaaaaaaa

        This forms the high byte of the start address of the sample pointer
        table (the low byte is always 0). The sample pointer table is indexed
        for each voice by VxSRCN to determine which BRR data to decode and
        play.

        Each entry is 4 bytes. The first word points to the start of the BRR
        data, and the second word points to the 'restart' point for when the
        BRR end block is reached. These are referred to as the Source Start
        Address (SA) and the Source Loop Start Address (LSA), respectively.

        Changing this while voices are playing will have no immediate effect,
        but when any voice afterwards loops or is keyed on it will use the
        new table.

        DIR is accessed during cycle 28.

$6d rw ESA - Echo ring buffer address
        aaaaaaaa

        This forms the high byte of the start address of the echo ring buffer
        (the low byte is always 0). When echo buffer write is enabled in FLG,
        all voices marked in EON will be mixed together, mixed with the FIR
        output (adjusted by the echo feedback volume), and output into the
        ring buffer (4 bytes, 2 per stereo channel). And every sample, one
        entry (4 bytes) will be removed from the ring buffer and passed into
        the FIR filter. The size of the buffer is controlled by EDL.

        The echo buffer will wrap within 16 bits, if the ESA and EDL values
        combine to specify a buffer that would go beyond address $FFFF.

        Note that the register is accessed 32 cycles before the value is used
        for a write; at a sample level, this causes writes to appear to be
        delayed by at least a full sample before taking effect.

        ESA is accessed during cycle 29.

$7d rw EDL - Echo delay (ring buffer size)
        ----dddd

        This controls the size of the echo ring buffer, and therefore the
        delay between when a sample is first output and when it enters the
        echo FIR filter. The size of the buffer is simply D<<11 bytes (D<<9
        16-bit stereo samples), however when D=0 the buffer is 4 bytes (1
        16-bit stereo sample) rather than 0. Note that only the low 4 bits
        are used to determine the buffer length.

        The register value is only used under certain conditions:
          * Write the echo buffer at sample 'idx' (cycles 29 and 30)
          * If idx==0, set idx_max = EDL<<9 (cycle 30-ish)
          * Increment idx. If idx>=idx_max, idx=0 (cycle 30-ish)

        This means that it can take up to 0.24s for a newly written value to
        actually take effect, if the old value was 0x0f and the new value is
        written just after the cycle 30 in which buffer index 0 was written.

        EDL is accessed during cycle 29.

$xf rw FFCx - Echo FIR Filter Coefficient (FFC) X
        cccccccc

        These 8 registers specify the 8 2's-complement coefficients of the
        8-tap FIR filter used to calculate the echo signal. Each time a
        sample is generated by the voices, one sample is taken from the echo
        ring buffer and input to the FIR filter (this is S(x)). The FIR
        filter output is then mixed with the outputs of the voices to
        generate the output sound, and mixed with the sample being input into
        the echo buffer for echo feedback.

        Note that the echo buffer contains 15-bit samples left-aligned within
        the 16-bit word, so the 16-bit value read must be right-shifted by
        one bit to get the proper 15-bit S(x).
        The internal calculations, however, are done in 16 bits with the
        final output of the FIR being a 16-bit value.

        The FIR formula is:
          // The value is clipped when mixing samples x-1 to x-7:
          FIR = (int16_t)(((S(x-7) * FFC0) >> 6)   // oldest sample
                        + ((S(x-6) * FFC1) >> 6)
                        + ((S(x-5) * FFC2) >> 6)
                        + ((S(x-4) * FFC3) >> 6)
                        + ((S(x-3) * FFC4) >> 6)
                        + ((S(x-2) * FFC5) >> 6)
                        + ((S(x-1) * FFC6) >> 6));
          // We have overflow detection when adding the most recent sample
          // only:
          FIR = clamp16(FIR + ((S(x-0) * FFC7) >> 6));   // newest sample
          // Finally, mask off the LSbit to get the final 16-bit result:
          FIR = FIR & ~1;

        Note that the left and right stereo channels are filtered separately
        (no crosstalk), but with identical coefficients.

        FFC0 is accessed during cycle 22.
        FFC1 and FFC2 are accessed during cycle 23.
        FFC3, FFC4, and FFC5 are accessed during cycle 24.
        FFC6 and FFC7 are accessed during cycle 25.

        The echo buffer left channel is read during cycle 22, and written
        during cycle 29.
        The echo buffer right channel is read during cycle 23, and written
        during cycle 30.
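
        The FIR formula above can be written as a small self-contained C
        function for one channel. This is a sketch under stated assumptions,
        not a verified implementation: the names wrap16, clamp16_s and
        fir_output are hypothetical, the history array is assumed to already
        hold the 15-bit samples (i.e. after the right-shift by one), and the
        code relies on >> being an arithmetic shift for negative values,
        which C leaves implementation-defined.

```c
#include <stdint.h>

/* Clip (bit truncation) to 16 bits, then sign-extend back to 32 bits. */
static int32_t wrap16(int32_t v)
{
    return ((v & 0xFFFF) ^ 0x8000) - 0x8000;
}

/* Clamp (range restriction) to the signed 16-bit range. */
static int32_t clamp16_s(int32_t v)
{
    return (v < -32768) ? -32768 : ((v > 32767) ? 32767 : v);
}

/* s[0] is the oldest sample S(x-7), s[7] is the newest sample S(x-0).
 * ffc[0..7] are FFC0..FFC7 as signed 8-bit coefficients. */
static int16_t fir_output(const int16_t s[8], const int8_t ffc[8])
{
    int32_t sum = 0;

    /* The seven oldest taps are summed without overflow detection. */
    for (int i = 0; i < 7; i++)
        sum += (s[i] * ffc[i]) >> 6;
    sum = wrap16(sum);

    /* Only the addition of the newest tap is clamped. */
    sum = clamp16_s(sum + ((s[7] * ffc[7]) >> 6));

    /* Mask off the low bit to get the final 16-bit result. */
    return (int16_t)(sum & ~1);
}
```

        Calling fir_output once with the left-channel history and once with
        the right-channel history, using the same ffc array, reflects the
        note that both channels share coefficients without crosstalk.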