One of the random things I thought of today was horribly abusing the level system in Jazz Jackrabbit 2 to actually show FMV cutscenes within the level.
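Hmm... crank it down to 320x240 resolution, and that gives 75 tiles of 32x32 pixels per frame (10 across, 7.5 down). Actually, make that 80 tiles per frame, as we can't use half tiles. There are 1024 tiles available, and each animated tile also takes up one of those slots, so every frame costs 80 tiles on top of the 80 animated tiles that cycle through them. That gives 11 frames of video: 11 frames plus the animated tiles come to 960 slots, while a twelfth frame would push it to 1040. (This ignores TSF, as I at least don't have it.)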
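The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming the 1024-slot limit and that every on-screen tile position needs exactly one animated tile of its own, with no duplicate tiles shared; the function names are hypothetical.

```python
import math

TILE = 32           # JJ2 tiles are 32x32 pixels
TILE_LIMIT = 1024   # tile slots available (ignoring TSF)


def tiles_per_frame(width: int, height: int) -> int:
    """Tiles needed to cover one frame; partial tiles round up to whole ones."""
    return math.ceil(width / TILE) * math.ceil(height / TILE)


def max_frames(tiles: int, limit: int = TILE_LIMIT) -> int:
    """Frames that fit when every position also needs one animated tile.

    N frames cost N * tiles frame tiles plus `tiles` animated tiles,
    so we need (N + 1) * tiles <= limit.
    """
    return limit // tiles - 1


full = tiles_per_frame(320, 240)   # 10 across x 8 down
print(full, max_frames(full))      # 80 11
print(max_frames(70))              # 13 (one row of tiles dropped)
```

Dropping a row of tiles takes the budget from 11 frames to 13, which is where the "two more frames" figure comes from.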
But we can optimise this by storing similar tiles only once, and by making good use of static and/or tiled backgrounds. We could also reduce the picture size: if we drop the bottom (or possibly top) row of tiles and only use 70 tiles per frame, we get at least two more frames of video.
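To get a feel for how much the tile-sharing optimisation saves, here is a rough sketch that counts the slots a clip would really need. It assumes frames are stored as rows of 8-bit palette indices with dimensions that are multiples of 32, that only exact duplicates are merged (merging merely similar tiles would need a tolerance, which this skips), that a position whose tile never changes can use a plain static tile, and that positions sharing the same frame sequence can reuse one animated tile. All names here are hypothetical.

```python
from typing import List, Set, Tuple

TILE = 32
Frame = List[List[int]]        # rows of 8-bit palette indices
TileData = Tuple[int, ...]     # one 32x32 tile flattened into a hashable tuple


def cut_tiles(frame: Frame) -> List[TileData]:
    """Split a frame into 32x32 tiles, left to right, top to bottom.

    Assumes the width and height are multiples of 32.
    """
    rows, cols = len(frame) // TILE, len(frame[0]) // TILE
    tiles = []
    for ty in range(rows):
        for tx in range(cols):
            tiles.append(tuple(
                frame[ty * TILE + y][tx * TILE + x]
                for y in range(TILE) for x in range(TILE)
            ))
    return tiles


def slot_cost(frames: List[Frame]) -> Tuple[int, int]:
    """Return (unique tile images, animated tiles) needed for a clip."""
    per_frame = [cut_tiles(f) for f in frames]
    unique: Set[TileData] = set()
    sequences: Set[Tuple[TileData, ...]] = set()
    for pos in range(len(per_frame[0])):
        column = tuple(tiles[pos] for tiles in per_frame)
        unique.update(column)          # identical images are stored once
        if len(set(column)) > 1:       # this position actually animates
            sequences.add(column)      # same sequence -> same animated tile
    return len(unique), len(sequences)


def make_frame(colour: int) -> Frame:
    """Hypothetical 64x64 test frame: colour 1 except the top-left tile."""
    return [[colour if (x < TILE and y < TILE) else 1 for x in range(64)]
            for y in range(64)]


print(slot_cost([make_frame(1), make_frame(2), make_frame(1)]))  # (2, 1)
```

In the toy clip only one of the four positions changes, so three frames cost three slots instead of the fifteen the naive count would charge.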
So what can we sensibly do with this? Since extra frames only cost tile space for the new tiles they introduce, we could easily do a pair of (non-overlapping) talking heads and give each head a dozen frames or so. The game engine can show text captions by abusing sucker tubes and text events. We could even record speech and play it back: JJ2 uses tracker-based music, so the music file could be abused to carry the sound. There are various tricks to reduce the size of a sample, so it should be possible to get a good dialogue going. The main limits on length are the complexity of the animation, the size of the music file, and the maximum size of an animated tile. We could probably get away with a frame rate of 12fps, though higher is always better.
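As a sanity check on the talking-heads idea, the sketch below budgets two heads using the same worst-case rule as before (one tile per position per frame, plus one animated tile per position, no sharing). The 4x4-tile (128x128 pixel) head size is purely a hypothetical choice for illustration, as are the function names.

```python
TILE_LIMIT = 1024   # tile slots available (ignoring TSF)


def head_cost(width_tiles: int, height_tiles: int, frames: int) -> int:
    """Worst-case tile slots for one head, assuming no duplicate tiles.

    Each frame needs its own tile at every position, plus one animated
    tile per position to cycle through them.
    """
    positions = width_tiles * height_tiles
    return positions * frames + positions


def loop_seconds(frames: int, fps: float) -> float:
    """How long one pass through the animation lasts."""
    return frames / fps


# Hypothetical layout: two 4x4-tile heads, 12 frames each, shown at 12fps.
used = 2 * head_cost(4, 4, 12)
print(used, TILE_LIMIT - used)   # 416 608
print(loop_seconds(12, 12))      # 1.0
```

Even without any tile sharing, this leaves roughly 600 slots for the background and caption scenery. At 12fps a dozen frames only lasts one second, though, so presumably the frames would have to be looped or reused (for example as mouth shapes) rather than played straight through.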
Hmm, I'll have to try and knock up a proof-of-concept of this.