Not right now; I am swamped at work and home (getting ready for a concert). The basic jitter derivation is in many converter and signal-processing texts and is derived several different ways: some use probability (expectation integrals), others Bessel functions, and some simply use a straight-line approximation of the sine wave around an lsb, calculate the area of the resultant triangles for quantization noise, then relate the change in area with jitter time to the SNR reduction. I believe it is in the seminal Oppenheim and Schafer texts, Digital Signal Processing and Discrete-Time Signal Processing.

Honestly, I have known it for so long, and had it drilled into me through so many college classes, workshops, and work experiences (theoretical and practical lab measurements) over a variety of ADC/DAC designs I have worked on (SAR, flash, encoded-flash, delta-sigma, binary, unary, segmented, etc.), that I have not thought about the basic derivation in ages. Converter companies like ADI, TI/Burr-Brown, and Comlinear/National have app notes that discuss it. I am sure I have it in my notes but don't have time to dig them up right now. One of my jitter threads here on WBF goes through a hand-waving explanation. It is related to aperture time, the time it takes a signal to pass through one lsb. It turns out the clock frequency drops out, so the allowable jitter deviation (in time) depends only on resolution and signal frequency.
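The aperture-time idea above can be sketched in a few lines. This is a minimal version of the standard one-LSB bound for a full-scale sine (the function name and example numbers are mine, not from any particular text or app note):

```python
import math

def aperture_time_limit(bits: int, f_sig: float) -> float:
    """One-LSB aperture-time bound for a full-scale sine.

    Max slew of A*sin(2*pi*f*t) is 2*pi*f*A; full scale (2A) spans
    2**bits lsbs, so holding the slew-induced error to one lsb gives
    t_j <= 1 / (2**bits * pi * f). Note the sample clock never
    appears -- only resolution and signal frequency survive.
    """
    return 1.0 / (2**bits * math.pi * f_sig)

# e.g. 16 bits with a full-scale 20 kHz tone
t_j = aperture_time_limit(16, 20e3)
print(f"{t_j * 1e12:.1f} ps")  # ~243 ps
```

Tightening the criterion to half an lsb, or using an rms rather than peak error, shifts the constant but not the conclusion: double the frequency or add a bit of resolution and the allowed jitter halves.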
The number of quantization levels spanned by the signal relative to the noise level is the heart of it, along with how long (in time or UIs) the jitter deviates, and that is independent of the clock rate and architecture. Jitter is a problem in a delta-sigma loop due to both noise shaping and loop-stability issues, but it does not change the fundamental relationship between wideband random jitter and the signal with respect to in-band quantization levels. Note: I use delta-sigma (DS or D-S) because Gabor Temes once said the original paper was mistranslated and it should be D-S, not S-D; the differencing element (D) comes before the summer (S) in the loop. For that matter, I am pretty sure I have his notes, or his and John Candy's, showing the same conclusion about jitter, from my long-ago grad classes.
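As a quick sanity check of the clock-rate independence, you can sample a jittered sine at two very different rates and compare the measured SNR against the usual 1/(2*pi*f*t_j) prediction. This is my own numerical sketch, not from the thread; names and parameters are illustrative:

```python
import numpy as np

def jitter_snr_db(f_sig: float, fs: float, sigma_j: float,
                  n: int = 1 << 16, seed: int = 0) -> float:
    """Estimate SNR (dB) of a unit sine sampled with Gaussian clock jitter."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs                       # ideal sample instants
    dt = rng.normal(0.0, sigma_j, n)            # random timing error
    x_ideal = np.sin(2 * np.pi * f_sig * t)
    x_jit = np.sin(2 * np.pi * f_sig * (t + dt))
    err = x_jit - x_ideal                       # jitter-induced error only
    return 10 * np.log10(np.mean(x_ideal**2) / np.mean(err**2))

f, sigma = 1e3, 1e-9                            # 1 kHz tone, 1 ns rms jitter
theory = -20 * np.log10(2 * np.pi * f * sigma)  # ~104 dB
print(jitter_snr_db(f, 48e3, sigma),            # 48 kHz clock
      jitter_snr_db(f, 192e3, sigma),           # 192 kHz clock
      theory)
```

Both measured SNRs land within a fraction of a dB of the prediction even though the clock rates differ by 4x, which is the point: for wideband random jitter, only the signal frequency and the rms timing deviation matter.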
I suspect we are getting mixed up between random jitter and other timing errors in the loop. I am not sure what you mean when you say the noise occupies the "whole 1bit" -- most schemes assume a limiter, so jitter impacts only the edges.