If you assume a completely linear system, then I agree with the waves-on-the-ocean analogy: it is simply the principle of superposition. I also wonder how much the sub actually moves the ESL panel in practice. If the sub were truly an omnidirectional source and the ESL panel sat directly above it, the pressure on the back of the panel would be the same as on the front, so there would be no net force and no movement. Of course, a sub is not a perfect point source, and reality is a bit more complex. If the sub did manage to move the panel enough, I wonder whether that could produce some nonlinearity (a modulation effect)?
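To put rough numbers on the "no net movement" argument, here is a minimal sketch. It assumes an ideal point (monopole) source in free field, so room reflections are ignored, and a hypothetical geometry: the panel's two faces sit a small distance apart horizontally, at some height above the sub. It computes the front-minus-back pressure across the panel. With the sub centred below the panel the two path lengths are identical and the difference is exactly zero; offsetting the sub sideways leaves a small net pressure that could push the panel. The function names, dimensions, and frequency are illustrative only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed: air at roughly 20 degrees C


def monopole_pressure(r, t, freq, amplitude=1.0):
    """Pressure of an ideal point (monopole) source at distance r (m) and time t (s)."""
    k = 2 * math.pi * freq / SPEED_OF_SOUND  # wavenumber
    omega = 2 * math.pi * freq  # angular frequency
    return amplitude / r * math.cos(omega * t - k * r)


def pressure_difference(sub_x, panel_height, half_gap, freq, t=0.0):
    """Front-minus-back pressure across a thin panel (hypothetical geometry).

    The panel's faces are at x = +half_gap and x = -half_gap, panel_height
    above the sub; the sub sits at horizontal position sub_x.
    """
    r_front = math.hypot(half_gap - sub_x, panel_height)
    r_back = math.hypot(-half_gap - sub_x, panel_height)
    return monopole_pressure(r_front, t, freq) - monopole_pressure(r_back, t, freq)


# Sub centred directly below the panel: equal path lengths, prints 0.0
print(pressure_difference(0.0, 1.0, 0.01, 50.0))
# Sub offset sideways by 0.5 m: a small nonzero net pressure
print(pressure_difference(0.5, 1.0, 0.01, 50.0))
```

Because everything here is linear, the panel's own output would simply add to these pressures; any modulation effect would need a nonlinearity that this sketch deliberately leaves out.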
As for mono summing with a dipole response, I think of the sub and the ESL panel as a pressure source and a velocity source, respectively. The ways they couple into the room will be complementary. For any given mode, the pressure source (a standard sub) couples most efficiently when placed at a pressure maximum in the room, and the velocity source (the dipole) couples most efficiently when placed at a velocity maximum. The pressure maxima correspond to antinodes (peaks) in the room's spatial pressure response, and the velocity maxima correspond to nodes (dips). The velocity source is further complicated by the fact that its coupling depends on its orientation in the room. Either way, the sub and the ESL panel are guaranteed to couple to the room (and thus be transferred to the listener) differently, since one couples to pressure maxima and the other to velocity maxima, and the two are mutually exclusive.
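A minimal sketch of this complementary coupling, assuming only 1D axial modes in a rigid-walled room with the dipole's axis aimed along the room's length (tangential and oblique modes, and other orientations, are ignored). For axial mode n the pressure shape is cos(n*pi*x/L), so a pressure source couples in proportion to that value at its position, while a velocity source couples in proportion to the pressure gradient, which goes as sin(n*pi*x/L). The function names and the 6 m room are hypothetical.

```python
import math


def monopole_coupling(n, x_source, room_length):
    """Relative coupling of a pressure (monopole) source to axial mode n.

    Proportional to the mode's pressure shape cos(n*pi*x/L) at the source.
    """
    return abs(math.cos(n * math.pi * x_source / room_length))


def dipole_coupling(n, x_source, room_length):
    """Relative coupling of a velocity (dipole) source aimed along the room's length.

    Proportional to the pressure gradient, i.e. sin(n*pi*x/L); the n*pi/L
    scale factor is dropped so both couplings range from 0 to 1.
    """
    return abs(math.sin(n * math.pi * x_source / room_length))


room_length = 6.0  # m, hypothetical room
for x in (0.0, 1.5, 3.0):
    print(x, round(monopole_coupling(1, x, room_length), 3),
          round(dipole_coupling(1, x, room_length), 3))
```

For the first mode, the monopole term is 1 at the wall and essentially 0 at the room centre, while the dipole term does the opposite. Halfway between, both couple partially, and the squares of the two terms always sum to 1.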
This might sound complicated, but in real life it's probably even more complicated! The above is probably a gross simplification.