As far as I know, it is event-driven: the animations are paced by the spoken text, so the result is independent of the speed of the computer.
If you are interested, a search for "lip sync technique" turns up several sources explaining how this is done.
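To make the idea concrete, here is a minimal sketch of pacing the mouth animation by the audio position rather than by a frame counter. It is only an illustration, not how any particular game does it: the cue list `CUES`, the shape names, and the functions `mouth_shape_at` and `shapes_shown` are all made up for this example.

```python
from bisect import bisect_right

# Hypothetical cue track: (start time in seconds, mouth shape name).
# In a real game this would come from the voice recording's timing data.
CUES = [(0.0, "closed"), (0.2, "open"), (0.5, "wide"), (0.8, "closed")]
CUE_TIMES = [start for start, _ in CUES]


def mouth_shape_at(audio_pos):
    """Return the mouth shape active at the given audio position (seconds)."""
    # Find the last cue whose start time is <= audio_pos.
    index = bisect_right(CUE_TIMES, audio_pos) - 1
    return CUES[max(index, 0)][1]


def shapes_shown(fps, duration=1.0):
    """Simulate drawing at `fps` frames per second and return the
    sequence of distinct mouth shapes that end up on screen."""
    shown = []
    frame = 0
    while frame / fps < duration:
        # The shape depends only on the playback time, not on the frame number.
        shape = mouth_shape_at(frame / fps)
        if not shown or shown[-1] != shape:
            shown.append(shape)
        frame += 1
    return shown


print(shapes_shown(10))  # slow machine
print(shapes_shown(60))  # fast machine
```

Both calls go through the same shapes in the same order, because each frame asks "where is the audio now?" instead of advancing the animation by one step per frame. A faster machine just draws more frames of the same shape.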