For the extended end-user products, please refer to the index repo Awesome-ChatTTS maintained by the community. You can find a diagram visualization of the codebase here. ChatTTS is a text-to-speech ...
The main novelty seems to be an extra layer of indirection with the prior network (whether it is an autoregressive transformer or a diffusion network), which predicts an image embedding based on the ...