Stable Diffusion is an open-source AI model that generates images from text. Riffusion fine-tuned the model to generate images of spectrograms, which can then be converted into audio clips. A spectrogram is a picture of sound: one axis is time, the other is frequency, and the brightness of each point shows how strong that frequency is at that moment. The Riffusion authors also built an interactive web app where anyone can type in a prompt to generate an audio clip, and the app transitions smoothly between different prompts or between different seeds of the same prompt.
Give it a try? Here
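
To make the spectrogram idea concrete, here is a minimal sketch that turns a short audio signal into a spectrogram array using NumPy. It is not Riffusion's code: the function name `spectrogram`, the frame size, the hop length, and the 440 Hz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, frame_size=512, hop=128):
    """Return a magnitude spectrogram with shape (frequency_bins, time_frames).

    Illustrative helper, not part of Riffusion or Stable Diffusion.
    """
    # Taper each frame so its edges do not smear energy across frequencies.
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    # Slice the signal into overlapping frames, one row per time step.
    frames = np.stack([
        signal[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    # FFT of each frame gives the strength of each frequency at that time.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Hypothetical input: one second of a 440 Hz sine tone sampled at 8 kHz.
sample_rate = 8000
frame_size = 512
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(tone, frame_size=frame_size)

# The brightest row of the "image" should sit near 440 Hz.
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / frame_size
print(spec.shape, peak_hz)
```

Each column of `spec` is one moment in time and each row is one frequency band, which is why the array can be treated as an image. Going the other way, from a spectrogram image back to audio, also requires estimating the phase that the magnitude image discards; this sketch does not attempt that step.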