"generate an image in every conversation to describe the scene of the conversation"
"generate images to describe the user's input"