Building agents that can see, talk, and act