Visual Object Segmentation Based On Temporal And Linguistic Cues