Pixel-Level Video Understanding With Efficient Deep Models