A Closer Look At The Triad In Data-Driven Vision And Language: Curation, Representation, And Learning