Validity Challenges in Machine Learning Benchmarks