Efficient Sampling and Data Harmonization for Electronic Health Records