Pre-processing data for machine learning