Guidelines to Prepare Datasets for NLP