Data preprocessing

Cleaning, preprocessing and preparing data for reports, data analysis or modeling.

I can help with:

  • Data cleaning
    Detecting and removing duplicate, missing or faulty data.
  • Data preprocessing
    Applying preprocessing functions suited to each data type (text, numbers, dates, categories), such as outlier detection and removal, missing-data imputation and statistical transformations (for skewed data or data with outliers).
  • Preparing satellite data
    Collecting, preprocessing, cleaning and aggregating satellite images for further analysis.
  • Data integrity
    Creating reusable and reproducible data cleaning and preprocessing workflows.
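
As a rough illustration of the cleaning and preprocessing steps listed above, here is a minimal Python sketch that uses only the standard library. The function names, the 1.5 x IQR outlier rule, the median imputation and the sample values are all assumptions chosen for the example, not code taken from a specific project.

```python
import math
import statistics


def drop_duplicates(rows):
    """Remove exact duplicate rows (dicts) while preserving their order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def remove_outliers_iqr(values, k=1.5):
    """Keep only values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def log_transform(values):
    """Compress a right-skewed, non-negative distribution with log(1 + x)."""
    return [math.log1p(v) for v in values]


# Hypothetical sample data, for illustration only.
rows = [{"id": 1, "price": 10}, {"id": 1, "price": 10}, {"id": 2, "price": None}]
unique_rows = drop_duplicates(rows)

prices = impute_median([10, 12, None, 11, 13, 12, 11, 10, 12, 300])
cleaned = remove_outliers_iqr(prices)
transformed = log_transform(cleaned)
```

Keeping each step as a small pure function is one way to make a workflow reusable and reproducible: the same functions can be chained in the same order on any new dataset. On real projects a library such as pandas would usually replace these hand-written helpers.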

Related projects

  • I preprocessed tens of thousands of LinkedIn job offers to extract job title, description, job level and salary information. I prepared the text data for training machine learning models (tokenization, lemmatization, stop-word removal...)
  • I cleaned and prepared hundreds of GBs of satellite data to perform flood detection. I aggregated weekly flood data to generate yearly flood maps for countries such as Pakistan, India and France.
  • I worked with different flood map formats to study the difference between FEMA historical flood maps and actual flood maps in the city of St. Louis in the US.
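
To give an idea of the text preparation mentioned in the job-offer project above, here is a minimal Python sketch of tokenization and stop-word removal using only the standard library. The stop-word list, the function names and the sample sentence are illustrative assumptions. Lemmatization is left out because it needs a dictionary-backed library such as NLTK or spaCy.

```python
import re

# Tiny illustrative stop-word list; a real project would typically use a
# fuller list from a library such as NLTK or spaCy.
STOP_WORDS = {"a", "an", "and", "the", "for", "in", "of", "to", "we", "are", "is", "with"}


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters.

    Digits, punctuation and accented characters are dropped, which is a
    simplification made for this example.
    """
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def preprocess(text):
    """Tokenize a job description and strip stop words."""
    return remove_stop_words(tokenize(text))


# Hypothetical job-offer sentence, for illustration only.
tokens = preprocess("We are looking for a Senior Data Engineer in Paris.")
print(tokens)
```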

