I can help with:
- Data cleaning
Detecting and removing duplicated, missing, and/or faulty data.
- Data preprocessing
Applying preprocessing functions suited to each data type (text, numbers,
dates, categories), such as outlier detection and removal, missing-data
imputation, and statistical transformations (for skewed data or data with outliers).
- Preparing satellite data
Collecting, preprocessing, cleaning, and aggregating satellite
images for further analysis.
- Data integrity
Creating reusable and reproducible data cleaning and
preprocessing workflows.
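
To make the cleaning and preprocessing steps above more concrete, here is a minimal sketch in plain Python. It is only an illustration, not the workflow I use on every project: the `id`/`value` record layout, the `clean_values` function name, the sample readings, the median imputation, and the 1.5 × IQR outlier rule are all assumptions chosen for the example.

```python
from statistics import median, quantiles


def clean_values(rows):
    """Drop duplicate rows, impute missing values, then remove IQR outliers.

    `rows` is a list of dicts with hypothetical keys "id" and "value".
    """
    # 1. Remove exact duplicates while preserving the original order.
    seen = set()
    unique = []
    for row in rows:
        key = (row["id"], row["value"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input is left untouched

    # 2. Impute missing values with the median of the observed values.
    observed = [r["value"] for r in unique if r["value"] is not None]
    fill = median(observed)
    for r in unique:
        if r["value"] is None:
            r["value"] = fill

    # 3. Remove values outside 1.5 * IQR of the quartiles (assumed rule).
    q1, _, q3 = quantiles([r["value"] for r in unique], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in unique if low <= r["value"] <= high]


# Made-up sample data for illustration only.
rows = [
    {"id": 1, "value": 10.0},
    {"id": 1, "value": 10.0},   # duplicated row
    {"id": 2, "value": None},   # missing value
    {"id": 3, "value": 12.0},
    {"id": 4, "value": 11.0},
    {"id": 5, "value": 500.0},  # likely faulty reading
    {"id": 6, "value": 9.0},
]
cleaned = clean_values(rows)
```

Wrapping the steps in a single function is what keeps the workflow reusable and reproducible: the same call can be run again on new data and yields the same result for the same input.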
Related projects
- I preprocessed tens of
thousands of LinkedIn job offers to extract job titles, descriptions, job levels, and
salary information. I prepared the text data for training machine learning models
(tokenization, lemmatization, stop-word removal...)
- I cleaned and prepared hundreds of GBs of satellite data to perform flood detection. I
aggregated weekly flood data to generate yearly flood maps for countries such as Pakistan,
India and France.
- I worked with different flood map formats to study the differences between FEMA historical flood maps and
actual flood maps in the city of St. Louis in the US.
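
As a rough illustration of the text preparation mentioned in the job-offer project above, the sketch below shows tokenization and stop-word removal using only the Python standard library. The `prepare_text` function, the tiny stop-word set, and the sample sentence are hypothetical and are not taken from the actual project. Lemmatization is left out because it needs a linguistic resource, typically provided by a library such as NLTK or spaCy.

```python
import re

# Tiny illustrative stop-word list (an assumption; real projects usually
# rely on a much larger list shipped with an NLP library).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "for", "in", "with", "we", "are", "is"}


def prepare_text(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


# Made-up job-offer sentence for illustration only.
tokens = prepare_text("We are looking for a Senior Data Engineer to join the team in Paris.")
```

For this sample sentence, `tokens` contains `looking`, `senior`, `data`, `engineer`, `join`, `team`, and `paris`, which is the kind of reduced token list a model would then be trained on.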