
Evaluating a cheminformatics dataset - Part 1: An ML engineering and DevOps showcase
So far, my career path has led me from analytical chemistry to data science, where I gained further experience in ML engineering, DevOps, and software engineering. In the first part of this showcase series, I want to focus on the ML engineering and DevOps side of evaluating a cheminformatics dataset. We will dive deeper into the following topics:

- Understanding the dataset and the problem
- Developing an extract, load, transform (ELT) workflow
- Training, validating, and testing machine learning (ML) models
- Transferring the ELT workflow and ML processes into robust and scalable pipelines
- Exploring real-world production scenarios

In general, we will follow a top-down approach in this showcase: we create a complete, running workflow first, with a focus on the orchestration and selection of tools. Therefore, we will only see some high-level code snippets in this first part.

In the second part, which will be released later as a separate article, we will focus on the software engineering aspect. There we will dive deeper into the codebase and explore best practices such as writing clean and maintainable code, following test-driven design principles, and establishing a proper software lifecycle. ...


