Waterfall or Agile? The best methodology for AI and machine learning


The two most widely used software development models in modern project management are, without doubt, the Waterfall methodology and the Agile methodology.

While this post does not explain the two models exhaustively, the Waterfall approach is, generally speaking, the way to go for certain areas of engineering design, especially those where progress is expected to flow in one direction (hence the name waterfall).

 

[Figure: schema of a typical Waterfall methodology]

 

Software development purists regard the Waterfall methodology as the model of choice for highly structured projects, e.g. operating system design, real-time codecs, scientific software, or software for critical environments.

For AI and machine learning projects, however, this approach can be deleterious, leading to longer development cycles and project failure. The table below summarises some differences between the two methodologies and highlights the major reasons why Agile is probably the better development method for data science projects.

 

 

| Waterfall | Agile |
| --- | --- |
| Sequential | Iterative |
| Requirements are clear from the beginning | Even if the customer's scope is well defined, requirements evolve and tend to change during the course of the project |
| Fixed plan | Adaptable plan that accommodates business needs and continuous feedback |
| Inflexible to changes | Only general milestones are created that describe where the project should head |
| Deliver the product that has been planned | Create a minimum viable product (MVP) as fast as possible, so that business users can play with it and provide feedback |
| Project is divided into successive phases, which are not revisited once completed | Project is divided into iterative two-week sprints, and one can go back and forth between sprints until completion |
| Tests are done at the end | Tests are performed throughout the project |
| Participation of business users is not required | Business users are involved throughout the development phase to validate users' needs and priorities |
| Precise cost and time estimation | Difficult to know beforehand how many sprints are needed to achieve the requirements |
| Algorithms are developed after requirements have been gathered | Algorithms are developed immediately |
| Simple to give updates to management and business teams, thanks to detailed planning and accurate budget estimates | Hard to update all parties, especially when they are not deeply involved |
| Does not work well for the data discovery process due to its cyclical nature | Suitable for data science projects, which comprise multiple iterations of understanding a business problem by asking questions, acquiring data from multiple sources, data cleaning, feature engineering and modelling |
| Fail slow: only towards the end of the project does one know whether all KPIs have been reached | Fail fast: e.g. if model performance is 70% and the minimum viable performance is 90%, the project can be stopped early, since the goal is unlikely to be reached any time soon (see the sketch after the table) |
| A model is deployed only at the end of the project | Model deployment occurs as soon as an acceptably good model has been designed and validated |
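To make the fail-fast row concrete, here is a minimal Python sketch of the kind of check a team might run at the end of each sprint. The threshold values, the `sprint_review` function and the `model`/`X_val`/`y_val` names are illustrative assumptions, not part of any specific project or framework.

```python
# Minimal sketch of a "fail fast" gate at the end of a sprint (illustrative only).
from sklearn.metrics import accuracy_score

MINIMUM_VIABLE_PERFORMANCE = 0.90  # assumed KPI agreed with business users


def sprint_review(model, X_val, y_val, min_viable=MINIMUM_VIABLE_PERFORMANCE):
    """Evaluate the current model on held-out data and decide the next step."""
    score = accuracy_score(y_val, model.predict(X_val))
    if score >= min_viable:
        return score, "ship"      # acceptable model: deploy and keep refining
    if score < min_viable - 0.20:  # far below target, e.g. 0.70 vs 0.90
        return score, "stop"      # fail fast: revisit data, features or scope
    return score, "iterate"       # close to target: plan another sprint
```

Run at every sprint review, a gate like this surfaces a hopeless model after a couple of iterations rather than at the end of a months-long Waterfall plan.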
