Waterfall or Agile? The best methodology for AI and machine learning


The two most widely discussed software development models in modern project management are the Waterfall and Agile methodologies.

An overview of the Waterfall model

The Waterfall approach is the way to go in “consolidated” areas of engineering design. In these fields you can assume that progress flows in one direction. In layman's terms, once a decision is made, it is not revisited. Hence the name waterfall.

Schema of a typical waterfall methodology

Software development purists regard the Waterfall methodology as the model of choice for highly structured projects, e.g. operating system design, real-time codecs, scientific software or software for critical environments.

However, this approach can be detrimental to AI and machine learning projects. Its adoption could lead to long development cycles and project failures.

The Agile development methodology is much better suited to machine learning projects. The table below summarises some differences between the two methodologies. It also emphasises the major reasons why Agile is probably the best development method for data science projects.

A comparison between waterfall and agile in machine learning

| Waterfall | Agile |
| --- | --- |
| Requirements are clear from the beginning | Within a well-defined scope, requirements evolve during the course of the project |
| Fixed plan | Plan adapts to needs and feedback |
| Inflexible to changes | Only a few general milestones, describing where the project should head |
| Deliver the product as planned | Create a minimum viable product (MVP) as fast as possible, so that users can provide feedback |
| Project is divided into successive phases, which are not revisited once completed | Project is divided into two-week iterative sprints. It is possible to go back and forth between sprints until completion |
| Tests are done at the end | Tests are performed throughout the project |
| Participation of users is not required | Users participate throughout the development phase |
| Precise cost and time estimation | Difficult to estimate the number of sprints needed to meet the requirements |
| Development of algorithms takes place after the gathering of requirements | Immediate development of algorithms |
| Simple to give updates to the management and business teams, due to detailed planning and accurate budget estimations | Hard to update all parties, especially when they are not deeply involved |
| Does not work well for the data discovery process, due to the cyclical nature of the latter | Suitable for data science projects, which comprise multiple iterations of understanding a business problem by asking questions, data acquisition from multiple sources, data cleaning, feature engineering and modelling |
| Fail slow: only towards the end of the project does one know whether the project reached its key goals | Fail fast: e.g. if model performance is 70% and the minimum valuable performance is 90%, the project can be stopped early, since it is unlikely that the goal will be reached any time soon |
| Model deployment happens only at the end of the project | Model deployment occurs as soon as an acceptable model is ready |
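
The fail-fast comparison above can be made concrete with a small check run at the end of each sprint. The sketch below is only an illustration: the function name `should_stop_early`, its parameters and the linear-projection heuristic are assumptions introduced here, not part of any Agile standard. It projects the average per-sprint gain observed so far over the remaining sprints and flags the project when the minimum valuable performance still looks out of reach.

```python
from typing import Sequence


def should_stop_early(
    sprint_scores: Sequence[float],
    minimum_valuable_score: float,
    sprints_remaining: int,
) -> bool:
    """Return True if the target looks unreachable in the remaining sprints.

    Hypothetical heuristic: extend the average per-sprint gain seen so far
    linearly over the sprints that are left.
    """
    if not sprint_scores:
        return False  # nothing measured yet, keep going
    latest = sprint_scores[-1]
    if latest >= minimum_valuable_score:
        return False  # goal already met
    if len(sprint_scores) < 2:
        return False  # a single data point gives no trend
    # Average improvement per sprint between the first and latest score.
    avg_gain = (latest - sprint_scores[0]) / (len(sprint_scores) - 1)
    projected = latest + avg_gain * sprints_remaining
    return projected < minimum_valuable_score


# Model stuck near 70% with a 90% target and two sprints left.
print(should_stop_early([0.66, 0.69, 0.70], 0.90, 2))  # True
```

In practice the stopping rule would be agreed with the business stakeholders, and a linear projection is only one of many possible choices. The point is that an iterative process produces the per-sprint measurements that make such a decision possible early.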


