
Deploying Analytic Models

November 4, 2019 by Robert Grossman

One of my defining experiences in analytics occurred over twenty years ago, in 1998. At that time, it was very challenging to build analytic models if the data did not fit into the memory of a single computer. Clusters of distributed computers (then called Beowulf clusters) were still new and used only in academia, ensembles of models were still largely unknown, and Python and R were still used only by a few fringe communities. I was working at a venture-backed startup that developed software for building trees and other common analytic models over clusters of workstations by using ensembles of models. One of our first customers gave us clean, nicely labeled data exactly when they said they would (in over twenty years this has happened only a handful of times). We built a model that outperformed what was used in production, and I (very) naively thought we were done.

Little did I understand just how much work it would be to get the model into production, and how much it would have to change to get it there. The figure above gives the general idea, but in practice getting the data and deploying the model is even harder than the figure indicates.

Even today, most analytics and machine learning conferences focus on new algorithms and new software, with very little attention paid to deploying analytic models. An exception is the Common Model Infrastructure (CMI) workshop, which was held at KDD 2018 and ICDM 2019. The workshop describes its focus as “infrastructure for model lifecycle management—to support discovery, sharing, reuse, and reproducibility of machine learning, data mining, and data analytics models.”

I spoke at the CMI 2018 workshop at KDD 2018. My slides are on SlideShare and you can find them here. I singled out the following best practices:

Five Best Practices When Deploying Models

  1. Mature analytic organizations have an environment that automates the testing and deployment of analytic models (see the sketch after this list).
  2. Don’t think just about deploying analytic models, but make sure that you have a process for deploying analytic workflows.
  3. Focus not just on reducing Type I and Type II errors, but also on data input errors, data quality errors, software errors, systems errors, and human errors. People only remember that the model didn’t work, not whose fault it was.
  4. Track the value obtained by the deployed analytic model, even if it is not your explicit responsibility.
  5. It is often easier to increase the value of a deployed model by improving the pre- and post-processing than by chasing smaller improvements in the model’s lift curve.
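
What this automation looks like varies widely from one organization to another. As one illustrative sketch (my own example, not something prescribed in the talk), a pytest-style regression test can gate deployment by checking that a newly packaged model still reproduces approved scores on a small frozen reference batch; the file names model.joblib and reference_batch.csv below are hypothetical.

```python
# Hypothetical deployment gate: fail the build if the packaged model's scores
# on a frozen reference batch no longer match the scores approved earlier.
import joblib
import numpy as np
import pandas as pd


def test_model_reproduces_reference_scores():
    model = joblib.load("model.joblib")           # packaged model artifact (hypothetical)
    batch = pd.read_csv("reference_batch.csv")    # frozen inputs plus approved scores
    expected = batch.pop("expected_score").to_numpy()

    actual = model.predict_proba(batch)[:, 1]     # assumes a binary classifier
    np.testing.assert_allclose(actual, expected, atol=1e-6)
```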

During the talk, I identified five common approaches for deploying analytic models. I remember these with the acronym E3RW:

  1. Embed analytics in databases
  2. Export models and deploy them by importing into scoring engines
  3. Encapsulate models using containers or virtual machines
  4. Read a table of values that provides the parameters of the model (see the sketch after this list)
  5. Wrap the code, workflow, or analytic system, and, perhaps, create a service
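
As a concrete illustration of the fourth approach, the sketch below scores a logistic regression directly from a table of coefficients, so the production system only needs the table and a few lines of scoring code. The file model_parameters.csv and its column names are hypothetical, not from the talk.

```python
# Minimal sketch of "read a table of values that provides the parameters":
# a logistic regression deployed as nothing more than a CSV of coefficients.
import csv
import math


def load_parameters(path):
    """Map feature name -> coefficient from a two-column CSV (feature, coefficient)."""
    with open(path, newline="") as f:
        return {row["feature"]: float(row["coefficient"]) for row in csv.DictReader(f)}


def score(record, params):
    """Score one record; any feature missing from the table contributes nothing."""
    z = params.get("intercept", 0.0)
    for name, value in record.items():
        z += params.get(name, 0.0) * value
    return 1.0 / (1.0 + math.exp(-z))


params = load_parameters("model_parameters.csv")        # hypothetical parameter table
print(score({"age": 42.0, "balance": 1250.0}, params))  # probability between 0 and 1
```

Updating such a model amounts to publishing a new table, which is part of why this approach remains attractive inside databases and other constrained environments.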

These days, with the popularity of continuous integration (CI) and continuous deployment (CD), encapsulating models in containers and using CI/CD tools is the approach du jour, but each approach still has its place.
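
In practice, encapsulating a model in a container usually starts by wrapping it in a small scoring service that the container image then packages. Here is a minimal sketch, assuming a scikit-learn style model serialized to a hypothetical model.joblib, with Flask as just one of several reasonable choices:

```python
# Minimal scoring service that a container image could package alongside model.joblib.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")   # hypothetical serialized model


@app.route("/score", methods=["POST"])
def score():
    payload = request.get_json(force=True)
    # Expected request body: {"instances": [[f1, f2, ...], [f1, f2, ...], ...]}
    predictions = model.predict(payload["instances"]).tolist()
    return jsonify({"predictions": predictions})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

The same image can then move unchanged from testing into production, which is much of the appeal of the container plus CI/CD combination.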

I also identified five common mistakes when deploying analytic models.

Five Common Mistakes When Deploying Models

  1. Not understanding all the subtle differences between the data supplied to train the model and the actual run-time data the model sees.
  2. Thinking that the features are fixed and that all you will need to do is update the parameters.
  3. Thinking the model is done and not realizing how much work is required to keep all of the required pre- and post-processing up to date.
  4. Not checking in production whether the inputs to the model drift slowly over time (a rough check is sketched below).
  5. Not checking that the model will keep running despite missing values, garbage values, etc. (even values that should never be missing in the first place).
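
For the fourth mistake, the drift check does not need to be elaborate to be useful. Below is a rough sketch (my own illustration, not something prescribed in the talk) using the population stability index, a common rule-of-thumb statistic for comparing a feature's training-time distribution to what the model sees in production:

```python
# Rough drift check: population stability index (PSI) between the values a
# feature had at training time and the values seen in production.
import numpy as np


def population_stability_index(expected, actual, n_bins=10):
    """PSI over quantile bins of the training-time values; above ~0.2 often means 'investigate'."""
    edges = np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1))
    edges = np.unique(edges)                 # guard against duplicate quantile edges
    edges[0], edges[-1] = -np.inf, np.inf    # catch out-of-range production values

    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    eps = 1e-6                               # avoid log(0) for empty bins
    expected_pct = np.clip(expected_pct, eps, None)
    actual_pct = np.clip(actual_pct, eps, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))


# Example: a production feature whose mean has shifted relative to training.
rng = np.random.default_rng(0)
train_values = rng.normal(0.0, 1.0, 10_000)
prod_values = rng.normal(0.4, 1.0, 10_000)
print(round(population_stability_index(train_values, prod_values), 3))
```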

Copyright 2019 Robert L. Grossman

