Howdy folks! My name is Mikiko Bazeley.
Some of you might recognize me from LinkedIn or YouTube and already know my story, but for those who are new, here are the main bullet points:
One of the most common questions I get every time I’ve pivoted in my career is “How did you know you wanted to build a career in X?”.
The most recent variation has been “How did you and others working in production ML know you wanted to become MLOps Engineers?”
"Success is stumbling from failure to failure with no loss of enthusiasm." – Winston Churchill
Before I started working on production machine learning as an MLOps Engineer, I was a struggling data scientist.
And before I was a struggling data scientist I was an overwhelmed analyst.
And even before that, I was a completely confused and lost growth hacker.
As an undergrad I initially attended UCSD with a rather vague idea of eventually going to medical school, taking public health classes for fun alongside organic chemistry. However, I decided I wanted to understand humans at a more macro scale, like how we make decisions and codify practices into culture, and spent my remaining undergraduate years studying biological anthropology and microeconomics.
Without realizing it, I would be engaging in the practice of ethnography (which I had studied as an anthropology student) through intensive, hands-on experiential learning as a data and machine learning practitioner. Every stage of my career followed a pattern: encounter a new environment, observe the pain points of the users in that environment, attempt to solve a pain point, realize the obvious solution was incomplete, and thereby trigger a new inquiry into fixing the “solution”.
After graduation I moved back to San Francisco to find a job where I could save a bit of money while figuring out my next steps. While working as a receptionist at a small hair salon I started to understand the challenges small-medium sized businesses faced, especially when it came to incorporating new technologies that would supposedly increase their revenue, such as CRMs and Point-of-Sale systems. I started to become curious about the power of leveraging data to help grow cash-strapped brick-and-mortar businesses.
The next chapter of my career after the salon was an introduction to Data™, big and small, structured and unstructured. I would go on to work as a data analyst for an anti-piracy company, as a financial analyst at one of the largest residential solar companies in the US (working on supply chain forecasting & sales modeling), and then as a hybrid data analyst/data scientist for the customer success team focused on BIM 360.
These companies were as different as could be (in size, industry, and data maturity) and yet I ran into similar categories of problems, like data access, navigating tribal knowledge and metadata about the data, and ensuring the insights and strategic recommendations I provided my key stakeholders (many of whom were the CXOs & VPs of their respective companies & organizations) were timely, consistent, & trustworthy.
Many of these challenges were magnified in the next chapter of my career, as a data scientist and then MLOps engineer. In growth and analytics, my primary responsibilities were to use data to enable visibility into the health of the business & assist in decision making, as well as provide diagnosis when needed.
As a data scientist I was now responsible for developing external-facing, predictive models and answerable to many key stakeholders when code broke. Instead of data flowing in a single direction (from source through transformation to consumer) data needed to flow in multiple directions, like a linked daisy chain of multi-armed Lovecraftian horror monsters.
Some of the biggest challenges I faced as a data scientist working at a digital adoption SaaS platform and a health devices company included the lack of engineering support, the difficulty in setting up and stitching tooling together, and coordinating the many moving pieces of a machine learning pipeline as a data scientist on an island.
Deploying models is hard, especially when you don’t know what “good” looks like.
It wasn’t until I joined Mailchimp as an MLOps Engineer that I began to see my experiences and hard-earned battle scars working on data and machine learning systems coming together as a career in and of itself. After joining Mailchimp and having the opportunity to be a part of a functional and effective organization successfully deploying machine learning features, I was inspired to dive even deeper into the world of MLOps.
“To achieve great things, two things are needed: a plan and not quite enough time.” – Leonard Bernstein
Although MLOps and production ML best practices and tools are still being developed, I’m particularly intrigued by the following opportunities and trends:
My favorite question to ask experienced engineers is what they think are the three most important innovations to have occurred in the last 10 years. The answers usually center around open-source (as well as cloud platforms and general ecosystem maturity in certain languages like Python).
Some of the most impactful open-source projects include Linux, Git, VSCode, Eclipse, Firefox, TensorFlow, PyTorch, and more. These projects have pushed technological innovation forward while relying on the (in many cases unpaid) collective efforts of thousands of individuals. The most important impact of open-source has been the way it accelerates individuals, teams, and organizations across time and geography.
For example, open-source projects have allowed me to bootstrap my data science and machine learning career, as well as help companies like Mailchimp, Teladoc, and Sunrun leverage data science to power the economic prospects of thousands of businesses, improve health outcomes for individuals with chronic health conditions, and provide green energy alternatives using solar power.
And open-source is certainly not slowing down, especially with the adoption of open-source MLOps tools by companies wanting to open their stacks in pursuit of even faster innovation.
How does a platform or system change based on size, industry, and resources?
What does a functional and performant MLOps stack actually look like for a self-driving automobile enterprise?
Or even a solopreneur creating a small web product with potential?
A question I’ve been thinking about is “What does a mature MLOps system actually look like?”. The common advice is “don’t build for Google scale”. But is that advice really good enough?
Personally I don’t think so.
Everyone wants to be their best and telling teams and organizations to not build Google-scale systems is like telling my average female height and build self to not aim for the Olympics or the figure of a yoked, 5’8” 220lb bodybuilder.
Yes, we get it. I get it. So what should any of us be aiming for?
As James Clear talks about in Atomic Habits and in much of his content, there’s a difference between goals and systems.
For example, a bodybuilder might have the goal of reaching a certain body fat percentage and muscle mass by a certain date (for example, 10%-12% BF within 12 weeks for a female bodybuilder at competition time, or 4-6% for a male bodybuilder). The system is the diet, workout, & recovery regimen that ensures the bodybuilder reaches their target within the specified time.
Likewise for an organization focused on deriving value from their ML systems, the goal might be to increase data science and machine learning output by halving the time to POC as well as time to deployment, while keeping operational and ad-hoc tickets at a consistent volume. The system in this case is the MLOps toolchain as well as the practices and structure for its continued development and maintenance.
However every organization ultimately has different constraints, some of which are immutable. I am never going to be 6’0” and my tall-as-an-oak spotter is never going to be naturally closer to 5’0” than me. Why shouldn’t our systems be different and optimized for who we are and where we’re going?
Regardless of how much we grumble about the difficulties of supporting ML in production, the reality is it’s never been faster or easier to deploy really powerful (and at times scary powerful) applications or pipelines.
During my time as a Data Scientist at Autodesk, for example, Andrew Ng had just announced a new company called DeepLearning.ai. Shortly afterward, the famous “Obama deepfake” video voiced by Jordan Peele aired. That same year, DeepLearning.ai would release a new five-course series consolidating advancements in CNNs, RNNs, and LSTMs with practical applications of deep learning models in machine learning pipelines. At Autodesk’s 2017 annual customer and product showcase (Autodesk University 2017), the big vision was around enabling generative design, AR/VR, robotics, and additive manufacturing for customers.
Five years later, so many of the research initiatives I saw listed under the office of the CTO at Autodesk are now a reality. In the past couple months alone we’ve seen a release of powerful text-to-image generation models as well as Facebook’s Make-A-Video model.
And competition has only increased over the years, as data scientists and machine learning engineers have gone from the research group eccentrics huddling in the corner to essential drivers of profit maximization. Between 2013 and 2019, for example, job postings for Data Scientists on Indeed increased by 256%.
The widespread adoption and democratization of access to data science and machine learning tools (as well as domain knowledge) heralds a new era where in the competitive marketplace of ideas, the winners will be the ideas that go-to-market the fastest and most reliably.
Some reasonable questions to ask are: Which teams and organizations are going to deliver the fastest? Who should MLOps open-source projects be focused on?
The most successful projects will focus on enabling data scientists as the primary users and owners of tools to enable production machine learning.
The current landscape of MLOps tools leaves much to be desired. Not because they aren’t powerful in solving specific pain-points, but because they assume a high level of infrastructure and software development experience to operate.
Data scientists come from a wide variety of backgrounds and are often tasked with responsibilities that run the gamut from interfacing with product and legal to developing POCs, training and evaluating models, productionizing pipelines, and, in many cases, even deployment.
Yet given the wide scope of their responsibilities and the diversity of skills they bring to the table, they are often treated antagonistically, as burdens on platform teams or drains on company resources.
Some maladaptive practices I’ve observed include:
A possible solution (although not the only one) is to design projects and tools that support strong engineering practices (as well as company initiatives around data governance and access) and are inclusive of users of all types of skills and experiences. Instead of seeing your data scientists as bottlenecks and enemies of best practices, empower your users with the tools and knowledge needed to produce quality machine learning applications and pipelines.
In other words: Make it easy for people to do the right thing.
I’m excited for the opportunity to partner with Featureform to address the opportunities I talked about earlier because:
When I first started working on production ML and then MLOps I didn’t have a very positive opinion on feature stores (or most standalone point solutions that required significant lift to implement) for many of the reasons Simba talks about in his blog post on the different varieties of feature stores.
Feature stores seemed like an unnecessarily complicated component of an MLOps platform, especially given the preponderance of powerful data warehouses already available.
However as a data scientist and MLOps Engineer, I’ve experienced all the following pain-points in developing and deploying machine learning projects:
When a feature store is done right, it can:
The right abstractions are crucial to actualizing creative and technical potential.
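To make that abstraction concrete: the core promise of a feature store is that each feature is defined once, and that single definition serves both the offline training path and the online serving path. The `MiniFeatureStore` below is a hypothetical toy sketch of that idea, not Featureform's actual API:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class MiniFeatureStore:
    """Toy illustration: one feature definition serves training AND serving."""

    _transforms: Dict[str, Callable[[dict], Any]] = field(default_factory=dict)

    def register_feature(self, name: str, fn: Callable[[dict], Any]) -> None:
        # A feature is defined once, as a pure transformation over a raw record.
        self._transforms[name] = fn

    def training_rows(self, records: List[dict], features: List[str]) -> List[list]:
        # Offline path: materialize features over a batch of historical records.
        return [[self._transforms[f](r) for f in features] for r in records]

    def serve(self, record: dict, features: List[str]) -> list:
        # Online path: the *same* transformations run on a single live record,
        # eliminating the train/serve skew that comes from duplicated logic.
        return [self._transforms[f](record) for f in features]


store = MiniFeatureStore()
store.register_feature("amount_usd", lambda r: round(r["amount_cents"] / 100, 2))
store.register_feature("is_weekend", lambda r: r["day_of_week"] in (5, 6))

batch = [
    {"amount_cents": 1999, "day_of_week": 5},
    {"amount_cents": 450, "day_of_week": 2},
]
features = ["amount_usd", "is_weekend"]

X_train = store.training_rows(batch, features)
x_live = store.serve({"amount_cents": 1999, "day_of_week": 5}, features)
assert X_train[0] == x_live  # identical feature logic offline and online
```

The sketch leaves out everything that makes real feature stores hard (point-in-time correctness, materialization to online stores, governance), but it captures why the abstraction matters: the feature definition, not the pipeline plumbing, becomes the unit data scientists own.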
As an experienced practitioner and mentor I’ve been blessed with the opportunity to interact with people from a wide variety of backgrounds and environments, representing a broad range of concerns. In the past couple of years, a lack of tooling has only come up with regard to the 1% of companies doing incredibly specialized, cutting-edge research and development.
In fact I’ve heard and seen quite the opposite: we have too many options for teams and organizations to feel confident in choosing a tool. As the famous jam study showed, more options aren’t necessarily better – in fact, when individuals are flooded with options, they are not only less likely to commit to a new option but are more dissatisfied with their choice than if they’d been presented with a more curated selection.
So how does Featureform reduce, rather than add to, the chaos?
While a good portion of this post has been about my Very Big Ideas™ about the future of MLOps and waxing lyrical about the potential of Featureform, the heart of every decision I make comes down to a crucial element:
What struck me the first time I met Simba and Shab was how grounded and fun the conversations were and the values we shared, not just as active members of the MLOps community but also as individuals.
Much like Shab talked about in “Why I Joined Featureform”, my goal at every point in time has been to do my best work and to constantly challenge myself with scary growth opportunities.
And just as Simba talks about in “Lessons Learned: From Google to Building a Profitable Startup”, I’ve never been content with anything less than full ownership and autonomy of my work.
Although I wasn’t actively looking for a new opportunity and had both the support and the intention to focus on the next promotion into management at Mailchimp (now part of Intuit), the timing could not have been more ideal.
I can’t wait to join Simba, Shab and the rest of the Featureform team in building out an integral component of MLOps stacks for the rest of us and giving back to the OSS ecosystem.