Unveiling AI's Infrastructure Evolution with Outerbounds' CEO


MLOps Weekly Podcast



In this episode of the MLOps Weekly Podcast, Featureform CEO Simba Khadder and Outerbounds CEO Ville Tuulos engage in a fascinating conversation about the evolution of ML and AI infrastructure, focusing on the inception and development of Metaflow at Netflix, its impact on machine learning operations, and the founding of Outerbounds. The discussion delves into the challenges and solutions in ML operations, offering insights into the future of artificial intelligence applications in business and beyond. They also take a deep dive into innovative approaches to scaling ML projects, emphasizing practicality and efficiency in a fast-evolving tech landscape.



[00:00:06.130] - Simba Khadder
Hey, everyone. Simba Khadder here, and you are listening to the MLOps Weekly Podcast. Today, I'm speaking with Ville Tuulos. He's the co-founder and CEO of Outerbounds, a company developing modern, human-centric ML infrastructure. He's also the author of Effective Data Science Infrastructure, published by Manning. Prior to Outerbounds, he spent over two decades developing infrastructure for machine learning. He worked as an ML researcher in academia and as a leader at numerous companies, including Netflix, where he led the ML infrastructure team that created Metaflow, a popular open-source framework for data science infrastructure.

[00:00:41.440] - Simba Khadder
Ville, thanks for joining me today.

[00:00:43.580] - Ville Tuulos
Thanks for having me. Great to be here.

[00:00:45.350] - Simba Khadder
I'd love to kick off by just learning more about the story of Metaflow. Maybe you could just take us back to the beginning, like why was Metaflow created? And the story from there.

[00:00:54.660] - Ville Tuulos
Well, our origin story goes back to Netflix. I used to lead ML and AI infrastructure at Netflix. I'm pretty sure that everybody here knows that Netflix has been doing machine learning recommendation systems for a very long time.

[00:01:08.010] - Ville Tuulos
The interesting thing that happened around the 2017 time frame was that beyond recommendations, Netflix started to be really interested in applying ML to many kinds of use cases across the company. You can imagine that a company of Netflix's size has potential use cases for ML and AI really across the company, throughout the production process and so forth: anything from natural language processing to computer vision to all kinds of business-y data science problems. Hence, an almost infinite appetite to make the company more sophisticated.

[00:01:39.110] - Ville Tuulos
They had the same problem that many companies have even today: on the one hand, they had a good amount of engineering infrastructure, all kinds of cloud stuff, compute, orchestration, a data platform with over a hundred people working on it. Then they had the whole data science and machine learning organization. But there was a bit of a gap between the two. The engineering infrastructure, which was never really built for machine learning specifically, was really hard for these machine learning people to access.

[00:02:05.070] - Ville Tuulos
What we really wanted to do with Metaflow was to bridge that gap: make it easier for the data scientists and ML engineers to access all the parts of infrastructure that you always need. How do I run my models at scale? How do I orchestrate these systems in production? How do I keep track of everything?

[00:02:19.560] - Ville Tuulos
That happened back in the day, in the 2018 time frame. Metaflow got really popular inside Netflix, so we open-sourced it in 2019. Many companies started using Metaflow and reaching out to Netflix asking for support: can we run this on Azure? Netflix was only on AWS. It became a bit uncomfortable telling everybody, "No, Netflix isn't that interested in supporting those features." We could really feel the demand for this kind of machine learning infrastructure.

[00:02:44.660] - Ville Tuulos
This was way before ChatGPT, obviously; there wasn't so much AI talk back then. But it was really growing rather fast. Finally, in 2021, we decided that, okay, now is the time to really start helping other companies properly as well. That's when we launched Outerbounds.

[00:02:57.930] - Simba Khadder
Awesome. I want to go back. One thing that I think doesn't get spoken about enough is how different each ML workflow is. You spoke about NLP, you talked about recommender systems. Sure, there's a lot of fraud and tabular-data-type things. There are also, surely, computer vision use cases at Netflix as well. Do you find that all those use cases can be properly satisfied by one MLOps platform? Or do you think that we need different verticalized solutions for computer vision versus recommender systems versus NLP?

[00:03:33.500] - Ville Tuulos
Yeah. No, I think that's a great observation. It definitely feels, especially these days as ML and AI are eating the world, that the use cases are so diverse that even putting them under the same umbrella doesn't do them justice. On the other hand, it's like software engineering. You could say the same thing about software engineering: there's such a big difference between, let's say, doing embedded devices for cars versus websites versus mobile apps that, in a sense, talking about software engineering holistically doesn't make sense.

[00:04:08.210] - Ville Tuulos
But then on the other hand, there are some commonalities. That's really the way we have been thinking about things as well: there are certain foundational needs. If you really think about what is common across all these use cases, well, it's easy to say: data. You always need data. That doesn't go anywhere.

[00:04:25.260] - Ville Tuulos
The other thing is that typically you do need some compute. Sometimes you need a little bit of compute. Sometimes you need a crazy amount of compute. But compute is actually quite a differentiating feature compared to, let's say, classical software engineering.

[00:04:37.350] - Ville Tuulos
Then there's the fact that these things always become part of a larger system. Everybody knows at this point that it's not only the model, but everything around the model, the feature pipelines and whatnot. You need to orchestrate these things. And then there's the fact that this is such an empirical science. Nobody ever builds an ML system from the get-go and says, "Okay, we deployed it once, and now we are done."

[00:04:59.190] - Ville Tuulos
It's always an iterative process. It feels like there are these really foundational elements, but of course they're super broad. Compute comes in different forms. Data is unstructured, structured, semi-structured, all kinds of things. But at least these high-level concepts seem to be quite common. Depending on your perspective, either everything looks different or, in some sense, there are definitely some commonalities.

[00:05:20.270] - Simba Khadder
Can you break down the Metaflow abstraction? What does Metaflow do?

[00:05:24.060] - Ville Tuulos
It really starts from the fact that we think there are these foundational questions: How do we access data? How do we do compute? How do we do orchestration, meaning that we have some kind of workflow, some kind of system that we need to orchestrate? And then, how do we keep track of versioning, experiment tracking, all that good stuff?

[00:05:40.630] - Ville Tuulos
Now, as many of the listeners here know, there are different solutions, different MLOps solutions, even for each one of these layers. You can use whatever data warehouse for data. Many compute platforms are available; you can do your Kubernetes, whatever. Then there are many nice orchestration systems: you can use Astronomer, you can use Airflow, you can use Dagster and so forth.

[00:06:01.770] - Ville Tuulos
Now, the challenge oftentimes is that when you have five or six different tools, none of which are typically built specifically for ML and AI, it becomes really hard to actually build systems effectively. What Metaflow has been doing since the early days is providing a consistent Python API over all these layers: a nice way to access data, an easy way to access compute in different ways.

[00:06:24.910] - Ville Tuulos
Also, there are different ways you can scale your workloads. You actually have the API for building the workflows, which is a first-order concept in Metaflow land. That's a bit of a difference from, let's say, how you would build things starting from a notebook, where it's more like, okay, let's build the model, and then you figure out that, wait a minute, now we have a model, how do we build the system? Metaflow has always been much more about: how do we build the end-to-end working system from the get-go?

[00:06:49.500] - Simba Khadder
Got it. It's almost like trying to tie in experimentation. Because one thing that makes ML different, like you've already said, is the iterative process. With software engineering, if I told you I was throwing away the majority of the stuff I was doing, that would be bad. You shouldn't do that.

[00:07:04.370] - Simba Khadder
But in ML, that would be completely normal. It's more of a science than it is an engineering discipline. It's more iterative. Inherently, that's why notebooks exist in data science. They don't really exist in software engineering because the problem space is so different. I guess, is it fair to say that Metaflow attempts to bring those two worlds together: the production world, which looks more like engineering, and the experimentation world, which looks more like science?

[00:07:31.280] - Ville Tuulos
Yeah. Oftentimes, we think about it as a triangle with three elements. You have code. By the way, this is an interesting question; there was the Andrej Karpathy blog post a couple of years back about Software 2.0. There's one school of thought that says, well, do we even need code anymore? Maybe everything becomes one huge LLM. Maybe everything becomes one huge DNN, and there's no code.

[00:07:52.730] - Ville Tuulos
Well, we happen to believe that if you look at any company today, there's plenty of code in all ML and AI systems, so code isn't going anywhere. You have the models, and then you have data. You have this holy trinity of code, data, and models, and you always have to work within this triangle. Making it easy to move between the different modalities is super important.

[00:08:10.870] - Simba Khadder
I see what you're saying. Yeah, one thing that makes ML different is that, like you said, there are these different artifacts. There's the model artifact, there are data artifacts, and then there's the code itself. All these things need to work together.

[00:08:25.540] - Simba Khadder
That also means there are more things that can break. The data can change, and the features drift, and now my model isn't working. There's no good equivalent of that in software engineering, because software engineering is much more deterministic, whereas ML is a heuristic problem space.

[00:08:42.350] - Simba Khadder
I guess now that we've laid out Metaflow and some of those abstractions, do people use Metaflow for computer vision use cases? Can you walk me through the differences between using Metaflow or Outerbounds for a computer vision use case versus an NLP or tabular use case?

[00:08:58.750] - Ville Tuulos
Yeah. It goes back to the fact that we see something like Metaflow as providing the foundation. We have never envisioned something like Metaflow as a turnkey solution where you just push a button and the outcome is exactly the outcome you want. The metaphor is a bit overused, but it's like an operating system, in the sense that it's the foundation upon which different teams build their own things, not only the technical stuff, but also the human workflows, because there's a whole human aspect to building these systems.

[00:09:28.380] - Ville Tuulos
Now, using computer vision as an example: if you need to access a large number of GPUs, that's something we help you do. If you need an efficient data loader, so you can, let's say, get terabytes of images or videos from S3, we have an optimized S3 client for that. Now, what exactly is your model? Maybe you need to do some pre-processing, whatever it might be. That is then a layer that you build on top of the foundation, and that's the philosophy.
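The point about an efficient data loader largely comes down to parallelism: rather than fetching objects one at a time, you keep many GETs in flight at once. Metaflow's actual client is metaflow.S3; the snippet below is only a conceptual stand-in with a stubbed fetch function and made-up key names, to show the access pattern.

```python
# Conceptual sketch of why a parallel object-store client helps: fetch many
# objects concurrently instead of sequentially. Metaflow ships its own
# optimized client (metaflow.S3); this stub only illustrates the pattern.
from concurrent.futures import ThreadPoolExecutor

def fetch_object(key: str) -> bytes:
    # Stand-in for a real S3 GET; here we fabricate deterministic bytes.
    return f"contents-of-{key}".encode()

def fetch_many(keys, max_workers=16):
    # Saturate the network by issuing many GETs in flight at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(keys, pool.map(fetch_object, keys)))

objects = fetch_many([f"frames/img_{i}.jpg" for i in range(4)])
print(len(objects))  # prints 4
```

With real S3, throughput scales with concurrency until you hit network or request-rate limits, which is exactly what an optimized client manages for you.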

[00:09:55.920] - Ville Tuulos
Then the same thing on a whole other side: let's say you have some very vanilla business data science case; you do forecasting, say. There, the landscape may look very different. Again, you do need data, but the data obviously might live in some kind of database.

[00:10:10.090] - Ville Tuulos
As you very well know, all the questions about feature engineering and feature transformations are a whole different landscape for structured data. Oftentimes, teams may have their own abstractions; it's a highly domain-specific thing as well. And then, on the other end, the models may potentially be much cheaper to train, so the focus shifts a bit to different types of questions.

[00:10:30.080] - Simba Khadder
That makes sense. I think that's what I was trying to get at. Orchestration is orchestration: in both cases you have an orchestration pipeline, but the pipelines are very different. The parts you spend more effort on differ: computer vision is typically going to be very model-heavy. It's very data-heavy too, but in a different way, because you're not doing feature engineering.

[00:10:49.900] - Simba Khadder
Feature processing in practice tends to be pretty minimal compared to business use cases, where the feature engineering is everything and the model is almost an afterthought: throw XGBoost at it and call it a day. I guess by focusing on that layer below, the orchestration layer, you're able to service both cases and provide value in both. Is that fair?

[00:11:11.550] - Ville Tuulos
Yeah, that's fair. This is, by the way, really an interesting question to think about and then project that. What will the future look like? Personally, I would like to see a future where we have a very diverse landscape of different kinds of models and applications, different companies really using their expertise to build different product experiences and building very unique models and thinking about data closely.

[00:11:34.660] - Ville Tuulos
But then there's the other point of view that maybe everything becomes just like OpenAI LLM. Maybe that's it, and nobody does anything anymore. We are much more in the camp that it's actually super powerful that you get these foundational tools upon which then you can build your own experiences. But again, if you are more in the camp that I just want to hit an API, I don't want to think about any of that stuff, then again, maybe Metaflow isn't that useful.

[00:11:57.230] - Simba Khadder
I definitely want to dive into the AI versus ML discussion; obviously, everyone's thinking about that. But one last line of questioning I'm curious about is Outerbounds versus Metaflow. What are the differences between the two?

[00:12:08.720] - Ville Tuulos
The usual story there is that, as I mentioned, we started at Netflix, and we started in an environment where we were standing on the shoulders of giants, in the sense that there were literally hundreds of engineers who had been building orchestration systems. Over time, Netflix has built, I think, five different workflow orchestrators in-house. A massive amount of engineering went into it.

[00:12:31.030] - Ville Tuulos
They had over a hundred people working on the data platform and tens of people working on the compute platform. That's a really major engineering effort. On top of this stack, we were able to layer Metaflow, and suddenly you could actually access all this infrastructure when building ML and AI.

[00:12:46.120] - Ville Tuulos
Now, the challenge is that none of that infrastructure is available outside Netflix, right? We saw many companies struggling with the fact that Metaflow helps you to set up, let's say, infrastructure on AWS or GCP, and so forth. But still, anyone who has worked with Kubernetes knows that it's actually a complex beast. It's not a coincidence that companies like Netflix and Meta and Google and many others have spent a lot of time optimizing these things.

[00:13:09.500] - Ville Tuulos
We decided—based on all the learnings that we had had—that it would be amazing if we can provide this foundational infrastructure as a managed platform, the same type of experience that we had at Netflix. But then if you have this managed platform, you can focus on building your applications, your ML magic on top of the platform without having to worry about, let's say, how do we do gang scheduling on Kubernetes? How do we get data from Snowflake as fast as possible?

[00:13:33.400] - Ville Tuulos
Then there's a whole bunch of questions related to getting security, policies, and data governance in a good place. As many of us ML and AI practitioners know, it's the uncomfortable truth that there's this tension between freedom and responsibility: it would be amazing to have the freedom to do anything.

[00:13:49.360] - Ville Tuulos
But at the same time, in any real business environment, there's the responsibility side of the house. You can't just SELECT * everything from Snowflake without considering anything. You have to be a bit more careful about that. On the Outerbounds side, we want to help enterprises make sure that the data scientists can really innovate fast while respecting all the things that need to be respected.

[00:14:08.620] - Simba Khadder
I guess coming back to Metaflow... One thing you've mentioned a few times is how many orchestrators exist. Obviously, Metaflow is different in that it is entirely focused on one specific workflow, the machine learning workflow, which is very unique and, for most orchestrators, typically an afterthought.

[00:14:23.700] - Simba Khadder
You mentioned Airflow. Airflow was built for a much more traditional data engineering use case, not ML, which is quite different. There are also other MLOps-y orchestrator platforms, and people are even fitting together multiple vendors to build their own platforms. From your perspective, what makes Metaflow the best, or at least very unique?

[00:14:48.400] - Ville Tuulos
I'm very biased, so you should ask other folks as well. But I can say that maybe one of the things we felt quite strongly from the beginning, and that has been a big differentiator, is that if you go to the website, we have always talked about this idea of human-centric infrastructure. That was the recognition that although ML and AI are super cool, highly challenging technical engineering problems and all that stuff, at the end of the day, it is people, oftentimes people with very diverse backgrounds.

[00:15:21.620] - Ville Tuulos
We work with people with backgrounds in economics or even social sciences or biology. These are people who are comfortable, let's say, using Jupyter notebooks and so forth, but definitely not comfortable with things like baking Docker images, many things that engineers take for granted. Just push it through CI/CD and bake your Docker image, what's the big deal?

[00:15:38.640] - Ville Tuulos
These are real hindrances for people. That's why one guiding idea with Metaflow has been to really, really focus on the developer experience. These days, you can't find any tool that says it doesn't care about developer experience.

[00:15:51.870] - Ville Tuulos
But then again, details matter. It's just something that has been very deep in our DNA. Whenever we hear why people choose Metaflow, oftentimes that is the deciding factor, beyond all the features; of course, many other tools provide features as well. At the end of the day, it is the data scientist and ML/AI developer experience that really matters.

[00:16:11.550] - Simba Khadder
Can you walk through a case study, maybe a company that started using Outerbounds or Metaflow, and the success they found with it, how it looked before and how it looks after?

[00:16:21.490] - Ville Tuulos
One fun example that comes to mind is a company that I guess one could characterize as the Robinhood of Europe: Trade Republic. What I love about what they have done with Outerbounds is that, obviously, they are a pretty sizable company with many use cases. There were some cases where they hadn't used ML in the past, and it was more like heuristic engineering, just a simple approach.

[00:16:44.170] - Ville Tuulos
Thanks to the fact that they now had the platform available, it gave them confidence to start using ML in use cases where it hadn't been used in the past. There were actually some amazing people on their side who were able to build solutions that impacted the company's bottom line right away.

[00:17:00.130] - Ville Tuulos
You can imagine that nothing motivates a company more than actually seeing more money coming in. What's really fun about that is that it got a positive feedback loop going; nothing breeds success like success. That gave them the confidence that, actually, we can use ML for more use cases.

[00:17:19.320] - Ville Tuulos
This is actually one of the biggest challenges we see at many companies. The question is not so much, "Is Metaflow better than Airflow? We have this thing running on Airflow. Why should we be migrating to Metaflow?"

[00:17:29.840] - Ville Tuulos
Oftentimes, the answer is: don't worry about it. We actually integrate with Airflow if that's what you want to do. The more interesting question is: what about the thousand other use cases that you haven't dared to address because you felt they were too hard?

[00:17:42.430] - Ville Tuulos
This is really the magic of a company like Netflix. What Trade Republic did is realize that they could be applying ML and AI to so many more use cases. When it doesn't take such a long time, when it's not so hard, when it doesn't require a team of 10 people, then you can actually become much more confident that, wait a minute, we can do this, and we can do that.

[00:18:01.950] - Ville Tuulos
Some of these things work and some don't. That's a part of the experimentation. But when it works, it can really change the direction of the company.

[00:18:09.590] - Simba Khadder
Yeah, that's a really interesting perspective, and not one that I think is talked about a lot. A lot of the focus in MLOps tends to be very bottom-line focused: it makes you move faster, it makes you more productive. But there's an enablement, a top-line value prop, too, which is that it allows you to apply ML much more cheaply, which means you can apply it in places where before it would have been a major project.

[00:18:32.460] - Simba Khadder
Now, it's not. It's similar to CI/CD and some of the other DevOps advances: yeah, it's really cheap and easy now. I just write this code and I just deploy it. Before, it was: set up servers, figure out what instances I need to handle the load. There was so much more overhead to trying something new.

[00:18:50.530] - Simba Khadder
If you have a proper MLOps platform, just like... Yeah, I have this idea. Let's just try it. If it works, it works. If it doesn't work, then that's fine. At least you've tried something that wouldn't have been tried before.

[00:19:01.650] - Simba Khadder
I want to come back to the ML and AI discussion. I'll give you a broad question, which you've touched on already: what does the future look like for ML and AI? Are we moving to a future where ChatGPT just runs everything and ML is deprecated? Or do they live together? Is ChatGPT all hype?

[00:19:22.170] - Ville Tuulos
Yeah. Obviously, it's fun to see that, especially over the last year, depending on who you ask, you get wildly different answers. I think it's a super exciting time in the industry. There isn't necessarily even a consensus, so I can give my point of view.

[00:19:35.630] - Ville Tuulos
Again, I totally know that some people think about this differently. Let's focus on large language models specifically. I think they're an absolute game changer when it comes to natural language processing. That said, there are certain NLP tasks, let's say part-of-speech tagging or something, where maybe an LLM is overkill.

[00:19:56.940] - Ville Tuulos
But I mean, it's safe to say that the game has totally changed for anything that involves language, and overall, when it comes to unstructured data. That's actually a huge thing, because it used to be that most of the data at companies was really messy and unstructured, and nobody dared to even touch it. Now suddenly, thanks to LLMs and, of course, other GenAI techniques in general, all these piles of messy, noisy natural language data are available.

[00:20:24.500] - Ville Tuulos
That being said, there's actually a huge swath of other use cases. Let's say you look at fraud detection, or you look at convex optimization. It's mind-boggling to me to even consider... Let's say you're a hedge fund and you're doing some portfolio optimization. What do you do? Go to an LLM, paste in your stock portfolio, and ask, okay, what should I do? That doesn't make any sense.

[00:20:48.090] - Ville Tuulos
Or let's say you want to do convex optimization, or you want to do operations research, things like that. Even many things in structured data, like forecasting: you don't do that with LLMs. It feels quite obvious that we have an amazing new tool in our toolbox, but it's not going to overtake all the other tools. What I find super exciting is that, first, we have a whole new set of applications we can build, thanks to the fact that all the NLP stuff is "easy".

[00:21:17.280] - Ville Tuulos
Then also, we can enhance all the existing ML applications: let's say you take all the unstructured data, make it an embedding, and slap the embedding into your ML model, and maybe you get a lift. That's amazing.

[00:21:29.160] - Ville Tuulos
I think the future we are entering is one where we have a very healthy mixture of both. I don't know what the balance is. I wouldn't be surprised if, over time, there will be more and more traditional ML applications, because now there's so much more interest in applying ML and AI, and people are thinking about data more seriously and so forth. I think that's a super interesting future. I know there's the other school of thought, which is that you just ask ChatGPT, and it gives the answer to everything.

[00:21:54.860] - Simba Khadder
Yeah, I'm in your boat, too. Maybe it's because we're of the old-school ML generation, where these things fit together. It's another tool in our toolset. It reminds me of the time, probably around when you started Metaflow, when deep learning was all the rage. You just had to say deep learning, and you got the term sheet.

[00:22:15.100] - Simba Khadder
Traditional ML is dead now; you don't even need to do feature engineering anymore; deep learning will just figure it out for you. Obviously, we can look back and say that wasn't true at the time. You could easily point at LLMs as an extension of that same idea of deep learning, and they are.

[00:22:32.110] - Simba Khadder
I mean, decoder models aren't brand new. I have a similar take. I think there are certain use cases where you just need a random forest; it's just going to be some form of boosted algorithm, and it will continue to be, I would guess.

[00:22:44.470] - Simba Khadder
But today, that's by far the most deployed type of model in production, and I think that will still be true in five years. I do think there will be a whole new set of problems enabled, things that could never be done before. It was just impossible to even consider doing them: either the value compared to the cost didn't make sense, or it was just so hard that you didn't have the in-house expertise.

[00:23:09.950] - Simba Khadder
Now it's very easy to get started with ChatGPT. I've already seen people trying to mix LLMs together with traditional ML: create features and feed those features into an LLM to generate an output. A very simple example could be: I have a traditional model that generates someone's credit score, and then I feed that credit score, along with some other features, into a prompt, and that prompt produces some form of savings advice. I do think these things will work together, and I think that's where things will get interesting.
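A toy sketch of that hybrid pattern follows. The scoring formula, numbers, and prompt wording here are invented for illustration; in practice, the score would come from a real trained model and the prompt would be sent to an LLM API.

```python
# Sketch of the hybrid pattern described above: a traditional model produces
# a feature (a credit score), which is then embedded into an LLM prompt.
# The scoring rule and prompt text are made up for illustration.

def credit_score(income: float, debt: float, on_time_ratio: float) -> int:
    # Stand-in for a trained model (e.g., gradient-boosted trees).
    raw = 300 + 400 * on_time_ratio + 150 * min(income / (debt + 1), 1.0)
    return int(min(raw, 850))

def build_prompt(score: int, goal: str) -> str:
    # The structured model's output becomes context for the generative model.
    return (
        f"The user has a credit score of {score} and wants to {goal}. "
        "Give three short, concrete savings tips."
    )

score = credit_score(income=60_000, debt=20_000, on_time_ratio=0.9)
prompt = build_prompt(score, "buy a house in two years")
# `prompt` would then be sent to the LLM API of your choice.
print(prompt)
```

The traditional model keeps the numeric decision auditable, while the LLM handles the open-ended language generation.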

[00:23:36.780] - Simba Khadder
Does Metaflow have an AI story? Does Metaflow work with LLMs, especially LLM applications, today?

[00:23:43.980] - Ville Tuulos
Yeah. It's actually fascinating when you think about what we mean when we talk about AI. Especially last year, there was a lot of commentary. Many people who had never done AI before suddenly became AI experts. There was all this fear, uncertainty, and doubt: oh, what is this AI, runaway AI, and all that stuff.

[00:24:03.810] - Ville Tuulos
We have been always building the foundations. We have been in the trenches. If you actually think that, okay, what does AI mean today?

[00:24:12.010] - Ville Tuulos
It means that you take, let's say, a model from Hugging Face; that's a PyTorch model. Let's say you want to do your own fine-tuning; that would be a typical case. Or let's say you are really building everything from the ground up: you are OpenAI, you want to do the pre-training yourself. What do you need?

[00:24:27.180] - Ville Tuulos
You need a big, big bunch of computers. You need these computers to work together. You need the whole compute substrate. Honestly, I think OpenAI has two secret sauces. One is all the engineering they have developed, the clusters and so forth, and the other one is the data.

[00:24:43.280] - Ville Tuulos
That engineering side is what we have been focusing on, building from the bottom up. What Metaflow helps you do today, and what we invested in heavily last year, is, let's say, distributed training. It might be hard to get the big GPUs, the A100s, the H100s, so you want to use smaller GPUs, but you have a bunch of them, say 15 of them, and you want to do a LLaMA 2 fine-tuning. How do you actually do it? Not only how do you do it once, but how do you do it as part of your actual AI application? That's something we help you do.
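The core of the distributed training described here is data parallelism: each worker (one per GPU) computes gradients on its own shard of data, the gradients are averaged across workers (an all-reduce), and the shared weights are updated once. The toy below shows that arithmetic with a single scalar weight and simulated workers; real fine-tuning would use something like PyTorch DDP, but the shape of the loop is the same.

```python
# Toy illustration of data-parallel training: per-worker gradients on data
# shards, averaged before one shared update. One scalar weight, no GPUs.

def local_gradient(w: float, shard: list[tuple[float, float]]) -> float:
    # Gradient of mean squared error for y ~ w * x on this worker's shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w: float, shards: list[list[tuple[float, float]]],
                       lr: float = 0.05) -> float:
    # "All-reduce": average the per-worker gradients, then update once.
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(grads)

# Ground truth y = 2x, split across 3 simulated workers.
shards = [[(1.0, 2.0)], [(2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
print(round(w, 3))  # prints 2.0
```

Because every worker applies the same averaged gradient, all replicas stay in sync, which is why the pattern scales to many small GPUs instead of one large one.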

[00:25:12.500] - Ville Tuulos
Also, it's interesting that in the way you develop these models, there are still many commonalities. Again, it's an iterative process. Many concepts, experiment tracking, artifact tracking, all of that still applies in this new world. There are very practical questions, like where do you save the checkpoints? We have the efficient S3 client for that, and so forth. We have been very thoughtfully applying the foundational elements we have and seeing how we can help companies that actually want to build these AI-driven experiences, use open-source LLMs, and get access to compute cost-efficiently.

[00:25:48.810] - Ville Tuulos
I think that there are very much two kinds of companies. If all you are interested in doing is hitting OpenAI APIs or something of that sort, then technically you're not even doing any ML or data science. You're just hitting APIs. By all means, there are easier ways to do that. But if you are more on the side that wants to actually build something more unique, then Metaflow can be very helpful.

[00:26:07.450] - Simba Khadder
I think it will become more common to see more custom-looking use cases, because building your own lets you tune things, not just literally fine-tuning, but it lets you control everything. If you have a mission-critical application, you don't want your OpenAI endpoint to return an overloaded-resources error or whatever, which is quite common, or to have no control over your response latency. But for some use cases, it's totally fine.

[00:26:32.840] - Simba Khadder
I think it's also interesting. One thing I've come to find is that the core problems of ML fall, from my perspective, into four categories, four umbrellas. One umbrella is just data, which I'll broadly describe as dealing with data: creating features, creating training sets, versioning, artifact management, etc. There's training and modeling: keeping track of your runs, your experiment trackers, your actual GPU orchestration.

[00:26:58.200] - Simba Khadder
There's deployment, which is models in production, and then evaluation. There are a lot of companies around each of those verticals, and then obviously there are companies that try to provide the whole MLOps platform on top. For LLMs, it's almost the same. The first big problem is data. Especially with RAG and fine-tuning, it's still a very core problem, and it looks very, very similar to how it did before. The training step is now a fine-tuning step. It's optional, but it's still there.

[00:27:23.950] - Simba Khadder
Serving, same thing. If you want to control your model, and I think it's going to be more common for people to fine-tune their own models, then that's hard. How do you deploy a LLaMA 2 in production at scale? Then finally, the more obvious one, evaluation. How do you make sure it's doing what you want it to do? How do you handle failures? This is very similar to the ML problem, almost exactly the same.

[00:27:46.580] - Simba Khadder
The data part is very, very similar as well. It's interesting that a lot of it is almost the same, but there are specific things that are very different. Prompts never existed before. Prompt management is a whole new thing that's specific to LLMs. On the traditional ML side, you can optimize the workflow more because it's much more static, whereas with LLMs there's way more variety, from just doing an API call to training your own small LLM for a specific task.
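Prompt management has no pre-LLM analogue, as Simba notes, but it can borrow directly from artifact versioning: treat each prompt template as a versioned artifact keyed by a content hash, so an application can pin, compare, and roll back the exact prompt it shipped. A minimal illustrative sketch (this registry is hypothetical, not any particular product's API):

```python
import hashlib

class PromptRegistry:
    """Version prompt templates by content hash, keeping a
    per-name history so old versions remain addressable."""

    def __init__(self):
        self._by_hash = {}   # hash -> template text
        self._history = {}   # name -> list of hashes, oldest first

    def register(self, name, template):
        h = hashlib.sha256(template.encode()).hexdigest()[:12]
        self._by_hash[h] = template
        versions = self._history.setdefault(name, [])
        if not versions or versions[-1] != h:
            versions.append(h)
        return h

    def latest(self, name):
        return self._by_hash[self._history[name][-1]]

    def get(self, version_hash):
        # Pinning by hash lets a deployed app keep its exact
        # prompt while a newer version is evaluated.
        return self._by_hash[version_hash]

registry = PromptRegistry()
v1 = registry.register("summarize", "Summarize this text: {text}")
v2 = registry.register("summarize", "Summarize in one sentence: {text}")
```

The design choice mirrors experiment tracking: content-addressing makes prompt changes diffable and reproducible, which is exactly the kind of MLOps discipline that carries over into the LLM world.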

[00:28:15.290] - Simba Khadder
It's fascinating to see how much overlap there is between AI and ML. Do you think that MLOps and LLM Ops, if I can use that term, will merge more? Do you view them as two very separate categories? What's your take?

[00:28:30.470] - Ville Tuulos
Yeah, I think there's the actual reality, what people are doing, what people need, and what people want, and then there's the labeling game. As you remember, at one point even the term MLOps didn't exist. I think it became popular around 2018 or 2019, something like that. Before that, it just didn't exist.

[00:28:47.350] - Ville Tuulos
That didn't mean people were not doing the activities. Of course, the activities were very much happening; nobody just called it MLOps. As always with these labels, nobody ever defined what exactly the boundaries of MLOps are. It's something of a fuzzy thing: okay, this feels MLOps-y.

[00:29:02.450] - Ville Tuulos
Now it's very much the same thing with LLM Ops. Again, it's some kind of fuzzy cloud that partly overlaps with MLOps. Nobody knows exactly. I think the important thing is: what are the activities behind the labels?

[00:29:15.000] - Ville Tuulos
Exactly to your point, there are many, many similar activities, regardless of whether you are building an LLM-powered application or a traditional ML-powered application. You can call it whatever you want. You can call it DevOps, you can call it MLOps, you can call it LLM Ops, or come up with a new term, that's fine. But I think that the activities matter.

[00:29:31.600] - Ville Tuulos
I think it's really important to recognize that things do change. It's not that everything is always the same. It's the same as how people struggled for the longest time to understand why ML and AI development is different from traditional software engineering.

[00:29:45.600] - Ville Tuulos
I think even today at many companies, there is this almost internal friction: wait a minute, we have software engineers who use these workflows just fine. They just deploy their microservices on Kubernetes. You are ML people. Why don't you do the same thing? Just fall in line. Let's make life easier. Just do exactly the same thing.

[00:30:04.360] - Ville Tuulos
But then it feels that, well, this doesn't feel quite right. I would say it's maybe the same thing now with traditional ML and LLMs. You could say, okay, LLMs are just like ML, do exactly the same thing. But I would argue that people will say, "No, it doesn't quite feel right. There are some important differences."

[00:30:20.670] - Ville Tuulos
At the same time, there are absolutely things that we should learn from the MLOps side of the house. That's why we really try to look from the bottom up, starting from the very basic needs: data, compute, orchestration. These are the universals, and then there are the nuanced differences, like how you actually make this happen for different use cases.

[00:30:38.780] - Simba Khadder
Yeah, I think that's spot on. I think that's a pragmatic answer: the problem spaces definitely overlap, but there are nuances in each of them. We'll see, because I don't think there is a clear LLM workflow yet. There is no gold standard. To be honest, I don't think there's really a gold standard in MLOps yet either. I don't think there's a company we can point at.

[00:30:58.520] - Simba Khadder
DevOps, there was. You could point at Google or Netflix and say, yeah, that is the gold standard of DevOps. There are obviously amazing examples of MLOps, but I've never felt like, oh, yeah, Google does it best. It just never really felt that way.

[00:31:13.670] - Simba Khadder
I guess the same thing is true of LLMs. I don't think there even exists… The only enterprise-grade LLM application, in my opinion, is Copilot. I think everything else is more or less an experimental, beta-type product. It'll be really interesting to see how these things unfold.

[00:31:30.440] - Simba Khadder
Awesome. This has been great. Thank you so much for making the time to talk with me about all these diverse topics. We'll include some links in the description for people who want to follow up with you and learn more. Thank you again.

[00:31:42.740] - Ville Tuulos
Thanks for having me.
