
Episode 29

MLOps Weekly Podcast

Building the Future of ML Platforms with Ketan Umare
CEO, Union

Description

Join Featureform’s Founder and CEO, Simba Khadder, and Union CEO and co-founder, Ketan Umare, as they delve into Ketan’s journey, starting with leading the ETA models team at Lyft, the origins and evolution of Flyte, an ML workflow platform, and his latest venture, Union. The discussion also covers the importance of collaboration in AI, the future of traditional machine learning in the era of LLMs, and the potential disruptions in the software industry. Whether you're a data scientist, engineer, or AI enthusiast, this episode offers valuable perspectives on building scalable ML infrastructures and navigating the rapidly changing landscape of artificial intelligence.


Transcript

[00:00:06.880] - Simba Khadder
Hey everyone. Simba Khadder here, and you are listening to the MLOps Weekly podcast. Today I'm speaking with Ketan, who's the CEO and Co-founder at Union.ai. Previously he held multiple senior roles at Lyft, Oracle, and Amazon, spanning cloud, distributed storage, mapping, and machine learning systems. He's passionate about building software that makes engineers' lives easier and provides simplified access to large-scale systems. Besides software, he's a proud father and husband, and enjoys traveling and outdoor activities. Really excited to speak with him today.

[00:00:39.360] - Simba Khadder
Ketan, great to have you here today.

[00:00:41.090] - Ketan Umare
Hey, Simba, nice to be here as well. Thank you for having me.

[00:00:44.610] - Simba Khadder
Obviously, there's a lot we want to talk about today. A lot of people know you from your work on Flyte. I think it'd be great to kick off by having you talk about the creation of Flyte. Why was Flyte created? What was the journey?

[00:00:57.190] - Ketan Umare
Yeah, understanding the creation of Flyte probably needs a little bit of quick background on me. My name is Ketan. I'm the CEO and Co-founder at Union. I've been an engineer for 20-plus years now, two decades. You look back and you're like, "Man, that's a lot." I've been fortunate to work in banking, high-frequency trading, and logistics, and to build cloud infrastructure from the ground up, across multiple different companies: Amazon, Oracle, et cetera.

[00:01:20.870] - Ketan Umare
I ended up at Lyft in about 2016. I was one of the first few people in the Seattle office, and there I ended up leading the ETA models team, which was responsible for building models. It was an ETA prediction team. We were trying to optimize the numbers that you see on the Lyft app: three minutes to get a ride, $25 to your destination. Two numbers. Many people would ask, "Is three minutes accurate?" The answer is, it's accurate for you, right? It's the right number that you should see.

[00:01:51.620] - Ketan Umare
But what that means is there is a general way that the number is computed. It's not purely distance-based. A bunch of different factors come into play, like traffic, what side of the road you're standing on, and perception: how likely are you to convert? All of these things have to go into one number. We started using machine learning to solve many of these problems and improve conversion and other metrics.

[00:02:18.010] - Ketan Umare
In the modern sense, this was probably my first real tryst with machine learning. When I was in high-frequency trading and so on, I was doing some quant stuff, and there was some machine learning work, but I was not really involved. At Amazon, doing logistics, we used to do operations optimizations, for example, traveling salesperson or graph routing algorithms to improve routing. We didn't call it machine learning, it was just software. But in this case there was a bunch of researchers and data scientists and software engineers all working together, and we were trying to deliver models. That's where we were in 2016. We were just doing whatever random stuff came out of it.

[00:02:59.470] - Ketan Umare
People would build their models on their laptops and deploy them. A person leaves, and the model is lost with them. We had to reproduce it. There was a lot of tribal, almost unscalable stuff happening, I would say.

[00:03:11.760] - Ketan Umare
We were tied to OKRs, so we had to do this consistently. We sat down and wrote some things out; it felt like there was a way to solve these problems. We saw that there were multiple steps, so let's do these things called pipelines and so on. We got something basic running with Airflow and were able, with blood, sweat, tears, and a man behind the curtain, to get something running that was good enough, right? We were able to prove that this operationalization was needed and that a solution was possible.

[00:03:37.280] - Ketan Umare
But at the end of it, we presented it to the company in an all-hands meeting, and we found that a bunch of different teams were struggling with the same stuff, and they're like, "Hey, why don't you platformize this and I will use it."

[00:03:48.060] - Ketan Umare
I'm like, "This is not platformizable. This is just some hacky code that we wrote over a weekend."

[00:03:53.490] - Ketan Umare
That led me to write a big requirements document, you could say, about 15 pages long. These are the things we need. This is what it is. This is how things are evolving and should be solved. If you build it, I will use it, because I didn't want to build infrastructure. I'd done a lot of infrastructure prior to that. I was like, "I'm here to solve my problems. I'm going to do that." There were probably about 70 or 80 engineers in the company. Nobody had the time to build anything yet. That's what led us to, "Okay, we'll take that challenge on and we'll build something."

[00:04:20.280] - Ketan Umare
We built a first version of Flyte on top of AWS Step Functions, Batch, ECS, and all that, and we delivered it. In about six months, we had about 15 critical teams using it in production, and we were like, "Oh, what happened here? Why did they use it?" We started talking to everybody. That's what made me realize, and again, it's not that there was a single point in time where we had a realization. It was a period of experiments and different kinds of deliveries and sustained outcomes that people received, which led us to the realization that there was something more.

[00:04:51.730] - Ketan Umare
Different companies also reached out to us, we shared our learnings, and we started seeing that they were also interested in this. They basically told us, "Why don't you open-source it?" A bunch of companies told us to open-source it. That was when we were like, "Hey, to open-source this, we have to actually build the infrastructure in an open-sourceable way."

[00:05:10.280] - Ketan Umare
We were already at a point where our skunkworks project was not really scaling, so we were like, "Okay, we have learned a lot. Let's redo and rebuild this again from scratch." Keeping one jet engine going while rebuilding the other engine. That's what it was, and maybe that's one of the reasons it was called Flyte. That's what led to the creation of Flyte as we know it. We open-sourced it a couple of years later, two-and-a-half, three years after it was built.

[00:05:37.510] - Ketan Umare
The prime realization for why we built it is that we found maybe two or three things. One thing was, I looked back and I said, "Why am I doing this? Why am I writing machine learning pipelines and models?" when I was working on ETA. I was like, "I have enough of an infrastructure background. I could be working anywhere, building infrastructure, like databases. But this has a higher impact value and a higher product value that I can deliver." What that told me is that if I could be doing this, many other people would be going in the same direction, and I saw many, many people within the company just building ML pipelines. This is the direction we're going in.

[00:06:17.440] - Ketan Umare
So this was going to be a wave, and there needed to be a solution that actually lets people like software engineers and data engineers deliver AI products consistently. That was one. It was a realization that there's a change happening.

[00:06:30.180] - Ketan Umare
The second thing was, there is a difference in the way we build. The way we build software and the way we build AI products is different. The entire process is different. The way I characterize it is that a database is a pure software product. You build it once; Postgres is 35 or 40 years old or something like that. It still works great, and it actually keeps on improving, like a fine wine. The reason is that it was built on assumptions, and as hardware and disks and networks and computers got faster, it got better. So environmental change has, weirdly, improved it.

[00:07:03.930] - Ketan Umare
On the other hand, AI products are the inverse. They get worse as the environment changes. That was the second observation: all of the tooling we have is for building software products. How can you apply that to a product that's dynamic, constantly changing, evolving? You often have to restart with new assumptions.

[00:07:22.260] - Ketan Umare
Oftentimes, a model like ChatGPT, even in production, is not going to stay the best one. A year later, it's probably the worst model, and you need a new model. You constantly need to re-innovate.

[00:07:30.520] - Ketan Umare
The third one was that the biggest problem in AI is organizational silos. People are unable to work together. Data engineers, ML engineers, and software engineers all need to work together to deliver a product, because these are complex, living products that are delivered to customers. So we said, "Okay, if these three things are coming, there needs to be a new platform." That's how we started building the machine learning platform at Lyft. One of its core components was Flyte, which got open-sourced.

[00:07:55.660] - Simba Khadder
I would love to start with the Airflow problem. Many orchestrators exist. There's the old joke that there's always another orchestrator built to solve the problems of the other orchestrators. But obviously, I don't think that's the case here, and I don't think it's the case in a lot of situations. For people who maybe haven't hit the problem yet, or are trying to diagnose what that problem looks like, why is Airflow not the answer? What is Airflow missing?

[00:08:19.120] - Ketan Umare
It's a fantastic question. I'm actually writing a blog. Hopefully, it'll come out by the time this episode goes out. It's a big blog. I can't cover everything, but I'll give you a few reasons, some of which I've already covered.

[00:08:31.330] - Ketan Umare
I think if you go to first principles, you will see that I highlighted three prime changes that were happening in the landscape. We can give them high-level terms: collaboration, AI productization, and software best practices within AI products. Three different aspects. Now apply that to the existing ETL landscape, which is where I put Airflow.

[00:08:57.390] - Ketan Umare
The way I characterize it is that ETL is essentially needed to build fact and dimension tables and potentially drive an analytics dashboard. The absence of that dashboard, the failure to build it one day, is not critical. It's not mission-critical. Yes, it is not a great thing, but it's not mission-critical. The failure to build a model and deploy it, if that is a requirement for the end-user product, is actually mission-critical. It will have adverse effects.

[00:09:29.330] - Ketan Umare
An example of this is that we used to run real-time traffic analysis, which we deployed to production almost every 10 minutes. If one of those deployments fails, we don't have real-time traffic. Let's say there was a huge traffic spike, and now there is no traffic data. We now have this stale model in production that's using the last 10 minutes' worth of traffic information. What that leads to is diverting all the cars in the wrong direction, so it leads to a suboptimal state. It is bad because we actually lose customers, because the ETAs look like seven or eight minutes instead of three minutes, when actually they were three minutes. This is a huge challenge. It has a customer impact, and the moment there is a customer impact, the ground principles on which the product is built no longer hold. So that's one.

[00:10:13.070] - Ketan Umare
The second one, as I said, is the aggregation of three different types of people, so a very research-oriented, fast-moving, experimentation-driven system. Think about Airflow when we were doing this. We were running one of the biggest Airflow clusters at that time. You would deploy, and all the workers would go down. Let's say you deploy a new version of the pipeline. The existing pipeline would die because the running tasks would die, and then it would restart from some arbitrary point. This sounds like a simple problem, but it stems from the basic construct.

[00:10:44.720] - Ketan Umare
An example of that: one of our engineers imported XGBoost, and another one wanted to use scikit-learn, maybe some specific version, and there was a version conflict, and all the DAGs failed. There was a huge conflict happening because of the design. It's based on that simple Pythonic import-based system where the scheduler loads everything into one environment.
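
For readers who haven't seen how Flyte addresses this: each task can declare its own container image, so one team's XGBoost pin never collides with another team's scikit-learn pin. Here is a minimal sketch, with hypothetical image URIs:

```python
from flytekit import task, workflow

# Each task declares its own container image, so dependency sets are
# isolated per task instead of being loaded into one shared scheduler
# process. The image URIs are hypothetical placeholders.
@task(container_image="ghcr.io/example/xgboost-env:1.7")
def train_xgboost(max_depth: int) -> float:
    import xgboost  # resolved only inside this task's image
    return 0.0  # placeholder for a real validation metric

@task(container_image="ghcr.io/example/sklearn-env:1.3")
def train_sklearn(n_estimators: int) -> float:
    import sklearn  # a different, potentially conflicting dependency set
    return 0.0  # placeholder for a real validation metric

@workflow
def compare_models(max_depth: int = 6, n_estimators: int = 100) -> float:
    score_a = train_xgboost(max_depth=max_depth)
    score_b = train_sklearn(n_estimators=n_estimators)
    return score_a  # placeholder: a real workflow would compare both
```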

[00:11:05.360] - Ketan Umare
Another thing was, as we were experimenting: how do I experiment? I cannot deploy to production, and I cannot modify the existing pipeline. I need versioning, I need git-like semantics to work with. It seems like, "Oh, I will hack it onto Airflow," and so on. It doesn't work. You cannot. You need the idea of portable DAGs that you can ship easily.

[00:11:23.650] - Ketan Umare
One other example is that we wanted everything to be runnable locally. We just want people to run locally. If you start experimenting in the cloud, it's too expensive. We used to spend tens of millions of dollars on the infrastructure behind Flyte at Lyft, so any dollar we could save would be effective. There have now been many, many posts by different users of Flyte showing that they are saving 50% on infrastructure costs just by using it, because Flyte allows you to run everything locally, and then you ship from local to remote, and it works exactly the same. It's pretty amazing as a capability.
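
To make the local-first loop concrete: a Flyte workflow is plain Python, so the same file can execute in-process on a laptop and then be shipped to a cluster unchanged. A minimal sketch, with toy task bodies:

```python
from typing import List

from flytekit import task, workflow

@task
def make_features(n: int) -> List[float]:
    # Toy feature generation standing in for a real pipeline step.
    return [i * 0.5 for i in range(n)]

@task
def train(features: List[float]) -> float:
    # Toy "training" step that returns a mock score.
    return sum(features) / max(len(features), 1)

@workflow
def pipeline(n: int = 100) -> float:
    return train(features=make_features(n=n))

if __name__ == "__main__":
    # Runs entirely in-process on a laptop; no cluster involved.
    print(pipeline(n=10))
    # The same file can then be shipped to a remote cluster, e.g.:
    #   pyflyte run --remote pipeline.py pipeline --n 100
```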

[00:11:54.990] - Ketan Umare
Then we wanted things like memoization. You probably know this already. When you are experimenting, you have 15 steps. Eleven steps don't change, and then the twelfth step changes all the time. Maybe that's the training loop, right? Or you want to change a few hyperparameters. So you want the 11 to just be reused. Yes, you could achieve all this by breaking things apart and modifying them by hand. It's a pain. One of the goals is to reduce the pain and reduce the friction. So again, we had to go to first principles to understand how to do these kinds of things, and it's not easy.

[00:12:24.240] - Ketan Umare
It seems like a lot of people are tacking on caching or memoization, but memoization in programming languages depends on the type system and the knowledge of whether data is stored on the heap versus on the stack, so you can actually optimize certain things. How do you do that in a distributed system? You have to really go from the ground up to build that correctly, because we have seen a lot of horrendous cases where it would get it wrong and you wouldn't even know. It trained the model on the wrong data.

[00:12:52.770] - Ketan Umare
You wouldn't even know, because it used the wrong cache output. It's a correctness issue, and I highly recommend that people understand this when using caching with systems that were not built for it. It is not the same as building from the ground up to solve a problem. And there are many, many, many such problems.
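
As a concrete illustration of memoization done at the platform level: in Flyte, a task opts into caching explicitly, and the cache key is derived from the task's typed inputs plus a user-owned cache_version, which is what keeps reuse a correctness-preserving operation. A minimal sketch:

```python
from typing import List

from flytekit import task, workflow

# cache=True memoizes on the task's typed inputs; bumping cache_version
# invalidates old entries when the task's logic changes.
@task(cache=True, cache_version="1.0")
def preprocess(raw: List[int]) -> List[float]:
    return [x / 255.0 for x in raw]

# The frequently edited training loop stays uncached on purpose.
@task
def train(features: List[float], lr: float) -> float:
    return sum(features) * lr  # toy stand-in for a training step

@workflow
def experiment(raw: List[int], lr: float = 0.01) -> float:
    # On re-runs with identical raw inputs, preprocess is served from
    # the cache and only train re-executes.
    return train(features=preprocess(raw=raw), lr=lr)
```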

[00:13:09.300] - Ketan Umare
But the core point of the second part was that you need to make a tight iteration loop possible for researchers in the beginning. Literally, going from writing code to shipping at scale should take a second. Then from there, instant productionization, and having a stable version in production while I keep experimenting. This is a critical requirement.

[00:13:32.590] - Ketan Umare
Then the final part is, how do you collaborate across data engineering, ML engineering, data scientists, and software engineers? They all have their own requirements. Some of them use different languages, some of them use different frameworks, and all of that needs to work. These are the reasons why, even though we did not want to build infrastructure in 2016, we just fundamentally did not see another path. We actually first went to Step Functions, saying, "Hey, this is just a pure orchestrator."

[00:13:53.420] - Ketan Umare
We found that infrastructure is a massive challenge. Think about it today: if you're a data scientist or researcher, you start with, "I'll use a CPU." And then you're like, "Oh, hold on, I need a deep learning model. Let me use a GPU. Hold on, I can't use a V100, I want an A100. No, not an A100, I need an H100." You're just scaling up.
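
In Flyte, that progression is a declaration on the task rather than new infrastructure; moving a step from CPU to GPU is a decorator change. A rough sketch (which accelerator class actually backs the GPU request is cluster configuration):

```python
from typing import List

from flytekit import Resources, task

# A lightweight CPU step.
@task(requests=Resources(cpu="2", mem="4Gi"))
def featurize(rows: int) -> List[float]:
    return [float(i) for i in range(rows)]

# The same decorator requests a GPU when the step needs one; swapping
# hardware means editing this declaration, not re-platforming.
@task(requests=Resources(gpu="1", mem="16Gi"))
def train_deep(features: List[float]) -> float:
    return sum(features)  # toy stand-in for a deep learning step
```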

[00:14:09.450] - Ketan Umare
Similarly, I want to use Dask or Spark or Ray for some tasks, but I don't need them for everything. I just want to fan out to a massive number of machines to run a batch prediction. It's crazy to me how many times people are using Union or Flyte today to run 50,000 to 80,000 containers in parallel, and the cost of running this, because we do partial state saves, is extremely low. These are extremely powerful semantics. They were built because we saw how backtesting, evaluation, hyperparameter optimization, and massively parallel execution are requirements of the system. That's why we built Flyte.
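
The fan-out he describes corresponds to Flyte's map task: one task definition is mapped across a list, each element running as its own container on a cluster, with completed elements saved so a retry only re-runs failures. A minimal sketch:

```python
from typing import List

from flytekit import map_task, task, workflow

@task
def predict_one(record: float) -> float:
    # Toy per-record scoring; in practice this would load a model and
    # score one shard of a batch prediction job.
    return record * 2.0

@workflow
def batch_predict(records: List[float]) -> List[float]:
    # Fans predict_one out across the list, one container per element
    # when run on a cluster; per-element results are saved, which is
    # what keeps huge, partially failing fan-outs cheap to retry.
    return map_task(predict_one)(record=records)
```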

[00:14:48.610] - Simba Khadder
That's awesome. I feel like I ask that question a lot: how does this compare to Airflow? Airflow competes with everything. It's a very generic tool. I mean, at the lowest level, everything is some form of orchestrator.

[00:15:00.890] - Ketan Umare
It's a great hackable tool. Take Python, hack it up, and do something. "Oh, I got something running." It doesn't scale. If you're building a successful organization, eventually you have to make it work consistently, which is a problem.

[00:15:12.710] - Simba Khadder
I think what you touched on is the workflow problem, which I think a lot of people miss. A lot of people like to think about scaling and size of data, and obviously those have their own problems, but almost orthogonal to that is that ML is a very iterative process. There's this experimentation step which, as software engineers, we don't understand, because there's no equivalent step. In software engineering, you don't really say, "I'm going to go down this path for a week or two, and if it doesn't work, I'm just going to throw it away." You just don't have that workflow. I like to explain it by saying it's software engineering and data science, and the science versus engineering distinction is really, really important, because it is a completely different way of thinking.

[00:15:54.670] - Ketan Umare
That's what I mean by the difference between a database and an AI product. Software products are incremental. AI products are almost transformative. Massive potential, but they become zero at some point, and then you have to restart the process. All executives need to understand that. AI is not a one-time investment. It's never going to be. It's a continual investment that you have to continuously evolve, actually more so than even software, which is pretty interesting to me.

[00:16:18.540] - Simba Khadder
Because it's so transformative, I think consumer behavior and expectations also change. Once everything is infused with GPT or LLMs, what people expect is going to be dramatically different, not dissimilar to not having a website: "How can you not have a website?" That's just an expected thing. I think the same is true of AI. Soon, if your product or application isn't smart, it's going to feel like pulling up...

[00:16:45.850] - Ketan Umare
Yeah, exactly.

[00:16:46.530] - Simba Khadder
It's like, "Oh, like I'm pulling up my old school '90s, whatever, like software."

[00:16:51.500] - Ketan Umare
It's a great analogy. I just thought of an example. You remember those IBM terminal apps that people used to run all the time? I forget the term, sorry. Then you go to a web-based app, which is fast and responsive and easy and globally available. It's like that. Once you go this way, you can't go back.

[00:17:10.730] - Simba Khadder
Totally.

[00:17:11.000] - Ketan Umare
Most legacy companies need to start thinking about this, and they are. I think OpenAI, whether they reach AGI or not, has done a fantastic job of showing the world a different path.

[00:17:20.980] - Simba Khadder
Yeah, I totally agree. I think unlike past waves that have come through, this one is so obvious that no one's getting caught sleeping. At least I'm not really seeing anyone caught sleeping. The actual speed at which you can move and get things production-ready, especially if you're an enterprise, is going to differ depending on what you're doing. There are a lot of prototypes getting out there, and that's fine, but they're very clearly prototypes. It's quite difficult to get an actual critical-path LLM application in, because in a lot of situations it's not well understood. So unless you frame it as, "this is just a prototype, it's just for fun, it doesn't really matter," it's really hard to get things through.

[00:17:59.060] - Ketan Umare
Oh, trust me, people don't frame it that way. Because you know what? The challenge here, and again, I'm a software engineer, so I'm probably going to say this and my clan is going to hate me, but I think we software engineers extrapolate our past and apply it here. We're like, "This prototype worked, the unit tests pass, that means it's going to work. Let's put it in production." That's not how it works with AI. Actually, most often the best-performing model that I had on my test data set did not perform well in production.

[00:18:29.820] - Ketan Umare
Across many years at Lyft, running thousands of models in production, I can tell you that, consistently, it was not the coolest model that worked the best. Sometimes it was the simplest model that worked the best. Sometimes it was the most complicated model that worked the best, and it often changed. It is a new kind of learning that all of us have to do.

[00:18:49.710] - Ketan Umare
An example of that: I was talking to somebody who was building chatbots. I don't want to call it a chatbot; it was an LLM application.

[00:18:56.650] - Ketan Umare
I'm like, "How do you deploy new versions of this to production?"

[00:18:59.860] - Ketan Umare
They're like, "Yeah, we just go ahead, test it locally and then deploy."

[00:19:03.790] - Ketan Umare
I'm like, "Okay, what if it regresses?" They're like, "Why would it regress? We're still using OpenAI, we're still using the same database." I'm like, "Hold on, it doesn't work like that."

[00:19:12.340] - Ketan Umare
And then I told them about the idea of A/B testing. They knew about A/B testing, but they hadn't thought about why you need it here. I'm like, "Because the behavioral pattern changes, and that's a huge outcome for your customers." I think OpenAI themselves had a problem. They deployed a new ChatGPT or GPT variant which, I forget exactly what it was doing wrong, I think it got the code wrong or the tool wrong or something. It's a massive change in behavior and it impacts everything. It actually could have a downward effect on your customers, so you have to be careful about it. And this is true of all AI products, including the legacy ML products that we've been building for 20 years now.
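
The mechanics he's pointing at are simple even when the evaluation isn't: deterministically route a slice of traffic to the candidate variant and compare outcomes before promoting it. A hypothetical, framework-free sketch; the variant names and the 90/10 split are made up:

```python
import hashlib

# Hypothetical traffic split: 90% control, 10% candidate.
VARIANTS = [("prompt-v1", 0.9), ("prompt-v2", 0.1)]

def assign_variant(user_id: str) -> str:
    """Deterministically bucket a user so they always see the same variant."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000  # uniform-ish in [0, 1)
    cumulative = 0.0
    for name, share in VARIANTS:
        cumulative += share
        if bucket < cumulative:
            return name
    return VARIANTS[0][0]  # fall back to control

# Usage: log (user, variant, outcome) per request and compare metrics
# across variants before rolling the candidate out to 100% of traffic.
print(assign_variant("user-42"))
```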

[00:19:47.240] - Simba Khadder
Totally. I remember a few times, actually. There was one time it went viral on Twitter, where it was almost having real hallucinations, just going off the deep end on random questions, and it was quite entertaining. But again, if you had a critical application dependent on it, this would be a huge deal.

[00:20:04.830] - Simba Khadder
I think similarly, I mean, my background is in recommender systems, so it's funny how much of this stuff is old and obvious to me.

[00:20:12.674] - Ketan Umare
It's so similar.

[00:20:12.800] - Simba Khadder
Yeah, it's the same problem. Like, how do you evaluate a recommender system? It's the same thing. It's like people are like, "Oh, just put a thumbs up, thumbs down," but no one clicks. When's the last time you clicked on a thumbs up, thumbs down? So you use implicit information, and so it feels like we're relearning a lot of the things that we learned in recommender systems all over again.

[00:20:30.160] - Ketan Umare
Also, the signal dies. The signal becomes worse over a period of time, right? Even if everybody consistently puts a thumbs up or thumbs down, the signal becomes diluted over time. You need new signals in the app.

[00:20:41.740] - Simba Khadder
Exactly. And the other thing that also applies to most AI use cases is the idea of serendipity. Sometimes you just have a specific goal, in which case it's more obvious. But a lot of the time, there's a difference between, "Yeah, that technically answers my question," and, "Whoa, that just changed my whole way of thinking about something," which has happened to me with ChatGPT.

[00:21:03.800] - Simba Khadder
I'll ask a question about something I'm trying to wrap my head around. On the side, I really love physics, and sometimes I'll ask it physics questions, and it will answer in a way where I'm like, "I thought I understood this thing, but now that you put it this way, this answer wasn't just good, it was excellent." That might have only been true for me; it might have been a really bad answer for someone else. Learning how to do that stuff is just hard. We're going to have to relearn a lot of the same things in this new environment, which is kind of...

[00:21:29.350] - Simba Khadder
I feel like an old person saying it, because I remember when the people before me were like, "Deep learning. It's just another phase. We're just doing all the same things over again." I'm curious. We've been talking about LLMs for a little bit here. I know Flyte's been releasing some workflows around LLMs, agentic workflows. I'd love to get a sense of how you're thinking about the future there with Flyte and Union, specifically.

[00:21:51.100] - Ketan Umare
Yeah. With Flyte, we have not released any agentic work, though there are some experiments we've done. It's actually pretty interesting. Literally, when ChatGPT came out about a year ago, I sent an internal message saying that we have to see where this is going, because this might be the right thing to be building for. I sometimes wonder if I should have gone and done that and created hype, but I also think it was probably too early. Sometimes there is an advantage in observing and moving correctly.

[00:22:23.570] - Ketan Umare
Specifically because we have market fit for Flyte, I think, going from a single task, to multiple tasks, to workflows, to complexity, to reducing that complexity. With what's happening with LLMs, many people have come to us and said, "Hey, can you use Flyte to do these chatbots and so on?"

[00:22:40.530] - Ketan Umare
I'm like, "Oh, hold on, that's not the right use case. Please don't do it."

[00:22:43.050] - Ketan Umare
Again, I tell people right now, it's not the right use case, because it's not designed for it. And the reason it's not designed for it is performance. It's the performance implications, nothing else. But when we built Flyte, and people who have used it deeply might realize this, Flyte is an implementation of a protocol, and the protocol is a way of connecting pieces. One implementation of that protocol is to create ML infrastructure, which can be slow but is extremely cost-efficient and has this massive benefit of rapid iteration.

[00:23:20.980] - Ketan Umare
Another implementation could be something that runs really fast to some state and connects pieces, yet in a logical way, because the base pieces are still right. One of the things with LLMs and creating chains is that you want some semantic correctness to be associated with them. LLMs could dream up whatever chain you want, like saying, "Go call Instacart, and do this and deploy," or something. But how do you know it's correct? How do you know that the inputs you're going to pass to this API or tool are going to be accurate? Then you have to do custom coding to manage and massage. What if you solve this at the semantic layer and make it better? Because we've been doing this: programs often do not fail because they are statically typed and checked. That's one of the things we get with Flyte.
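
A small example of the static checking he's referring to: Flyte task interfaces are typed, and the types are enforced when the workflow graph is compiled, so a mis-wired chain fails before anything runs. A sketch:

```python
from flytekit import task, workflow

@task
def fetch_document(doc_id: int) -> str:
    return f"document body for {doc_id}"  # toy stand-in for retrieval

@task
def count_tokens(text: str) -> int:
    return len(text.split())

@workflow
def doc_stats(doc_id: int) -> int:
    # fetch_document's str output must match count_tokens' str input.
    # Wiring count_tokens(text=doc_id) instead would fail when the
    # workflow is compiled, not at runtime in production.
    return count_tokens(text=fetch_document(doc_id=doc_id))
```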

[00:24:06.840] - Ketan Umare
The other thing is that it's rapidly shippable, so you could move things into a hosted experience that is really, really fast. You could take a local idea and ship it to production rapidly. Because it is centralized, you could have cost advantages: the cost of running, say, five chatbots doing an A/B/C/D/E test, or multi-armed bandits, or whatever you want to do, goes down, because that's another problem. People don't realize that AI becomes extremely expensive because of the experimentation. It's extremely expensive. This is a true problem, even with LLMs.

[00:24:42.630] - Ketan Umare
Think about it: if OpenAI has to run five models in production at all times, it's expensive. They would much rather use the same infrastructure for one. But you have to do it just to maintain correctness and backwards compatibility, for example. So we think this is an opportunity for us to disrupt and change things, and this is where we are going, because it's a natural way of thinking about the problem: how workflows, or pipelines, or graphs, or chains, whatever you want to call them, are created, how their construction is kept semantically correct, and how the result is tunable as well as observable, yet with an extremely low cost footprint for the user.

[00:25:24.520] - Simba Khadder
One thing I really want to get your take on: there's a new set of data scientists who are really just LLM scientists. That's all they've really used, and a lot of the focus has gone there. Then there's the traditional machine learning that's been around; I mean, there are statisticians too, and a lot of different waves of things. But specifically, my question is, how do you think traditional machine learning, what we had been doing for many, many years before GPT was released, is going to get disrupted? Is it going to get disrupted? Do you view these as two different things that will live together? Do you think that LLMs will become the de facto thing?

[00:26:05.500] - Ketan Umare
Disrupted is a big word. I actually want to offer maybe a contrary opinion. I think the disruption that's happening is to legacy software. The disruption is for software. People are now finally realizing that AI is something we have to use, which is fantastic.

[00:26:21.890] - Ketan Umare
Let me give you an example. I think about it as a pendulum. The pendulum was: most people thought ML is a narrow market because it's hard, it's expensive, it's hard to productionize, and a bunch of other things, right? Which is true. So only a certain profile of company in the world would do it, and the market is small. That was one side. The remaining people were not going to touch it. They would play around, do something, but it would not be as critical. With ChatGPT, the pendulum swung to the other side. It's like, "Oh, everything's going to be AI."

[00:26:55.570] - Ketan Umare
And I'm like, "Okay, doomsday," and all kinds of random prophecies. And I'm like, "Hold on." If you naturally saw the progression of AI, you would have assumed this is going to happen now. Now, I never thought that it will happen in 2022 or three or whatever. I didn't know the timeline. I didn't know if it will happen in my lifetime, but I knew it was going to happen. Another one is going to happen and eventually there will be an AGI. It's going to happen. Is it going to happen in a hundred years, 200 years, thousand years? I don't know the answer.

[00:27:20.640] - Ketan Umare
If you take that as a given, there are two things that I think about. One is a fear: are we going to enter another AI winter? Which is a terrible feeling, because whenever there is one success, a lot of innovation dollars go toward it. My feeling sometimes is, are we in a local minimum? Have we hit that local minimum with LLMs, with transformers? And now it's just, "Okay, let's transform and transform and transform." And maybe there is no solution beyond it. I don't know. I can't tell.

[00:27:48.930] - Ketan Umare
But hypothetically, from the pure point of view of what a transformer does, some people are saying we'll reduce hallucinations to near non-existence. And I'm like, "But that is a feature. How can you reduce hallucinations for something that's so creative? We've created this idea-

[00:28:04.550] - Simba Khadder
By design.

[00:28:04.640] - Ketan Umare
Yeah, by design. How do you reduce it? You can augment and reduce, but then you're basically building a new model. This entire thing you're building on top is a bunch of models.

[00:28:13.440] - Ketan Umare
We used to do ensemble ML techniques, five different models, and personalization. All of that will happen. So it's natural. We had ways of doing this, like XGBoost, LightGBM, regression. Now we have a new way of doing this: generative AI. I think another one will come, like video-based models, and then another one, and I think this is going to continue.

[00:28:33.720] - Ketan Umare
The disruption is going to happen to software, but I don't think the legacy stuff is going to go away. I've still not found a great way for LLMs to solve operations problems, simple behavioral detection, or even vision problems. You have to detect and segment the image; you need a model to segment data. That's a deep learning model. I'm not saying no deep learning; I'm just saying it's not a generative AI model.

[00:28:57.300] - Simba Khadder
Yeah, I think that's spot on. I've always viewed it as a very powerful tool, but it's another tool in our tool belt. My, I guess, hot take is that in five years, the most common model in production will still be some form of a random forest, whether it's LightGBM or whatever, some sort of-

[00:29:14.984] - Ketan Umare
It's actually good.

[00:29:15.020] - Simba Khadder
They work really well, they're really easy to understand, they're really cheap, and in most applications today, a lot of the ML you see is fraud detection. It's stuff that is so transactional, so fast and so critical to have some level of understanding, that an LLM is just not the right tool for it. At least I don't think so. You can make it do it, but it's not the right tool for the job. I think we're seeing that now, and what I'm seeing is a lot of the big companies finding ways to put these things together, their ML and their LLMs, even at the platform level.

[00:29:50.850] - Simba Khadder
There are other approaches I'm seeing where people try to build this whole new thing and use LLMs everywhere, then use LLMs to evaluate LLMs, and LLMs create features that feed into other LLMs, and you start to maybe oversaturate. Because no one's doing evaluation well, and these things sound so cool and kind of work, I think people just want them to work. I think we're going to get to the end of that.

[00:30:14.530] - Simba Khadder
And then even deep learning-

[00:30:16.688] - Ketan Umare
It's a house of cards.

[00:30:17.030] - Simba Khadder
They're just stats. I mean, in the end, you need to understand data, see what the data is doing, and find the best way to put that together. And this is another tool that takes data in and generates data, in a way that's unbelievably accurate compared to what we've seen before.

[00:30:30.620] - Simba Khadder
This has been awesome. I feel like there's so much more I would want to keep talking to you about, so maybe I'll have to pull you back in one of these days. I really appreciate you taking the time, answering all my questions, and sharing your views and learnings with our viewers.

[00:30:43.270] - Ketan Umare
No, no, absolutely. This was fun. We went down the right tangents, I think. Then again, maybe we'll look back and say we were all wrong, that LLMs did everything.

[00:30:52.000] - Simba Khadder
That's always a risk.

[00:30:53.180] - Ketan Umare
Yeah. But I do think XGBoost was a change in the way we solve problems. That happened, and now LLMs are doing it. And hopefully there is one more. I think the vision models, some of the new generative AI vision models, are pretty amazing, and the video models are pretty amazing. They're going to do amazing things. We're just increasing the TAM, in my opinion, just increasing the market here. Let's do it. I think it's better to embrace it and go with it.

[00:31:16.720] - Simba Khadder
I agree fully. Thanks again.
