Hey, everyone. Simba Khadder here of the MLOps Weekly Podcast, and today I'm speaking with Piero Molino. Piero is the CEO and co-founder of Predibase, a company that's redefining machine learning tooling with a declarative approach. He previously worked as a research scientist, exploring ML and NLP at Yahoo, IBM Watson, Geometric Intelligence, Uber, where he was actually a founding member of the Uber AI organization, and Stanford. He's the author of Ludwig, a Linux Foundation-backed open-source declarative deep learning framework with more than 8,500 stars. Piero, great to have you here today.
Thank you very much for having me, Simba. I really appreciate the time and the opportunity.
I'd love to just jump in. I gave a quick intro on you, but I'd love to hear it in your own words. Tell me a bit about your journey to get to Predibase and building Ludwig.
I will try to keep it short because it's actually pretty long, but I don't want to bore people with my own story. But I started by working on open domain question-answering. That was my research when I was doing the PhD. Then I worked at a bunch of companies, large ones like Yahoo and IBM Watson, where I was actually doing... Exactly what I was doing my research on was also the same thing that I was doing at IBM Watson.
But then I felt the urge to work at a smaller company and to work shoulder-to-shoulder with people where my decisions were really impactful for the company. I joined a startup that was called Geometric Intelligence, founded by a bunch of really nice people like Gary Marcus, Zoubin Ghahramani, Ken Stanley, Jeff Clune and Noah Goodman. Because they're really well known in the machine learning space and they have a lot of experience, I wanted to work with them to learn a lot. That was my main intent when I started working in the company.
The company was acquired by Uber and so that's where I actually started working on Ludwig, which is the open-source foundation behind Predibase. The reason why I was working on that is that when I was at Uber, I was doing both research and application, so many different domains and many different actual machine learning tasks and products that Uber was adding machine learning capabilities to.
One of them was a dialogue system for Uber drivers. Another one was a customer support model called COTA. Another one was the recommender system of Uber Eats, where I added a few additional capabilities. Another one was a fraud prediction model.
By working on all these different projects, I saw that there were a lot of things in common among these projects. I could build something like a tool for myself for making it much easier to work on the next project without reinventing the wheel from scratch.
That was the motivation for building Ludwig, which basically is a tool that creates an abstraction over the building of the machine learning pipeline by just requiring a configuration file from the user, similar to what Terraform does for infrastructure. It's a declarative configuration where you can say, "These are my inputs, these are my outputs, and these are the models that I want to use and the training parameters," for instance, and it builds the model for you.
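As a rough illustration of the declarative idea described here, a Ludwig-style configuration can be written as plain data. The `input_features` / `output_features` layout follows Ludwig's documented config format; the column names (`review_text`, `sentiment`) and the trainer values are made-up assumptions for the sake of the sketch.

```python
import json

# Declarative description of a model: what goes in, what comes out,
# and a few training parameters. Column names and values are hypothetical.
config = {
    "input_features": [
        {"name": "review_text", "type": "text"},
    ],
    "output_features": [
        {"name": "sentiment", "type": "category"},
    ],
    "trainer": {"epochs": 5, "learning_rate": 0.001},
}

# The config is just data, so it can be serialized, versioned, and diffed.
config_text = json.dumps(config, indent=2)
print(config_text)
```

With Ludwig installed, a dictionary like this would typically be handed to `ludwig.api.LudwigModel` and trained on a dataset; that step is left out so the sketch runs with the standard library alone.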
This abstraction is at the core of what I'm building now at Predibase because this declarative abstraction is what we use for making it possible for more users to build machine learning models. We're building a bunch of capabilities around it, including capabilities of connecting with data, capabilities of managing the iteration of our models, capabilities of deployment, and capabilities of running these things at a large scale without having to care about the infrastructure, to make this technology more accessible to organizations and more people within organizations, both engineers and scientists.
Also, I love that you mentioned Terraform. Actually, our name Featureform comes from Terraform. I definitely think that the declarative approach to have a model is the right one. A lot of people, I imagine, associate Ludwig with AutoML. I think maybe in a previous iteration it had more to do with it. It's obviously evolved and improved over time. I'd love, if you could touch on that, is Ludwig an AutoML product? Is it not? When did that change happen? Did that change happen? I'd love to learn more.
Yeah. Actually, it's very interesting that at the beginning, I was not the one who used the term AutoML to describe it, but actually, when people picked up on it, there were some videos on YouTube and some articles written by other people who defined it as an AutoML tool, but I was not the one starting it, I guess. The reason is that I think there's a substantial difference in the very basic approach behind it, although we added AutoML capabilities to Ludwig.
The main difference is that Ludwig starts as a mechanism for defining a declarative configuration for describing your own models. At the very beginning, there was no automation of defining what the model is. You had to define it in a simpler way through the configuration, but it was still something that the user had to do. The fact that there are still a lot of defaults makes it feel like it's an AutoML tool, but there was no intelligence in defining those defaults beyond me picking values that I believe are working from papers, for instance.
To give a concrete example of this, if you specify that you have a text classification task, and you specify one input is text and one output is category, that's all you need to specify in Ludwig to make it work. What happens is it chooses the default model, which is like a CNN for text, because it's a really lightweight one compared to RNNs or transformers, and uses cross entropy as the default loss.
In the end, it trains like a really competent model to begin with, but there's nothing smart about it. It's just that's the default. The user can go there and say, "Well, I want to use an RNN or a transformer instead of the CNN as the encoder for the text."
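The "defaults, not intelligence" point can be sketched in a few lines. This is not Ludwig's actual code: the `DEFAULTS` table and the `apply_defaults` helper are hypothetical, and the default values simply mirror the ones mentioned in the conversation (a CNN encoder for text, cross entropy for category outputs).

```python
import copy

# Illustrative defaults per feature type. These values are assumptions
# chosen to mirror the conversation, not Ludwig's real default table.
DEFAULTS = {
    "text": {"encoder": "cnn"},
    "category": {"loss": "cross_entropy"},
}

def apply_defaults(config):
    """Return a copy of config where unspecified options take default values."""
    resolved = copy.deepcopy(config)
    for section in ("input_features", "output_features"):
        for feature in resolved.get(section, []):
            for key, value in DEFAULTS.get(feature["type"], {}).items():
                # Only fill in what the user left out; explicit choices win.
                feature.setdefault(key, value)
    return resolved

minimal = {
    "input_features": [{"name": "text", "type": "text"}],
    "output_features": [{"name": "label", "type": "category"}],
}
overridden = {
    "input_features": [{"name": "text", "type": "text", "encoder": "transformer"}],
    "output_features": [{"name": "label", "type": "category"}],
}

resolved_minimal = apply_defaults(minimal)
resolved_overridden = apply_defaults(overridden)
print(resolved_minimal["input_features"][0]["encoder"])
print(resolved_overridden["input_features"][0]["encoder"])
```

The minimal config picks up the fixed defaults, while the second config keeps the user's explicit encoder choice. Nothing in the process searches or compares models.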
The AutoML tool, what it actually does is it tries a bunch of different things for you, encoding some intelligence into this choice of what models to try, and then picks the best one.
Later on in Ludwig, we added... So this was at the beginning of Ludwig. Later on, we added additional capabilities, one around hyperparameter optimization. You could say, for instance, "I want to..." Still declaratively, which means you can say, "I want to try a CNN, an LSTM, and a transformer for this task, and I want to try learning rates within the range of 0.001 to 0.0001." The process looks like an AutoML process because you try a bunch of different things and then they are stack ranked according to the performance. But still, the choice is the user's when they're defining the ranges of parameters that they want to choose.
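A minimal sketch of that process, under stated assumptions: the search space is declared as data, every combination is tried, and the results are stack ranked. The `evaluate` function is a stand-in that returns invented scores instead of training anything, the three learning rates are just sample points inside the 0.001 to 0.0001 range, and none of this is Ludwig's own hyperparameter optimization API.

```python
import itertools

# Declared search space: the user, not the tool, chooses what to explore.
search_space = {
    "encoder": ["cnn", "lstm", "transformer"],
    "learning_rate": [0.001, 0.0005, 0.0001],
}

def evaluate(params):
    """Stand-in for training and validating a model; returns a fake score."""
    base = {"cnn": 0.80, "lstm": 0.82, "transformer": 0.85}[params["encoder"]]
    # Arbitrary penalty so the learning rate influences the ranking.
    return base - abs(params["learning_rate"] - 0.0005) * 10

def grid_search(space, score_fn):
    """Try every combination and return (score, params) pairs, best first."""
    keys = list(space)
    trials = []
    for values in itertools.product(*(space[k] for k in keys)):
        params = dict(zip(keys, values))
        trials.append((score_fn(params), params))
    return sorted(trials, key=lambda t: t[0], reverse=True)

ranked = grid_search(search_space, evaluate)
best_score, best_params = ranked[0]
print(best_params)
```

The ranking looks like AutoML output, but every candidate came from the ranges the user wrote down.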
Then finally, more recently, I think it was already a year-and-a-half ago, something around that, we added some AutoML capabilities where you can go in Ludwig and you have an AutoML subpackage, submodule, really, and you can say, "Given this data set, suggest a configuration for me." But there is some smartness there because we tried many different configurations and we identified a bunch of configurations that work in many different scenarios for some specific tasks. Now we also have these capabilities.
But the core of it is that you have a configuration and you can change it and modify it the way you want and you can iterate over it, while AutoML is still a one-shot process where you have a data set and you get the model out and you have no levers, really. Nothing that you can do in the process once you get the model out. That's a fundamental difference in the spirit of it, really.
So Ludwig does a lot. It sounds like it's almost in some ways competitive to PyTorch, but also being competitive to some of the AutoML frameworks. What would you say... I guess two questions. One is, is your user typically any data scientist doing machine learning? Is there a specific subcategory of data scientists who use you more? And I guess the follow on would be, why [inaudible 00:08:45]? Why would they choose to use Ludwig over using PyTorch directly or just throwing it into DataRobot or [inaudible 00:08:53]?
Right. I think there's two souls, if you want, of the tool, and that's also reflected by its users, the two main kinds of users, if you want. And I would say, by the way, Ludwig is built on top of PyTorch, so basically every single Ludwig model is a PyTorch model. And so it's not really a replacement for PyTorch. It's like a higher-level abstraction that makes it easy to do PyTorch.
If you want, on one hand you have, again, the more... Let's take it this way. On one hand, you have the more detailed tools, low-level tools that machine learning engineers use, like PyTorch, TensorFlow, [inaudible 00:09:35], and on the other hand, you have the AutoML tools. We believe that the declarative abstraction, and Ludwig as an example of that declarative abstraction, is a happy middle where you have the degree of control, or close to the degree of control, of the low-level machine learning framework and the simplicity of use of an AutoML tool without sacrificing the actionability and the fact that you can change any single parameter.
As a consequence, the users who are using Ludwig right now, and who are really the targets for Ludwig, fall into two categories. The first is more experienced users, like data scientists and experienced machine learning engineers, who could build these models themselves, but it would take them time, and so by using Ludwig they are saving a lot of time.
The second category is people who may not know how to build, for instance, a deep learning model with PyTorch for a specific task, but the configuration system makes it very easy for them to get a competent model out of the box. Those are the people who would be more drawn to an AutoML tool, but at the same time don't adopt one because it creates an artificial ceiling. They want to grow with the tool and want to have the possibility to change the parameters and basically iterate over the models and improve them.
These people are more engineers. They want to get something into the application, for instance, that they're building, and they want to add the machine learning capability. But at the same time, they don't want to be locked into an AutoML solution, or they don't want to just cross their fingers and say, "Well, if I'm going to get a good model out of the box, great. Otherwise, I don't know what to do." They want to feel like they can influence the process of how they get the model themselves.
I like the comparison of the happy middle. How would you define the category? Is there a category that Ludwig fits in or is it its own thing?
We are describing it as a declarative machine learning tool because of the configuration-based approach. I think, again, it's slightly different from both the low-level machine learning frameworks and the AutoML tools. But I think all of these things live in the same space to a certain extent, which is machine learning tooling, really, machine learning platforms or machine learning tooling.
Where do you think it fits into maybe the broader MLOps ecosystem when you think of things like [inaudible 00:12:06], Comet, which I assume there's probably more tie-in all the way from feature stores, observability platforms? How does it all fit together in your head?
This is maybe slightly different for Ludwig and Predibase, if you want. For Ludwig, for instance, we have plug-ins for Comet, Weights & Biases, and MLflow for tracking experiments automatically when users run them through Ludwig. So just add --comet, for instance, and the experiment that you're running in the training or the prediction that you're running is tracked on the specific tool that you specified.
With respect to feature stores, there's nothing explicit in Ludwig, but I think there's a very, if you want, clean interface between a feature store and Ludwig, meaning that literally the output of the feature store can be the input to the training of Ludwig, and then the output of Ludwig can be written back into a data source that the feature store can then read from.
I would say the only caveat there is that Ludwig also does some data preprocessing. The way we define it really is anything that is common among multiple machine learning tasks that is specific to a data type, we try to incorporate it. For instance, for text, tokenization, shortening of the text up to a certain specific length, cleaning of text by, for instance, lowercasing or not. These capabilities are there in Ludwig. The same is true for images and other data types, like normalization for numerical values and things like that.
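The kind of data-type-specific preprocessing listed here (tokenizing, truncating to a maximum length, optional lowercasing, normalizing numbers) can be sketched with the standard library. The function names and parameters below are hypothetical and much simpler than what a real framework does; whitespace splitting stands in for a proper tokenizer.

```python
import statistics

def preprocess_text(text, max_length=8, lowercase=True):
    """Optionally lowercase, split on whitespace, and truncate to max_length tokens."""
    if lowercase:
        text = text.lower()
    tokens = text.split()
    return tokens[:max_length]

def zscore(values):
    """Normalize numbers to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values identical: nothing to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

tokens = preprocess_text("The Driver Was VERY friendly and on time today, thanks!", max_length=5)
normalized = zscore([10.0, 20.0, 30.0])
print(tokens)
print(normalized)
```

Both steps depend only on the data type, not on any particular data set, which is the boundary drawn in the conversation between framework preprocessing and bespoke feature engineering.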
What is not there in Ludwig is something that is bespoke for the data set, which could be, for instance, having a notion of rolling up a table for deriving features or for aggregating them or things like that. That is not in the domain of Ludwig. Feature stores are the best solution that we know right now for doing those things. Ludwig can take the output of that and train on it.
For Predibase, I would say we are trying to make the experience of the users really cohesive, if you want. We still don't focus on the feature store part, but all the other aspects of model management, model deployment, and infrastructure, we take care of all of them. The reason is that we believe that through that integration, we can provide a much better experience. Many organizations, what they do, they take different tools, maybe best-in-class tools for all of these things, and they put them together in a way that maybe if you want... Let me rephrase this thing.
I would say many organizations pick best-in-class tools and try to put them together in a way that is cohesive, and there's merit to this approach. But what we're trying to do is we're trying to make it so that they don't even have to think about putting these tools together. It's a higher level of abstraction that we are trying to provide to users. The reason is if you have a system that knows exactly, like a deployment system that knows exactly what the specific models are that are going to be deployed, it can be much simpler than a tool that needs to support every single model format.
Same is true for experiment tracking. If you know exactly what the format of the output of the training process is, you don't need to support TensorFlow, PyTorch, or any other mechanism for training models that can write metrics in a super generic way that is maybe not supported already. Because of that, we can make decisions that make each single component that we are building substantially simpler than a best-in-class solution that supports everything, but at the same time delivers the same amount of value for the customer since they're adopting the platform.
Got it. So today, it seems like there is simultaneously a lot of, let's call it MLOps platforms, Predibase and that kind of... It's in that realm where it's going across a few of the maybe proto-categories of MLOps. Then there's obviously the, let's call them category players, like observability companies, there's serving companies, et cetera. What do you think the future is there? Do you think it mostly will continue to split up and be like many different categories that people will piece together? Do you think they'll mostly be platforms? I assume it'll likely be a mix, but I'm wondering, do you think it's going to lean heavily towards platform-based or heavily towards stitched-together, best-in-class, vendor-based?
Honestly, I think that machine learning, we're already seeing it, honestly, is growing so much as a field and so much as an industry category, if you want, that I believe there will be space for all of these solutions because they target different customers, really, and what they are capable of building in-house, what they are capable of buying out, and what they find the highest value building, assembling, or using.
I imagine a world where customers that are not tech companies may not want to build anything in-house and not even stitch things together in-house. Customers that are tech companies need to have a deeper degree of customization and control over what they're building. They may be building something in-house and stitching something together. And the hyperscalers will be building everything in-house because even like a 0.0001% improvement in efficiency, either accuracy or speed or performance or anything, means millions and millions of dollars for them, so they have a reason for doing that.
I think you're going to see this full spectrum. I think we're going to see different classes of companies adopting different solutions. I don't think it will be one that will overcome all the others. That's the way I feel about it.
Obviously, one of the hot topics today is, let's call it, LLMs and foundational models.
Where do you think the world is going? Is traditional ML dead? Is it over? Is everything going to be a founda... Have we figured it out? It's all foundational models? Is it going to be a mix? What's your sense of how... And also, I guess another part I'd love to have you expand on is, are these two separate paradigms? There will be traditional ML workflows and there will be, let's call it, foundational ML workflows, or do you think there's going to be some mixing?
That's a very interesting question, in particular, the second one. Let me start from that one, I would say, and then try to go back to the first one. On the second one, I think that there is mixing. Also, we are thinking about it at Predibase, and we're starting to put out some material about it, some webinars and some documentation. The way we're thinking about it is that now we have a function that in many cases is capable of producing the outputs that a machine learning model was required to produce up until recently. That is great because it means that the barrier of entry is substantially lower.
At the same time, that function may not be the best one from many different perspectives for solving the task. It's a function that can do many different things. For that specific thing that you wanted to do, it may not be the best one at doing that specific thing, may not be the fastest one at doing that specific thing, may not be the most cost-efficient one at doing that specific thing.
In my mind, there will be a coexistence of large language models, foundation models in general, and more, if you want, traditional machine learning models because of the fact that I can imagine that users will approach solving their problems using an LLM, see that it is feasible and there's value in doing it, and then finding ways to make it cheaper, faster, and cost-effective, really, and that will be probably building a bespoke model for the specific tasks that they care about.
And in that same light, is the same thing true of AutoML? You would maybe use an LLM as a generic, almost perfect AutoML-type thing, but then you might use an AutoML solution or tool to try to achieve similar performance characteristics for a much lower price, maybe faster. Is that the right way to think about it?
Yeah, I think that makes sense. It's a matter of what is, from a performance perspective, good enough and what is good enough from, again, all the other considerations like cost, speed, and all of that.
There will be some cases where the LLM may be good enough from all these points of view. And there will be cases where it will not be. AutoML has the same promise, if you want, that I give you some data and there's a model coming back from it, and that the model is good enough for your task.
The problem is when you get out of the happy path, when it is not good enough. The same thing is true for data learning in my mind. When it is not good enough, what are you going to do? Then all the other things that we've been working on for a while will keep on being relevant in all these cases.
I'm curious, just because I know you have some research background in question-answering and other stuff in that realm. You're probably very familiar with transformers, embeddings, that problem space. At my last company, for the recommendation system, we used to... We had lots of multimodal stuff. We'd create embeddings on images, on users, on pretty much anything. We'd feed them into further models that had very specific tasks, whether it be ranking, whatever it be. A lot of it was predicting whether someone would subscribe, et cetera.
It's interesting to see vector databases finding this new home in this new LLM landscape because it's almost like treating LLMs as this super transformer. I guess that makes me wonder if we're going to start using LLMs the same way we've used transformers historically, where it's like, "Hey, we're not using Bard, we're using GPT-7 now because it's 100 times better." Do you think that makes sense? Do you feel like the world is moving in that direction, or do you think that's not what the future looks like?
Yeah. So I think in particular, from a question-answering perspective, and again, recommender systems in this sense are relatively similar to that in my mind. You can use an LLM for embedding stuff. I think the value in it is slightly different than what it was before because before you would embed something and then retrieve it and the retrieval was the task in and of itself.
I think now you can do more interesting things than that. An example is you can index something in a vector store and then when you retrieve, what you retrieve is not the output, what you retrieve is what goes in as input to a further step in the processing of that information through the LLM. You may want to summarize it, you may want to add references, you may want to use it as supporting evidence for the answers that you are giving. You may use it as samples for doing few-shot learning, really, and then actually it's just examples for different tasks.
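The "retrieval as an input to a further step" pattern can be sketched as follows. Everything here is a toy stand-in: `embed` is a bag-of-words count rather than a learned embedding, the `index` list plays the role of a vector store, and the prompt wording is invented. No real LLM or vector database API is called.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector. A real system would use a model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "ludwig models are defined with a declarative configuration",
    "crabs have evolved independently many times",
    "feature stores aggregate tables into features",
]
index = [(doc, embed(doc)) for doc in documents]  # stand-in for a vector store

def retrieve(question, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(question, k=1):
    """Put retrieved passages into the prompt as supporting evidence."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, k))
    return f"Answer using only this evidence:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("how are ludwig models defined")
print(prompt)
```

The retrieved passage is not the answer itself. It becomes part of the prompt that a language model would then summarize, cite, or use as few-shot examples.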
There's much more that you can do in the paradigm of, "I'm using an LLM as the controller of the process that I'm doing." There are some examples like that. The things that people are building with LangChain are super cool. There's this startup called Fixie that is using an LLM as a dispatcher, really, towards other functions, which could be LLMs themselves, could be something else, and integrates the capabilities of those other models into the interaction with the user. It opens up more possibilities than there were before.
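The dispatcher idea can be sketched in a few lines, with loud caveats: `choose_tool` is a keyword rule standing in for the LLM call that would pick a tool, and the two tools are hypothetical toys. This does not reflect how LangChain or any particular product is implemented.

```python
def summarize(text):
    """Hypothetical tool: crude summary that keeps the first five words."""
    return " ".join(text.split()[:5])

def word_count(text):
    """Hypothetical tool: count whitespace-separated words."""
    return str(len(text.split()))

TOOLS = {"summarize": summarize, "count": word_count}

def choose_tool(request):
    """Stand-in for the LLM call that decides which tool should handle a request."""
    return "count" if "how many" in request.lower() else "summarize"

def dispatch(request, payload):
    """Route the payload to the chosen tool and return (tool name, result)."""
    name = choose_tool(request)
    return name, TOOLS[name](payload)

text = "Ludwig builds a model from a declarative configuration file"
print(dispatch("How many words are in this?", text))
print(dispatch("Give me the gist", text))
```

In a real system, the routing decision would come from the model's reading of the request, and the tools could themselves be other models.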
Totally. Yeah, it makes sense. Fixie is actually... We share a lead investor, them and us.
It's been cool to see them be so successful. I think you're totally right. I think embeddings... We've already been seeing them being used essentially as features. It's almost like an interesting point because it's like... What I would call maybe a traditional feature would be, let's z-score this, or let's do this aggregation. It's a SQL query, essentially. That's what it looks like.
But nowadays, those, let's call them transformation steps, are actually transformers or even like LLMs. Those generate outputs and those outputs are features. In this case, the feature happens to be a vector. I think that that's going to become really interesting.
I think it's a bit untapped still because I think it's what goes from taking, I guess, what people call a fat client. I think a lot of the LLM companies I see in the application space look very similar to me. It makes sense because they're all using the same API, and the API is text, or prompts, I guess.
The problem with prompts is... The great thing about prompts is my grandma can use them. The hard thing about prompts is that that's about as far as you can get. You just have to start trying to come up with crazy hacks to try to make these things work better. Anyway, I think that's where things get interesting to me, is seeing embeddings and these intermediaries and starting to build logic on top of those to build more interesting applications.
Yeah. Again, I agree with that. And also there is this interesting take that I've seen some people agreeing with regarding the fact that it is true, anybody can now use this interface because prompting is something that looks like language and can be natural for people to do.
But at the same time, if you look at it from the point of view of a developer, it's a little bit like waving a magic wand in the air and hoping that something comes out. We have some affordances through language, but we don't really know the space of what is possible and we don't really know why and if some of the changes that we make to a prompt should or should not work.
We may have some intuition, some human language intuition, but it may not actually be true. Maybe a prompt that is slightly less grammatical may be a better prompt for achieving a specific goal. We actually don't know that. This trial and error is really a little bit of a black art.
If you think from the perspective of someone who has been developing machine learning systems through programming languages, where you have a mental model of what the compiler or the interpreter will do and you know exactly what you need to change to make that happen, it's a little bit disconcerting for people who have been building things that way.
It reminds me. There's this fact I learned, which is, crabs, the animal, have evolved many different times from many different places. There's the joke of it's the most effective or efficient. It's like the perfect... Evolution has decided that this thing is a global minimum. We keep ending up here. I have the same joke about SQL, where it's like we keep moving away from it, and in the end, we always seem to come back to SQL. I joke about maybe one day we'll be prompting our LLMs in SQL again. We'll have a SQL dialect for LLMs.
There's already one. It's called like... We are doing something like that ourselves because we have this PQL, predictive query language, in Predibase, and there's this... Let me search it up because I want to give you a good answer about it. I think it's called LLMQL, if I am correct, or LMQL. I want to be precise about it. Yeah, so LMQL. There is this other research team actually that is building this thing called LMQL, and there's a paper about it. We are rediscovering SQL yet once more for querying large language models. It's interesting.
I also strongly believe that SQL and declarative interfaces like that are a global minimum, or at the very least a local one, and that we end up there time and time again. I remember when I was at Yahoo, for instance, it's a little bit of an aside, but Yahoo was the company open-sourcing Hadoop at the beginning, and so it had a lot of legacy. When I interned there, I was writing Java code for running Hadoop jobs on Hadoop 0 dot-something when the whole industry was at Hadoop 2 dot-something just because Yahoo had a lot of legacy Hadoop stuff.
I was writing basically select and where clauses but in a very verbose way, really the heart of the bug and all of that. Then there were SQL parsers added into [inaudible 00:29:45] jobs that made it substantially easier. We really rediscover the basic things sometimes over and over again. I agree with you.
Exactly. Yeah. Then Hive comes out. It's like, "Yeah, maybe one day after the AI apocalypse, there will be just crabs and SQL. There will be nothing else. That's it. That's the best we could come up with." Piero, it's been great to have you on. I really appreciate your time and your takes and your insights. I hope to maybe have you on again.
Yeah, looking forward to it. I had a good time. It was fun talking. Thank you for spending time with me.