MLOps Weekly Podcast

Episode 2
Uber's Feature Store and Data Quality with Atindriyo Sanyal
Co-founder of Galileo


June 14, 2022

This week, we sit down with Atindriyo Sanyal to discuss feature stores at Uber and the importance of data quality.

Listen on Spotify

TRANSCRIPT:

Simba: Hi, I'm Simba Khadder, and you're listening to the MLOps Weekly Podcast. Today, I'm chatting with Atindriyo Sanyal, one of the founders of Galileo, a data quality platform. Prior to Galileo, he co-architected the Michelangelo feature store at Uber and worked on the ML platform powering Siri. It's great to have you on the show today. 

Atindriyo: Oh, thanks for having me. It's great to be here, Simba. 

Simba: So, I like to start with the question of what does MLOps mean to you? 

Atindriyo: That's a good question. MLOps, to me, is really about bringing the discipline of DevOps in application development, which includes software design, unit and integration testing, the whole gamut of CI/CD, including monitoring and observability, but bringing that to machine learning models. Because models are simply software artifacts, just like libraries and APIs and services, the same principles apply to automating the lifecycle of ML models, right from the pre-training phase, including feature engineering, all the way to deployment and monitoring. So in a nutshell, it's really about applying software engineering principles, but for models. 

Simba: So, I love the DevOps-MLOps analogy. It's a commonly used one, and I still kind of feel like everyone's fleshing out what it really means, like what should be the same and what should be different. You talked a lot about the parallels, what should be the same, but what is the difference? Almost, why does MLOps have to exist? Can we just do everything in MLOps with DevOps tools, or what does MLOps have that's unique? 

Atindriyo: Yeah, that's a good question. The way I see it, there are some nuances in machine learning which make MLOps a little bit different from traditional DevOps, and that primarily has to do with the APIs and the libraries that you expose as part of your MLOps platform to be able to build models. Overall, it is similar in that even model code is just like application code, except that models are architected in a slightly different way than traditional software applications. So, the differences are mostly in the APIs that an MLOps platform would expose to a data scientist, and a good MLOps platform, to that extent, would essentially include the right abstractions, where you abstract away the complex feature engineering, for example, or deploying the models at scale, while you surface the right endpoints and APIs to data scientists to be able to build custom ML pipelines that cater to their ML needs. And that, I think, in a nutshell is the difference between MLOps and DevOps. 

Simba: Got it. And on a similar thread, you've seen things from Apple with Siri, you've seen Uber, and now you're going on and working on Galileo. What's the north star for MLOps from your point of view? As in, what does the perfect MLOps workflow look like, especially from a data scientist's point of view? 

Atindriyo: That's a good question. I think the perfect MLOps workflow is really about, as I said, automating and abstracting away the right parts of the machine learning workflow, while giving flexibility and customizability to certain other parts. For example, a good MLOps workflow would automate the management of features, but give you the right APIs to design custom transformation pipelines for your model. A good MLOps workflow would automate the deployment of models at scale and provide centralized monitoring and observability of all kinds of metrics. There are system-level metrics, memory, CPU, and there are model-level metrics, like feature drift and prediction drift. So a good MLOps system, it's really not about click-button ML; rather, you want to give modular APIs to data scientists, so they can create ML pipelines which are custom and powerful and cater to their needs, while giving them the data and the features at their fingertips without them having to worry about how the feature values are kept up to date or how they're materialized. And once the models are in production, you really want to give them tight SLAs and a robust alerting system so that they know when the models have gone wrong in their predictions, and you want to tighten the loop on model downtimes and the way you can retrain your model quickly and push it back into production.
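
To make the drift-monitoring piece concrete, here is a minimal sketch of one common metric, the population stability index (PSI), computed between a training-time feature sample and a production sample. The function, the toy data, and the 0.2 alert threshold are illustrative assumptions, not any particular platform's API:

import numpy as np

def population_stability_index(expected, actual, bins=10):
    # Bin edges come from the training-time (expected) sample;
    # production values outside that range are ignored in this sketch.
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) when a bin is empty.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Illustrative alerting rule for one feature.
train_sample = np.random.normal(0.0, 1.0, 10_000)
prod_sample = np.random.normal(0.3, 1.2, 10_000)
psi = population_stability_index(train_sample, prod_sample)
if psi > 0.2:  # 0.2 is a common rule of thumb, not a universal standard
    print(f"feature drift alert: PSI={psi:.3f}")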

Simba: Where does experimentation fit into all this? A lot of what you've talked about is very oriented towards getting things into production, and experimentation is obviously a really big part of machine learning, which, I think, is unique to it compared to other disciplines of engineering, software engineering in particular. Where does experimentation fit into MLOps? 

Atindriyo: Yeah, that's a good question. If you were to draw the ML lifecycle, on the left-hand side you have the pre-training phase, where you're choosing the right features for your models. Then you have training, validation, and evaluation, and finally you do testing and you deploy your model. So, experimentation is kind of sprinkled across the workflow; at each point it's almost like a check to make sure that the model's performing per expectations. But it focuses a little more towards the left, where you're essentially choosing the data that you're fitting onto your model, you're choosing the right evaluation and test sets to test your model against, and then there are also, of course, hyperparameters, where you want to choose the right hyperparameters to get the best model for your data. So, a lot of that has to do with experimentation, and today in a lot of the AutoML platforms, experimentation is typically done by spinning up different training jobs and really just choosing the one which gave the best result on a held-out test set, which is sort of this sacrosanct data set. So experimentation is super critical to churning out a really good model, and it is sprinkled across the different parts of the workflow, but it's primarily left-leaning. 
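
As a rough sketch of that "spin up jobs, keep the best one" pattern, here is a tiny hyperparameter sweep using scikit-learn; the dataset, the parameter grid, and selection on the held-out set are all made up for illustration:

from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

param_grid = {"n_estimators": [50, 200], "max_depth": [4, 8, None]}
best_score, best_params = -1.0, None
for n_estimators, max_depth in product(param_grid["n_estimators"], param_grid["max_depth"]):
    # Each combination is effectively its own "training job".
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=0)
    model.fit(X_train, y_train)
    score = accuracy_score(y_test, model.predict(X_test))
    if score > best_score:
        best_score = score
        best_params = {"n_estimators": n_estimators, "max_depth": max_depth}

# In rigorous practice you'd select on a validation set and reserve the
# test set for a final, unbiased estimate of the chosen model.
print(best_params, best_score)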

Simba: Got it. Yeah, it's super interesting how everyone's mental map of the workflow is different, because I like the idea of this flow from data to production and monitoring, with experimentation almost like a layer that sits above all of it. And you've seen ML infrastructure both at Apple and at Uber, and in what I've seen, it seems like the big companies actually have a much harder time building ML platforms just because they're so much bigger, spanning so many different use cases, whereas Uber maybe was able to be more opinionated, and it seems like they were. But I'm curious, since you've seen it on the inside at both places, what would you say the big differences are between how Apple and Uber built their machine learning infrastructure? 

Atindriyo: Yeah, both companies have published very interesting and innovative research in the machine learning space in the last two to three years. At Apple, though, the ML infrastructure is very decentralized. For example, the Maps team has their own internal ML stack, and so does the Siri team, and there's very little sharing, and that's just because of the way Apple, at least, has grown its software and services over the last 10-odd years. Even the research teams over there are pretty siloed, but Apple is now slowly trying to bring all the ML services together under a single umbrella. Given how big the company is, though, it's a very long process. Uber, on the other hand, is very interesting because they started with a centralized model right from day zero. And I think that kind of worked in their favour, because technologies like feature stores were evangelized at the company due to a company-wide need for organizing and centralizing data for all their machine learning models. That said, I think the scale challenges that both companies face are very similar, hence the entire gamut of technologies they use is fairly similar; there's a good amount of overlap. It's also very good to see Apple contributing a lot more to open source recently; over the last two or three years Apple has made significant contributions to systems like Spark. Uber, on the other hand, has always been very actively involved in the open source community. A lot of key open source technologies came out of the Michelangelo team at Uber, for example Horovod and Ludwig, and these technologies have been adopted right across the industry. So I would say those are the key differences between Apple and Uber. 

Simba: Yeah, it makes a lot of sense. It's actually a very succinct way to put it. When we go and talk to large banks, for example, we find that a lot of times they have very decentralized ML infrastructure, and it's almost because, by the time they started to really invest in ML infrastructure, they already had many different teams, so they kind of had to be decentralized. Whereas I feel like Uber and some of these other companies were in the sweet spot of size where they could invest in ML infrastructure but could also just do one centralized approach. With the decentralized ones, like we talked about with Apple and Siri, which is voice and a whole different kind of problem, it's not tabular data. Did the platform look similar? Was there the same idea of a feature store, of monitoring? Did everything look similar to how it would look at Uber, or was it a different architecture completely? 

Atindriyo: Yeah, I spent about half a decade there, so I kind of saw the evolution of these NLP systems, going from rules-based engines to basic models like Bayesian classifiers and decision trees, all the way to the more powerful transformer models that they use today. But the architecture was similar to the way you would design any end-to-end software stack. With Siri, for example, there was a speech team, of course, which dealt with the initial stream of speech data and converted that to text, which went to a separate service, the NLP service, which initially was a set of rules that punched out an intent at the end of it. Slowly those rules were replaced with a bunch of models, and that's how machine learning came into the core part of Siri's natural language understanding. The speech side of machine learning evolved separately, so there were always these two separate stacks of ML, one for speech and one for natural language classification, which evolved separately, but they're, again, trying to find commonalities between the two stacks and centralize them under one umbrella. Overall, in the initial years the system was designed for scale and a lot of focus was on making Siri faster. It was towards the end of 2015, 2016, when they really started to make Siri less rules-based and more machine learning oriented; the NLU system hosted a whole bunch of models which were retrained almost daily on new data, and the stack for retraining and automation was similar to what you would see at any other company. In more recent times, though, with the evolution of transformers and other, more powerful NLP models, there has been a focus at Apple, and this is of course hearsay, I hear it from my old colleagues who still work there, on building custom transformer models and custom embeddings for Siri. 

Simba: It's super interesting. Yeah, we obviously work on embedding hubs, so we've seen the same sort of thing: embeddings finding their way in as a first-class entity of machine learning, kind of like what you talked about before with artifacts, embeddings becoming their own sort of artifact. So, I guess you would say that across NLP, computer vision, and traditional tabular machine learning, the platforms look pretty similar; it's almost like you maybe have one or two extra add-ons. Is that fair? 

Atindriyo: I think whether it's unstructured or structured data, the challenges remain the same. The main software engineering challenge around both data modalities is really data management at scale and ensuring that models perform at low latency and high throughput. So those challenges remain the same; the techniques, though, are where the difference comes in. For example, if I talk about data quality in particular, there are data quality issues you would look for in structured data models versus unstructured data models, and that really comes down to math and statistics, because a lot of the unstructured data is essentially just vectors, including embeddings. So there are a lot of powerful things you can do around embeddings using spatial geometry, even basic things like cosine distance calculations, and optimizing those at scale. Those are problems which are very particular to unstructured data. On the other hand, in structured data, a problem which I saw a lot at Michelangelo was data scientists essentially throwing the kitchen sink of features at a model; over 200, 300 features would be thrown at a model, and a feature-store-like system makes it easier to fetch features, so it kind of makes the whole process even more undisciplined in a way. There, you can have data quality tooling that measures things like feature redundancy and the relevance of features to labels, and even that has some element of statistics to it, where you measure the distribution of the values and correlate it to the labels. That way you can eliminate features; in some cases we saw over 80% of the features were redundant, so you could literally eliminate them from your feature set without any impact on the model. So, there are some differences in the way you would measure certain data quality metrics for unstructured and structured data, but in principle, I think the software engineering challenges remain the same. 
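
A minimal sketch of the kind of feature redundancy and relevance check described above, using simple correlation heuristics; the thresholds and the toy data are arbitrary, and real tooling would use richer statistics (mutual information, per-class distributions, and so on):

import numpy as np
import pandas as pd

def flag_candidate_features(df, label, relevance_floor=0.01, redundancy_ceiling=0.95):
    # Features with near-zero correlation to the label, or near-duplicate
    # correlation with another feature, are flagged for possible removal.
    features = [c for c in df.columns if c != label]
    relevance = df[features].corrwith(df[label]).abs()
    low_relevance = set(relevance[relevance < relevance_floor].index)

    redundant = set()
    corr = df[features].corr().abs()
    for i, a in enumerate(features):
        for b in features[i + 1:]:
            if corr.loc[a, b] > redundancy_ceiling:
                redundant.add(b)  # keep the first, flag the near-duplicate
    return low_relevance | redundant

rng = np.random.default_rng(0)
df = pd.DataFrame({"f1": rng.normal(size=500)})
df["f2"] = df["f1"] * 1.01 + rng.normal(scale=0.001, size=500)  # near-duplicate of f1
df["noise"] = rng.normal(size=500)  # no real relationship to the label
df["label"] = (df["f1"] > 0).astype(int)
print(flag_candidate_features(df, "label"))  # f2 should be flagged as redundant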

Simba: Got it. Yeah, that's a great way to put it; I think that captures most of the nuance that comes into both. It's funny, because depending on where people live in the stack, they'll have different answers to these questions. When you talk to someone who's focused on labelling, they'll say it's completely different, and that makes sense, because for labelling it is a very different problem. But for feature engineering and building models, if you zoom out a little bit, it looks similar; like you said, there are specialized techniques. It's fascinating. As this space evolves, we'll start to see how the market decides it wants these MLOps platforms to work: will there be one generic platform with specialized deployments, or does it end up as, say, a vision platform versus another platform? I think it'll be very interesting to see how the space plays out. 

Atindriyo: For sure. You mentioned an interesting point about labelling, which I forgot to touch upon, and I think that's one of the key differences. Labelling is a huge, huge cost overhead, especially on the computer vision, unstructured data side of machine learning. For most of the models with structured data, at least what we saw at Uber, the labels are kind of auto-generated, so the ground truth comes from the events which transpire on the system, for example, whether the ride was taken or not, or whether it was successful or not. So labelling is not a big issue there, but in computer vision, labelling is a very, very manual, error-prone human process, and it's literally a multi-billion-dollar industry of its own. 

Simba: Yeah, I could talk about this for a while. Especially what you mentioned about behavioural data: behavioural labels come with their own interesting gamut of problems, because finding the ground truth, well, I used to work on recommender systems, so anyway, there's a lot there and I'll leave it for now; we can pick it up at some point. So, we've talked quite a bit about feature stores and data quality. I guess first I'd love to get your definition of a feature store. What does a feature store do? 

Atindriyo: Yeah, that's a great question. I think feature stores are a very, very critical part of the ML workflow, and they kind of lay the foundation for any organization which sees a growing ML footprint. The definition of it, essentially, is a store of ML-ready data, which is ready to be consumed by any model; essentially, data at your data scientists' fingertips. Prior to feature stores, every team would have their own way of creating and managing features; many wouldn't even have a way to manage them, it would be extremely ad hoc, and it would lead to a lot of duplicated work. Michelangelo itself, which is Uber's ML platform, manages over 10,000 models serving Uber's production traffic, and we saw so much overlap in the set of features that thousands of models use. So the feature store de-duplicates the creation, management, and maintenance of these features; in a non-feature-store world, there would be so much duplicated effort and engineering work going into creating them. With a feature store, as a data scientist, you get ML-ready data at your fingertips without worrying about the complexities of how these feature values are kept up to date. Along with that, a critical part of the feature store is also giving the right APIs to be able to do custom transformations and aggregations on the feature values. A lot of feature values are essentially aggregates, so an API which provides these aggregations out of the box is a good feature store API. So, I think that's what a feature store is in a nutshell. 
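
As a purely hypothetical illustration of "aggregations out of the box" (this is not Michelangelo's or any other feature store's real API), a declarative feature definition might look something like the sketch below, where the platform, not the data scientist, is responsible for keeping the value fresh:

from dataclasses import dataclass

@dataclass
class AggregateFeature:
    # Hypothetical declarative definition of an aggregate feature.
    name: str          # e.g. "rider_trips_last_7d"
    source_event: str  # event stream the aggregate is computed from
    column: str        # column being aggregated
    function: str      # "count", "sum", "avg", ...
    window: str        # e.g. "7d", "1h"
    entity: str        # key the feature is joined on, e.g. "rider_id"

trips_last_week = AggregateFeature(
    name="rider_trips_last_7d",
    source_event="trips",
    column="trip_id",
    function="count",
    window="7d",
    entity="rider_id",
)
# At serving time a model would just ask a (hypothetical) client for the value
# by entity key, e.g. store.get_online_value("rider_trips_last_7d", rider_id="r-123"),
# without knowing how the rolling count is materialized or kept up to date.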

Simba: Yeah. One thing I want to maybe zoom in on, because obviously I talk to a lot of people about feature stores, and one piece of confusion I see a lot, and I'm curious for your take, is that some people define a feature store very literally, as where features are stored, while others define it as where features are created. It's almost like defining a feature as the raw row of data in the database, or defining the feature as the logical transformation. How do you think about that part of feature stores? 

Atindriyo: Yeah, totally. In my opinion, the real definition of a feature store is a combination of the two, because in the end, the actual value of the feature is really just data which can be stored in any data store, but yes, there is a logical component to it, which really gives the definition, or separates a feature from any data point stored in a database. I call it the metadata layer of feature stores, which stores the actual definition of the features. As I said before, features can be of all different types: they can be numeric, they can be categorical, they can be aggregates, they can come from different sources, they can be real time or near real time, they can be historic, they can be embeddings. There are a lot of unstructured features as well which go into models. 

So, all this definition is captured in the logical layer, and that layer itself is a gold mine of metadata and information which you can leverage to provide a lot of powerful search and recommendation capabilities. It's literally your model universe. There were some efforts at Michelangelo, which I was leading, around feature search and discovery, where you could see which features are used in which models, and you can literally do machine learning on top of that to recommend the right features given your models and your use case. So a feature store encapsulates this logical layer, which to me is the application layer of the feature store, and it can give you a lot of very powerful capabilities. But then there's the storage, which is equally critical, because features can scale out of proportion. It really depends on the scale of your organization, but there's a lot of infrastructure optimization you need to do to be able to manage features on a day-to-day basis, ensuring you don't blow out storage and your compute doesn't get out of hand. So the real definition of a feature store is a combination of the two. 
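
To illustrate the "metadata layer" idea in the simplest possible terms, here is a toy registry that records feature definitions and which models consume which features; the structure is purely illustrative and much smaller than what search and discovery at Uber scale would need:

from collections import defaultdict

class FeatureRegistry:
    # Toy metadata layer: feature definitions plus a reverse index of
    # which models use which features, enabling simple search/discovery.
    def __init__(self):
        self.definitions = {}
        self.feature_to_models = defaultdict(set)

    def register_feature(self, name, dtype, source, freshness):
        self.definitions[name] = {"dtype": dtype, "source": source, "freshness": freshness}

    def register_model(self, model_name, feature_names):
        for feature in feature_names:
            self.feature_to_models[feature].add(model_name)

    def models_using(self, feature_name):
        return sorted(self.feature_to_models[feature_name])

registry = FeatureRegistry()
registry.register_feature("rider_trips_last_7d", "int", "trips_stream", "near-real-time")
registry.register_model("eta_model_v3", ["rider_trips_last_7d"])
print(registry.models_using("rider_trips_last_7d"))  # ['eta_model_v3']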

Simba: I remember when we built our first feature store at my last company, we didn't call it a feature store. In a sense, everyone has a feature store; if you have features, they live somewhere, in some sort of feature store. But we didn't call it that, and we didn't really make it something we focused on at first, even though the problem space has existed, and been solved, for a while. What's new is that this whole space exists and there are actual teams built around this type of infrastructure called the feature store. And now there are obviously a lot more examples than in the past; there weren't really many examples for us to look at, so we were kind of making it up as we went, and it really focused more on streaming problems than feature problems. 

Atindriyo: Right. 

Simba: With all the context you have now, what do you think Michelangelo's feature store did well? What do you think, even with all the new stuff that's come out, Michelangelo was uniquely good at in how it was designed? 

Atindriyo: Yeah, I think the one big thing Michelangelo solved really well was being able to deploy a model and serve it at very high scale at very low latency. That was the first big problem Michelangelo solved really well, and that kind of became the primary reason for data scientists to come to the platform in the first place. When the whole system started, you're right that the concept of feature stores per se was only evangelized later, but the problem is a software engineering problem, so it had existed for a while. Before the words "feature store" were even formalized at Uber, the infrastructure still existed, and part of that infrastructure solved this problem of deploying models at low latency really well, to the point where data scientists would bring their custom-trained models, trained outside the Michelangelo platform, and just hand them over to the team and say, hey, just deploy this for me; I don't want to solve the scale problem, it's too hard, I just want to focus on my modelling and solve this use case, but I trust the platform because it's able to achieve p99 latencies of single-digit milliseconds at extremely high throughput. So that was one big problem which was solved very well at Michelangelo. And eventually, of course, the platform itself evolved to automatically train models and automatically deploy them to production. So it started with this service that could manage and serve models at scale and eventually expanded into an end-to-end ML platform. But that was one thing Michelangelo solved really well at the onset. 

Simba: Yeah, it's super interesting, because now when I go into companies, I see there are two types of problems: the processing problem and the organizational problem. And it feels like most people who build feature stores hit the processing problem first, because that's an impasse you have to solve: hey, we need this thing in production, it needs to have this latency, we need a way to solve it. Whereas with the organizational problems, like management and versioning and all that, we see people just try to duct-tape around them and sort of make it up as they go, and those tend to be the other set of problems that feature stores solve. It's interesting that the processing one, which I think is a really crucial one, is something Uber would be so good at solving, just because they were streaming-first almost from the get-go, given what the business does. What do you think you would have done differently with Michelangelo's feature store, in retrospect? 

Atindriyo: Yeah, I think in hindsight, one thing we should have done differently, or I would have done differently, was thinking earlier about the importance of data quality. It might come across as me being biased because I'm working on data quality now, but this problem was truly highlighted quite a bit when I spent the last year or so of my stint at Uber leading the data quality for machine learning efforts there. A lot of the tooling that we built and sprinkled into the Michelangelo platform, which asserted data quality, showed immediate improvements in the performance and stability of the thousands of models powering Uber's applications. So we took a lot of the learnings that came out of that and decided to build Galileo, and it's really about giving data scientists the power to ensure that they're training and evaluating their models on the highest quality data, and giving them the tools to quickly inspect, analyse, and fix the data so that they can punch out a high quality machine learning model in much less time. 

Simba: Got it. For a lot of the MLOps use cases or problem sets we've seen, most people building startups see a problem and go solve it, and it's almost best when it's a problem you yourself have had to solve or face. I know that's true of us, and it sounds like it's true of you all, so data quality makes sense. You want your data to be high quality; I don't think anyone's going to argue about that. But I'm trying to visualize what you mean by these data quality tools. How does it look if I'm a data scientist? I might be doing some experimentation, I might notice there are gaps in my data. I'm trying to see how it works, because it's almost a new category, and there are a lot of categories being created and merged at all times. So I'm just trying to understand how you define this category. What does it do?

Atindriyo: Yeah, I agree that the term data quality is so broad it can encapsulate anything and everything on earth. I think one aspect of it is bringing the discipline of, say, unit and integration testing to models, where you can literally have a unit test, for example, that asserts that the number of misclassifications in your test set should not be greater than 10, and if that's violated, you fail the pipeline. That's the simplest data quality test you can write. That's one aspect. There are other aspects to data quality as well. Before you even train your model, you have loads of data in your warehouse, and the first question you ask is: what data should I train my model on? So, choosing the right data, and that includes the quality of the data, in the sense that there are fewer null values or less garbage in the data, but also choosing a minimal set of the data to achieve the same model performance. For example, you don't need to train your model on a hundred thousand data points when you can achieve literally the same evaluation or test performance with 3,000 data points. But you have to be very scientific in thinking about what data you want to train on. That's another aspect of data quality. I think the third aspect is that once you train your model, your data set fits differently with different model architectures. The most typical practice followed by data scientists is to tune hyperparameters, change the model architecture, introduce new hidden layers or change the depth of your tree, then retrain the model and see how the data fit, and this whole process goes on and on. But one of the realizations more recently has been that you really need to shine a light on the data, to be able to see certain regions of your data set where the model is underperforming, and give people the tools to figure this out automatically, instead of it being a manual process. In a lot of companies we've seen, including very tech-first companies, data quality assertion post-training is almost detective work, especially for unstructured data, where you literally take your data set, put it in an Excel sheet, try to figure out patterns, look at model confidence scores, and try to see where the model did well and where it didn't. Then you move to your next iteration of training, either by removing a certain set of your features or adding new data to your data set, and this process goes on and on and takes months, because the whole process of inspection is very manual. So the third aspect is providing tools to automatically infer regions of model underperformance, or regions where the model performed well so you don't need to worry about them. And the fourth and final aspect of data quality is, of course, that once you deploy your model into production, the data changes over time, it drifts, and the model starts performing differently. So it's about shining a light on monitoring, not only the predictions but also feature drift, correlating feature drift with feature importance, and having proper alerting and monitoring around that. So those are the four big pillars of data quality I have seen in my experience. 
But as you said, it's sort of this very new evolving field and it's always very exciting to see people discover new aspects of data quality and learn from others who are in the community. 
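
A minimal sketch of the first pillar, a unit-test-style gate that fails a training or CI pipeline when the model misclassifies more than a fixed number of test examples; the function name, the numpy-array inputs, and the threshold of 10 are illustrative assumptions:

import numpy as np

def assert_max_misclassifications(model, X_test, y_test, max_errors=10):
    # Assumes numpy arrays; raising here is what fails the pipeline.
    predictions = model.predict(X_test)
    errors = int(np.sum(predictions != y_test))
    if errors > max_errors:
        raise AssertionError(
            f"data quality gate failed: {errors} misclassifications (limit {max_errors})"
        )
    return errors

# In a CI job this would run right after training, before the model is
# promoted to a registry or deployed; the exception blocks the pipeline.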

Simba: Yeah, it's a really interesting way to break it down. The first and fourth pillars have become a little more accepted in the MLOps space, the fourth pillar being monitoring feature drift, which a lot of people talk about, and the first being the unit testing of your data pipelines and your data, with things like data validation. That's not a new concept, but obviously it's part of a bigger picture. I think the second and third pieces are very interesting, and we did it ourselves; we were doing recommender systems, again, so it was very important for us to figure out which subset of our users we were not doing well with, because recommender systems are a great example of that problem: you might have something that does better as a whole but gives a certain subset of users such a bad experience that it's almost worse than doing a little worse overall and giving everyone a consistent experience. So it's interesting; it's almost like data set optimization. The second and third pillars are less about the cleaning and less about the monitoring, and much more about picking the right features, picking the right rows to train on. Even with drift, one way I've seen people handle drift is just cutting lots of old data; after the pandemic hit, I saw this trick where a lot of people were just over-sampling post-pandemic data to try to fix the weird distributions they were seeing. 

Atindriyo: No, totally, I think you've nailed it. Of course, you are also very, very involved in this space, so it's good to get your experience as well, but overall I think you pretty much nailed it. 

Simba: Okay. So, if I'm looking at a data quality platform, there are a million pieces across different parts of the MLOps pipeline, and I can't do everything at once. How should I be thinking about this? Why should I choose to bring this into my initial, or halfway-done, MLOps platform? 

Atindriyo: Yeah, absolutely. I think the answer is both simple and complicated. The simple answer is that it's a no-brainer that data is the lifeblood of your models; it's garbage in, garbage out. So, if there's no disciplined way of thinking about what data you're training on, it is almost a certainty that at some point your deployed model will perform badly and won't give you the results you desire, and with the increasing adoption of machine learning in business-critical use cases, a bad model can have catastrophic business outcomes. Therefore, baking in the discipline of data quality, I think, is super critical to ensuring that you don't make embarrassing mistakes in your production environment. There have been many cases, time and again, where big enterprises see models, or even machine learning for that matter, as this voodoo black box which they don't want to adopt, because there's high risk in deploying a model for critical business use cases. But from my experience building these systems for many, many years, I have seen that almost always the issue is with the data, and data quality tooling can ensure that your models are robust, compliant, fair, and unbiased. That's the only way you can deploy your models with confidence. 

Simba: Cool. So one thing I'm trying to understand, and I think I've got it, but just to make it super clear: when people think about data quality, or when I do at least, the things that come to mind are things like Great Expectations, assert-based blocks in your pipeline to make sure the data coming in fits your expectations. It sounds like what you do is much more than that, and I want to understand: what's the difference between how you're thinking about data quality with Galileo and something like Great Expectations? 

Atindriyo: That's a great question. Great Expectations is a really good tool for ensuring robustness in the data set itself; it's almost like unit testing your data set, where you can programmatically have assertions and expectations about the range of values you expect in a column, for example. But ML data quality is a lot more than that. With Galileo, for example, we are trying to gauge how well a model would perform on, say, older patients who are female and live in California. That's a higher level of abstraction, and these kinds of questions can only be answered once you train your model with the data set. It's really about shining a light on how the model performed on the data, as opposed to testing a data set on its own. I think that is one of the fundamental differences between a system like Galileo and a system like Great Expectations. 
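
To make that contrast concrete: a dataset-level check inspects the data alone, while an ML data quality check looks at how a trained model performs on a slice of the data. Both snippets below are schematic, not Great Expectations' or Galileo's actual APIs, and the column names are invented:

import pandas as pd

# 1) Dataset-level check (Great Expectations-style): only the data is tested.
def expect_column_in_range(df, column, low, high):
    return bool(df[column].between(low, high).all())

# 2) Model-level check (ML data quality): performance on a specific slice,
#    which only exists once a model has made predictions.
def slice_accuracy(df, predictions, label_col, mask):
    # `predictions` is assumed to be a pandas Series aligned with df's index.
    subset = df[mask]
    return float((predictions[mask] == subset[label_col]).mean())

# Hypothetical usage:
# expect_column_in_range(patients, "age", 0, 120)
# mask = (patients["age"] > 65) & (patients["sex"] == "F") & (patients["state"] == "CA")
# slice_accuracy(patients, preds, "label", mask)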

Simba: Got it. So it's almost like, you talked about that experimentation layer that sits on top, and you fit more into that, you tie into it a bit. You're at a higher level of abstraction, where something like Great Expectations is almost something you could argue is just a DataOps tool, while what you're doing is very clearly MLOps; it's oriented around models and machine learning. 

Atindriyo: Absolutely. ML data quality is almost a separate discipline of data quality; it's a subset of the larger data quality system, and I think both are extremely important. In general, there's a lot of low-hanging fruit you can get from a general data set by applying a system like Great Expectations, but beyond that, once you train your model on that data set, there are a lot of insights which can come out of the way the model fits the data, and you can leverage the metrics flowing through the guts of the model to infer a lot more very interesting, nuanced, model-centric insights. That's a big part of machine learning data quality. 

Simba: That's awesome. That's an awesome way to think about and break down the two categories, or subcategories, of this very large overarching data quality umbrella. There are so many things we've covered and so many more I wish we could talk about, but I know we've been going for about 45 minutes now, so thank you so much for coming on. It's always great being able to chat with you and get your insight on the space as it's growing, and thanks for answering all my questions. 

Atindriyo: Oh, thanks so much for having me, it was a pleasure talking to you.
