We're thrilled to announce the release of Featureform v0.6! This latest release has great additions and enhancements, including expanded Spark support, better insights into features and labels, and data backups.
We also want to give a big shout out to our community members and our customers for their ongoing support and feedback. We look forward to hearing your thoughts on v0.6. Feel free to drop us a line in our Slack community. Check out our full release notes here.
Up until now, Featureform has supported Spark on EMR and Spark on Databricks. With v0.6, we’re excited to announce that we now fully support Spark as an offline store!
Description: Featureform now supports all variations of Spark using S3, GCS, Azure Blob Store, or HDFS as a file store.
Why is this important: You can now define new transformations via SQL or Spark Dataframes, register them as features or labels, and create training datasets with them or use them for inference.
Here are some examples:
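For instance, registering a generic Spark cluster with an S3 file store and defining a dataframe transformation might look like the following (parameter names and credential values are illustrative placeholders; see the docs for exact signatures):

```python
import featureform as ff

# Register a generic Spark cluster as an offline store, using S3 as the
# file store. All credential values below are placeholders.
spark = ff.register_spark(
    name="spark-generic",
    executor=ff.SparkCredentials(
        master="yarn",
        deploy_mode="cluster",
        python_version="3.8",
    ),
    filestore=ff.register_s3(
        name="s3-store",
        credentials=ff.AWSCredentials(
            aws_access_key_id="<access-key>",
            aws_secret_access_key="<secret-key>",
        ),
        bucket_path="my-bucket",
        bucket_region="us-east-1",
    ),
)

# Define a transformation as a Spark dataframe operation.
@spark.df_transformation(inputs=[("transactions", "v1")])
def avg_transaction(transactions):
    """Average transaction amount per customer."""
    return transactions.groupBy("CustomerID").agg({"TransactionAmount": "avg"})
```

The resulting transformation can then be registered as a feature or label and used in training sets or for inference, just like any other Featureform resource.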
You can check out additional examples and snippets here.
Understanding which data resources are used by a model is a key element of robust search and discovery. We heard from our community and customers and now provide a way to link registered resources to models.
Description: Data scientists can now link features, training sets, and labels to the models that use them by including an optional model parameter during registration.
Why is this important: Users can now look up models and understand which features and training sets are being served to them. This added functionality makes it easier for data scientists to build on their teammates' work and troubleshoot.
These links can then be viewed via the CLI and the dashboard.
Check out an example snippet here.
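As a sketch, passing the optional model parameter when serving features or pulling a training set might look like this (the host, feature, and model names here are hypothetical):

```python
import featureform as ff

client = ff.ServingClient(host="<featureform-host>")

# Link this serving request to a model; the model name is illustrative.
features = client.features(
    [("avg_transaction", "v1")],
    {"user": "C1010876"},
    model="fraud_detection_model",
)

# The same optional parameter can be passed when fetching a training set.
dataset = client.training_set("fraud_training", "v1", model="fraud_detection_model")
```

Once linked, the model shows up as a first-class resource that can be looked up alongside the features and training sets it consumes.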
We realize that data availability is important to both customers and open-source users. That’s why we’ve decided to make data backup and recovery available on open-source Featureform!
Description: Open-source users can now back up their data on Featureform and restore data from an existing backup.
Why is this important: Backups and recovery in Featureform help ensure your workflows are reliable and scalable. We believe that data availability is a requirement for a resilient workflow.
Prior to this release, if you were to rotate a key and/or change a credential you’d have to create a new provider. We made things immutable to avoid people accidentally overwriting each other's providers; however, this blocked the ability to rotate keys. Now, provider changes work as an upsert.
Description: You can update provider credentials without having to register a new provider.
Why is this important: This update enables users to rotate keys and update credentials for existing providers as needed, making provider registration more efficient as you scale.
For example, if you had registered Databricks and applied it like this:
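A sketch of such a registration (the host, token, and cluster values are placeholders, and the file store is assumed to have been registered previously):

```python
import featureform as ff

# Register Databricks as the Spark executor; all values are placeholders.
databricks = ff.DatabricksCredentials(
    host="<databricks-host>",
    token="<databricks-token>",
    cluster_id="<cluster-id>",
)

spark = ff.register_spark(
    name="spark-databricks",
    executor=databricks,
    filestore=azure_blob_store,  # a previously registered file store
)
```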
You could change it by simply changing the config and re-applying it.
One of the primary value propositions of a feature store is to power search and discovery to increase collaboration and reduce duplication. Prior to this release, you could only search resources from the dashboard. In v0.6, we’ve added the same functionality to the CLI!
What does this mean: You can now search for resources directly from the CLI via featureform search -q [query], in addition to the dashboard.
Why is this important: Our goal at Featureform is to make our users more productive, and that includes enabling folks to choose where they want to work. For our users that prefer to work in the CLI, this new feature allows them to search without leaving their current window.
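For instance, searching the registry for resources related to transactions might look like this (the query term is illustrative, and the command assumes a deployed Featureform instance):

```shell
# Search registered resources for anything matching "transaction"
featureform search -q transaction
```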
Featureform has historically made all resources immutable to avoid a variety of problems, such as upstream changes breaking downstream resources. Over the next couple of releases, we expect to dramatically pull back on forcing immutability while still avoiding the most common types of problems.
Featureform apply now works as an upsert. For providers specifically, you can change most of their fields. This also makes it possible to rotate secrets and change credentials as outlined earlier in these release notes.
Older deployments of Snowflake used an Account Locator rather than an Organization/Account pair to connect. You can now use our register_snowflake_legacy method.
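Registering with an Account Locator might look like the following (parameter names are illustrative and all credential values are placeholders; consult the docs for the exact signature):

```python
import featureform as ff

# Connect using an Account Locator instead of an Organization/Account pair.
# All credential values below are placeholders.
snowflake = ff.register_snowflake_legacy(
    name="snowflake-legacy",
    username="<username>",
    password="<password>",
    account_locator="<account-locator>",
    database="<database>",
    schema="PUBLIC",
)
```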
You can learn more in our docs.
Pandas on K8s is still an experimental feature that we’re continuing to expand. Previously, you could only specify container limits globally for all transformations; now you can set resource specifications per transformation, which is especially useful for particularly heavy or light workloads, as follows:
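A per-transformation resource specification might look like the sketch below. The K8sResourceSpecs class and its field names, along with the provider and file store names, are our assumptions about the interface; check the docs for the exact API:

```python
import featureform as ff

# Register a K8s provider for pandas transformations; the file store
# is assumed to have been registered previously.
k8s = ff.register_k8s(
    name="k8s-pandas",
    store=azure_blob_store,  # a previously registered file store
)

# Request extra resources for a particularly heavy transformation.
# The spec class and field names here are illustrative.
@k8s.df_transformation(
    inputs=[("transactions", "v1")],
    resource_specs=ff.K8sResourceSpecs(
        cpu_request="1",
        cpu_limit="2",
        memory_request="1Gi",
        memory_limit="2Gi",
    ),
)
def heavy_aggregation(transactions):
    """A heavy pandas transformation with its own resource limits."""
    return transactions.groupby("CustomerID").mean()
```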