According to research from Greenwich Associates, 72% of global buy-side firms say alternative data has enhanced their signal. With adoption of alternative data on the rise, stakeholders across financial services firms are rushing to get their hands on quality alternative data sources to help inform their investment decisions.
Wait... let’s back up. What exactly is alternative data? Alternative data is data from outside traditional sources that investors use to evaluate a company or investment. Examples include social media feeds, data from satellite and weather sensors, transaction data from credit and debit cards, email receipts, survey data, and more.
Behind the Growth of Alternative Data
Alternative datasets effectively complement traditional datasets by helping identify patterns and insights to shape investors’ day-to-day and more visionary investment strategies. Alternative data’s rapid rise also stems from the following factors: an explosion of data as a whole and easier access to faster and cheaper on-demand auto-scale compute; increased research and adoption of machine learning and data science; and buy-side personas taking matters into their own hands due to a lack of personalization.
However, working with alternative data is not a turnkey initiative. Concerns exist around:
- Difficult procurement processes
- Lack of available time and resources to properly vet and evaluate the data
- Lack of executive buy-in
- Identifying talent that has the skills to effectively work with alternative datasets
- Using data to accurately explore and assess financial risk factors
Luckily, these problems can be mitigated with the proper expertise and technologies. By building a foundation for machine learning and data science that is fundamentally rooted in interpretable, white-box AI, financial services firms can effectively navigate the balance between interpretability and accuracy that comes with debugging black-box models and implementing white-box ones.
They can also put processes in place to examine alternative datasets in a way that incorporates testing and quality checks, promotes reuse, enables scalability along with governance, and allows individual data visualization and exploration. By documenting data sources that contain sensitive information, enforcing best practices, and limiting access to those projects and data sources to the right people for the right reasons, organizations can move away from the often indefensible, Wild West scenarios that come with the proliferation of spreadsheet “islands” and start vetting and governing both the data being used and how processes are designed.
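To make the idea of testing and quality checks more concrete, here is a minimal sketch in plain Python of what a first-pass check on a card transaction feed might look like. The field names (`txn_id`, `merchant`, `amount`, `txn_date`), the `quality_report` helper, and the 30-day staleness threshold are all hypothetical choices made for illustration; a real vetting process would use whatever schema and rules fit the dataset at hand.

```python
from datetime import date

# Hypothetical schema for a card transaction feed (an assumption for this sketch).
REQUIRED_FIELDS = ("txn_id", "merchant", "amount", "txn_date")


def quality_report(rows, as_of, max_age_days=30):
    """Count basic quality problems in a list of transaction dicts."""
    report = {"missing_field": 0, "duplicate_id": 0, "negative_amount": 0, "stale": 0}
    seen_ids = set()
    for row in rows:
        # A row with any empty or absent required field is counted once
        # and excluded from the remaining checks.
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
            report["missing_field"] += 1
            continue
        # Repeated transaction IDs suggest double delivery from the vendor.
        if row["txn_id"] in seen_ids:
            report["duplicate_id"] += 1
        seen_ids.add(row["txn_id"])
        if row["amount"] < 0:
            report["negative_amount"] += 1
        # Flag records older than the allowed window relative to `as_of`.
        if (as_of - row["txn_date"]).days > max_age_days:
            report["stale"] += 1
    return report


sample = [
    {"txn_id": 1, "merchant": "A", "amount": 12.5, "txn_date": date(2024, 1, 10)},
    {"txn_id": 1, "merchant": "A", "amount": 12.5, "txn_date": date(2024, 1, 10)},  # duplicate ID
    {"txn_id": 2, "merchant": "", "amount": 3.0, "txn_date": date(2024, 1, 11)},    # missing merchant
    {"txn_id": 3, "merchant": "B", "amount": -4.0, "txn_date": date(2023, 11, 1)},  # negative and stale
]

report = quality_report(sample, as_of=date(2024, 1, 15))
print(report)
```

A report like this can run every time a vendor delivers a new file, so that problems surface before the data reaches a model or a spreadsheet.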
Further, by using an inclusive, collaborative data science and machine learning platform, investors can bring more seats to the table to ensure the right people — from data scientists to quants to analysts — are working together across various projects in a way that enhances productivity and allows visibility to those who need it.
With Dataiku’s visual AutoML functionality, users can automate the entire machine learning pipeline via rapid iteration on machine learning exercises. Data provenance and reproducibility are critical for organizations in the investment space, with requirements on how data is sourced, stored, shared, and used. With Dataiku, all work and changes are persisted automatically and bundles can be used to stash away snapshots of the project together with the frozen data for future recomputation of the tasks.
There is a need to properly assess assets across all types of risks, especially as risk drivers continue to emerge from more and more directions. Investors and banks alike can leverage all available sources of information, notably those deriving from alternative data, to successfully combine financial methodologies with data science techniques to ultimately find the right solutions.
While alternative data wields the power to wholly transform investment management in the coming years, hedge funds and asset managers need to incorporate these nontraditional data sources into their investment strategies in a way that is holistic (and therefore not siloed to one part of a model life cycle) and relevant to their unique use cases (by using technology that enhances cross-team collaboration, accuracy, and explainability).