Over the last few years, Big Data has been mentioned everywhere, and the involvement of Wall Street institutions in the deployment of Big Data solutions or in the financing of new ventures seems to be accelerating. But what exactly is Big Data in the context of Capital Markets?


What does Big Data mean for Investment Banks?

Big Data is usually defined by the 3 Vs (Gartner 2001):

  • Velocity (speed of data delivery and processing),
  • Volume (amount of data that must be managed or processed),
  • Variety (range of different data sets that must be dealt with, both structured and unstructured formats)

Another V has been added since then and is now a full component of Big Data: Veracity (integrity or quality of the data being processed or stored).

Investment banks have been dealing with huge volumes of data for a long time (tick data for securities quotes represents a vast amount of data, well managed by structured databases), as well as with complex data (a long-standing reality in the OTC derivatives world). It is really the combination of all these factors that creates the big data reality for banks.

In response to a fast-moving regulatory environment, large firms are trying to eliminate information silos. This exercise turns out to be extremely challenging, as the data encompasses heterogeneous asset class, product and risk information. In addition, information formats are expanding to include text-based unstructured data.

Investment banks have trailed consumer retailers in embracing big data. One of the main reasons for that is certainly the complexity of their data systems architecture. However, many financial firms are now focusing on innovation to leverage the revenue growth opportunities and operational efficiency offered by big data initiatives. To this end, many investment banks are actually looking to partner with specialist external firms rather than following their old business model of developing technology in-house.


Big data technologies lingo

  • Data grids

Distributed caching technology that makes it possible to access, modify and transfer large volumes of data across a network of servers

  • Compute grids

Parallelized computation across multiple servers, handling capacity and failure issues and orchestrating tasks across the grid

  • Massively parallel processors

Coordinated processing of a programme between multiple independent computers, each with its own operating system and memory

  • In-memory databases

Databases that store data in main memory (RAM) rather than on a disk. The result is much faster access to information and shorter processing time, as physical interaction with a disk is removed.

  • NoSQL

Non-relational database management systems that provide a mechanism for storing and retrieving data modelled by means other than the tabular relations used in relational (SQL) databases, and that can be accessed from a wide range of external platforms. These databases are able to scale horizontally, and can store whole documents.

  • Specialised databases

Databases created to meet some specific needs or containing specific data. Contain the necessary architecture to store unstructured data.

  • Hadoop

Open-source framework for the distributed storage and processing of large data sets, including the unstructured data that is a major part of big data analytics. It enables data processing in parallel and the storage of blocks of subdivided data across servers.
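To make the idea of subdividing data into blocks and processing them in parallel more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and does not use any Hadoop API: the function names (split_into_blocks, map_block, reduce_counts, count_ticks), the sample tickers and the block size are all assumptions made for illustration, and threads stand in for the servers of a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Subdivide the input into fixed-size blocks (hypothetical helper)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: count ticks per ticker within one block, independently of the others."""
    return Counter(ticker for ticker, _price in block)


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def count_ticks(records, block_size=2, workers=2):
    """Count ticks per ticker by mapping over blocks in parallel, then reducing."""
    blocks = split_into_blocks(records, block_size)
    # Threads stand in for the servers of a real cluster (illustrative assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce_counts(partials)


if __name__ == "__main__":
    # Made-up sample ticks: (ticker, price)
    ticks = [("AAA", 10.0), ("BBB", 20.5), ("AAA", 10.1), ("CCC", 5.2), ("AAA", 10.2)]
    print(count_ticks(ticks))
```

Each block is processed without any knowledge of the others, which is what allows the work to be spread across many machines; only the small per-block summaries need to be brought together at the end.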


Main areas of implementation of Big Data in capital markets and Related Startups

Below are what we think are the four main applications of Big Data for Capital Markets institutions, as well as the Fintech firms involved:

1)   Revenue generation

– Behavioural analysis for trading strategies (scanner algorithms), and understanding of customer interactions, i.e. using the data provided by single-dealer platforms (SDPs): e-platforms represent a tremendous source of new information about clients, such as the types of transactions they do, the queries they run, or the research they look at.

– Trading analytics: includes analytics for HFT, predictive analytics, pre-trade decision support analytics. Insights from market indicators, economic indicators, and sentiment analysis for stocks and events may be used to enrich the information set used by traders and investors alike for making investment decisions.

  • XIgnite provides cloud-based financial market data APIs to help emerging companies and established enterprises deliver real-time and reference market data to their digital assets, such as websites and apps.
  • Quandl hosts data from hundreds of publishers on a single easy-to-use website. The service is designed to help data analysts save time, effort and money by delivering financial and economic data in the format they want: via their website, their APIs, or directly into dozens of tools. It provides a data acquisition and distribution platform that captures, normalizes and presents non-traditional, unstructured data sets in a standardized format.
  • Dataminr uses a range of algorithms, including AI, to scan through the 500 million tweets posted daily and spot real-time news relevant to the user. This technology is used by news agencies, the public sector and financial institutions, who can take action on early market-moving information.
  • Heckyl Technologies is a real-time data analytics company that brings news, price, fundamental and portfolio analysis through a single platform that can be used by a researcher, trader or an analyst to get actionable insights from both unstructured and structured data. The platform brings real-time news, information and market data from companies, businesses and global markets around the world. Heckyl includes a sentiment-tagging, news-clustering and discovery engine that presents ready-to-use, actionable intelligence from markets using conventional and evolving datasets such as Open Databases and Social media.
  • EidoSearch focuses on the predictive analytics segment, comparing patterns across asset classes using signal processing and content-based search technologies, to help asset managers make better decisions about entry and exit points. It also helps quantify risk by predicting volatility and downside scenarios.

2)   Regulatory compliance

– Market surveillance and fraud detection: the ability to consume different channels and types of data – including instant messages, phone recordings, emails and internet content – and consolidate all this into a usable database allows advanced pattern matching analytics to spot anomalous behaviour. It also facilitates AML and KYC processes.

– Regulatory reporting: big data enables the cross-referencing of key sets of internal data related to derivatives instruments in order to facilitate trade reconstruction and reporting (as illustrated by Dodd-Frank requirements). Compliance with a growing set of regulations (Dodd-Frank, Solvency II, EMIR, audits…) adds more pressure on banks to develop sustainable long-term data management strategies.
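As a rough illustration of what spotting anomalous behaviour in the market surveillance point above can look like once the data is consolidated, the Python sketch below flags outliers in a series of trade sizes. It uses a simple modified z-score (median and median absolute deviation) in place of the advanced pattern matching that real surveillance systems rely on. The function name flag_anomalies, the 3.5 threshold and the sample figures are assumptions for illustration only.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold.

    The modified z-score is based on the median and the median absolute
    deviation (MAD), which are less distorted by the outliers themselves
    than the mean and standard deviation would be.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


if __name__ == "__main__":
    # Made-up trade sizes for one trader; the seventh entry is clearly unusual.
    trade_sizes = [100, 105, 98, 102, 101, 99, 5000, 103]
    print(flag_anomalies(trade_sizes))
```

In practice the same kind of scoring would be applied across many channels at once (messages, calls, orders), which is where consolidating the data into one store becomes the hard part.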

  • Feedzai is a data science company that uses real-time, machine-based learning to help payment networks, banks and retailers prevent fraud in omnichannel commerce. Feedzai’s fraud science technology fuses machine learning with human intelligence to power payments systems globally for customers in North and South America, Europe, and Africa.
  • Passfort is a web-based platform working with any browser, that allows financial providers to easily collect, verify and securely store and manage all their AML/KYC compliance data in the cloud.
  • Tradle is a blockchain-based solution that aims to make compliance with KYC requirements more efficient.
  • Trulioo is a global ID verification company providing advanced analytics based on traditional information such as public records, credit files and government data as well as alternative sources including social login providers, ad networks, mobile applications, e-commerce websites and social networks. Trulioo specializes in scoring online identities as authentic, machine generated or fraudulent with its identity bureau covering 4 billion people in over 40 countries, including coverage for the most challenging demographics from emerging markets such as China, Russia, and Brazil.
  • FundApps makes technology and also partners with the content providers in the industry to deliver the frequent rules and updates for financial regulation. FundApps releases new versions of its software on a monthly basis and changes to rules are implemented and deployed to clients immediately.
  • OpusDatum provides solutions to financial institutions to help fight financial crime. By conducting forensic reviews of large-scale, global payment applications, OpusDatum is able to assess organizations’ compliance, risks and controls on topics such as AML, anti-bribery and financial fraud.

3)   Risk management

Big data introduces a revolution in risk management, as it allows the production and monitoring of real-time, on-demand performance metrics and risk measures across product lines. More precisely, big data allows more consolidated views of risk, predictive capability for expected risk, more flexible tools to better match banks’ changing business environment, better reporting formats (interactive, dynamic, leveraging data visualisation technologies), and better monitoring of the allocation of scarce resources across regions and business lines. Stress testing becomes a powerful monitoring and business steering tool, all the more so as the real-time dimension allows optimal hedging and reduces associated costs. Big data is also a key component of cyber security, as it enhances detection in unstructured networks.

  • Scaled Risk addresses the growing need for smart data processing in the capital markets industry by providing a Big Data and in-memory analytics platform that delivers real-time analytics on historical and live trade data, helping investment firms achieve real-time enterprise-wide risk management and comply with current and future regulatory demands.

4)   Cost reduction and operational efficiency

– The data aggregation process for ad-hoc reporting, to feed both internal and external reporting functions, is often painful and costly, and big data addresses this cost. For instance, matching / reconciliation of trades across various systems can result in operational risk of invalid, duplicated or failed trades. Data tagging makes it easy to identify trades and events like corporate actions.

Other opportunities for big data include post trade analytics helping to evaluate key metrics like transaction costs or order execution performance.

– Similarly, maintenance (storage, handling and processing) and consolidation of data for various asset classes, product lines, service layers (FO, BO) and coming from various vendors, is extremely challenging, and common dictionaries to handle all this information are hard to find. This is solved by big data.

– Versioning and audit of trade transactions: structured databases require all versions to be kept in the same table with the latest version flagged, which makes it difficult to identify modified fields. With big data, cells are versioned (timestamp), and each revision can be retrieved easily by API.
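The timestamped cell versioning described above can be sketched in a few lines of Python. This is a toy in-memory model, not the API of any particular product: the class VersionedStore, its put / get / history methods, and the sample trade fields are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class VersionedStore:
    """Toy store in which every cell keeps all of its timestamped revisions."""

    def __init__(self):
        # (row_key, column) -> list of (timestamp, value), kept sorted by timestamp
        self._cells = defaultdict(list)

    def put(self, row_key, column, value, timestamp):
        """Add a new revision of one cell without touching earlier revisions."""
        versions = self._cells[(row_key, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0])

    def get(self, row_key, column, as_of=None):
        """Return the latest value, or the latest value at or before `as_of`."""
        versions = self._cells.get((row_key, column), [])
        if as_of is not None:
            versions = [tv for tv in versions if tv[0] <= as_of]
        return versions[-1][1] if versions else None

    def history(self, row_key, column):
        """Return every (timestamp, value) revision of one cell, oldest first."""
        return list(self._cells.get((row_key, column), []))


if __name__ == "__main__":
    store = VersionedStore()
    store.put("trade-1", "notional", 1_000_000, timestamp=1)
    store.put("trade-1", "counterparty", "Bank A", timestamp=1)
    store.put("trade-1", "notional", 1_500_000, timestamp=5)  # amendment

    print(store.get("trade-1", "notional"))           # current value
    print(store.get("trade-1", "notional", as_of=3))  # value before the amendment
    print(store.history("trade-1", "notional"))       # full audit trail of the field
```

Because revisions are stored per cell rather than per row, only the amended field (here the notional) gains a new version, so the audit trail shows directly which field changed and when.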

  • Crowd Valley provides a Digital Back Office and public API that enables the transition from an offline investing or lending model to an online native application such as peer to peer investing and lending, real estate and alternative asset marketplaces for financial services professionals such as the World Economic Forum. It also provides integrated tools for data, analytics and compliance support. These tools are also available for mobile application developers and IoT applications.
  • Cazena offers fast and inexpensive processing of big data in an encrypted cloud via what it calls enterprise Big Data-as-a-Service offerings, broken down into Data Lake, Data Mart and Sandbox editions. The company emerged from stealth mode in July 2015.
  • Altiscale is a cloud service that is purpose-built to run Apache Hadoop. Altiscale’s optimized infrastructure is faster, more reliable, easier to use, and more affordable than alternatives. Altiscale’s founding team has been at the forefront of Apache Hadoop, from its incubation at Yahoo! to operating more than 40,000 Hadoop nodes. Altiscale is backed by General Catalyst and Sequoia Capital, with additional investment from Accel Partners, Jerry Yang, and other individual investors.
  • Velocimetrics traces data as it moves across multiple systems and processes. The technology is used to assess data quality, detect potential operational risk with fast trade reconstruction, and spot client experience or regulatory concerns.


Challenges for Big Data in Investment banking


1–   Data architecture

One key issue is the management of various data sources, involving many different technologies across several business lines. The burden of legacy infrastructure, coupled with the silo reality at most firms, makes it difficult to deploy meaningful big data solutions efficiently.


2–   Data Governance

Having a robust data governance framework in place is crucial to manage big data solutions deployment and ensure scalability, efficiency and security: data source, ownership, control, management, auditability… In the context of the infrastructure challenges mentioned above, a carefully designed data governance and thorough implementation are of paramount importance.


3–   Customer privacy

The use of big data is also associated with concerns around customer privacy protection. Because of potential discrimination or personal information breaches, big data usage can be challenged and restricted by privacy laws. Upcoming regulation within the EU (“General Data Protection Regulation” and “Data Protection Directive”), as well as recent agreements between the EU and the US (“Safe Harbour Agreement”), are establishing a framework around the questions of privacy and personal data protection, and will have a large impact on big data technology solutions.


4–   Business issue

Finally, one of the biggest hurdles to the success of big data projects at banks is technical knowledge and implementation know-how. Data scientists for capital markets have to deal with analytics, but also provide consultancy services and have a deep knowledge of the firm's processes, meaning their role is critical and their profiles hard to find. But most importantly, there is a need for cooperation between data scientists and the business in order to extract meaningful insights, which involves change management within the business itself.


Recent Big Data Trends and where they are heading

1–   Startup market

The first-wave Big Data companies were founded between 2009 and 2013, and are now starting to reach full maturity, with a robust range of products. Some actually went public, and others received sizeable financing: Big Data startups received $6.6 billion in VC capital in 2015.


2–   Product development

General focus is now moving away from infrastructure to analytics and applications to serve business users and consumers. One important area of development is visualization, to help present the analysis run by the big data engine.


  • Centrifuge Systems is a startup specializing in big data discovery and analytics problems. Its key differentiated offering is a full suite of integrated visualization tools. It also applies this proprietary technology to fraud detection.


The latest big trend is the increasing focus on artificial intelligence, to help analyse massive amounts of data and derive predictive insights. Big data, by making available massive amounts of data cheaply and quickly, is allowing deep learning algorithms, created decades ago, to live up to their full potential, and the combination of Big Data and Artificial Intelligence is very likely to deliver incredible innovations in the near future.


Final word

Looking at the potential big data offers banks to improve risk management, regulatory compliance and operational efficiency, while at the same time opening new windows for revenue growth, there is no doubt that the revolution in this field is only starting.

Given the strategic importance of the topic and the potential challenges arising when deploying related solutions, one of the key elements that banks looking to seize these opportunities need early in the process is a strategic plan around big data.
