
How data lakes and machine learning can reveal valuable insights


Big data, data lakes, advanced analytics and machine learning (ML): are these tools only meant for tech giants like Google or Facebook? Or can they meaningfully assist any business and help it respond to the ever-changing needs of the market? As more organizations migrate to cloud-based solutions, many now realize they are sitting on a gold mine of data. Although they understand the value of that data, few know how to glean meaningful insights from it.

Every time one of your customers clicks on a link, completes a transaction or views a page, that action becomes a valuable data point. Businesses that realize the potential of this information use a data lake to store it. To put it simply, a data lake is a central repository that holds huge amounts of raw transactional data in its original form. Data lakes often get confused with data warehouses because they are similar, but a warehouse holds data that has already been structured for specific reports, whereas a data lake accepts data as it arrives. That makes data lakes much more versatile and lets them grow much larger than data warehouses. With software-as-a-service (SaaS) and connected on-premises solutions, data can be exported from the solution to a data lake, where it is normalized into a consistent format. This enables you to apply analytics at a later date.

Data lakes collect large amounts of raw transactional data, but it is just that: raw data. Without applying analytics to that data, obtaining any useful information or action items from it is very difficult. Everyone knows what analytics is, but most only scratch the surface of what analytics can do for their business. For example, many people look at a data lake and ask only, “How many people logged in?” or “How many people clicked on a link after we deployed this new product or feature?” or “How many widgets were purchased after this marketing campaign?” It’s good to have this information for tactical needs, but it does not tell a story about how your business is doing. It does not provide insights into your customers’ and users’ activities and trends.
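To make the difference between a tactical count and a trend concrete, here is a minimal sketch in Python. The event records, field names (`week`, `action`) and helper functions are all hypothetical and invented for illustration; a real data lake would be queried through whatever engine sits on top of it.

```python
from collections import Counter

# Hypothetical raw events, as they might land in a data lake.
SAMPLE_EVENTS = (
    [{"week": 1, "action": "purchase"}] * 4
    + [{"week": 2, "action": "purchase"}] * 2
    + [{"week": 3, "action": "purchase"}] * 1
    + [{"week": 3, "action": "login"}] * 5
)

def count_action(events, action):
    """Tactical question: how many times did this action happen in total?"""
    return sum(1 for e in events if e["action"] == action)

def weekly_counts(events, action):
    """Count the action per week, ordered by week number.

    Weeks with no matching events are simply absent from the result.
    """
    counts = Counter(e["week"] for e in events if e["action"] == action)
    return [counts[week] for week in sorted(counts)]

def week_over_week_change(series):
    """Trend question: fractional change from each week to the next."""
    return [(later - earlier) / earlier
            for earlier, later in zip(series, series[1:])]

total = count_action(SAMPLE_EVENTS, "purchase")
trend = week_over_week_change(weekly_counts(SAMPLE_EVENTS, "purchase"))
print(total, trend)
```

On this made-up sample, the total of seven purchases looks unremarkable on its own, while the week-over-week series shows purchases halving each week. That second view is the kind of story a single count cannot tell.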

What makes things more difficult is analyzing data from disparate systems. Think about a typical e-commerce shop. It has multiple different backend systems all connected to create a complete solution:

  1. Customer-facing web portals
  2. Shopping carts
  3. Billing systems
  4. Enterprise resource planning systems
  5. Order and warehouse management systems
  6. Shipping systems
  7. Sales and contact management systems

Each of these systems may come from a different provider, and each has its own database, methods, naming conventions and application programming interfaces (APIs). The data lake ingests data from every system so that it can be analyzed together.
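As a rough illustration of what differing naming conventions mean in practice, the sketch below maps records from two systems onto one common schema. The system names (`cart`, `billing`), the field names and the `normalize` helper are assumptions made up for this example, not the format of any particular product.

```python
# Hypothetical field mappings: each source system names the same
# concept differently, so we map raw names to common names.
FIELD_MAPS = {
    "cart": {
        "cust_id": "customer_id",
        "ts": "timestamp",
        "total": "amount",
    },
    "billing": {
        "CustomerNumber": "customer_id",
        "InvoiceDate": "timestamp",
        "AmountDue": "amount",
    },
}

def normalize(source, record):
    """Rename a raw record's fields to the common schema and coerce types."""
    mapping = FIELD_MAPS[source]
    out = {common: record[raw] for raw, common in mapping.items()}
    out["customer_id"] = str(out["customer_id"])  # IDs compared as strings
    out["amount"] = float(out["amount"])          # amounts compared as numbers
    out["source"] = source                        # keep track of the origin
    return out

cart_row = normalize("cart", {"cust_id": "42", "ts": "2024-01-05", "total": 19.99})
billing_row = normalize(
    "billing",
    {"CustomerNumber": 42, "InvoiceDate": "2024-01-05", "AmountDue": "19.99"},
)
print(cart_row["customer_id"] == billing_row["customer_id"])
```

Once both records share the same field names and types, the same customer can be matched across systems, which is what makes analysis across them possible at all.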

All too often, the analytics applied to a data lake do not realize the true potential of the data, or they provide information that is just plain inaccurate. As the renowned British economist Ronald H. Coase put it, “If you torture the data long enough, it will confess to anything.” Without the right intentions, data is used to prove or disprove the biases of a single person or team. Analytics used this way provide no conclusions, next steps or trends, and, most importantly, they reveal little to no insight into how the overall business is doing.

The industry is moving in another direction and putting a new layer on top of the analytics engine: machine learning. Data scientists now use artificial intelligence (AI) engines that meticulously sift through each raw transaction in a data lake to look for crucial trends that lie either at the surface or deep within the data. These engines are capable of gathering, comparing and analyzing data from multiple disparate sources. They consume that data and ultimately deliver predictions and trends that answer significant questions you didn’t even think of asking. Using AI algorithms, an ML engine can even alter its analysis and conclusions based on the changing trends it sees within a data set.
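The idea of an analysis that adjusts as the data changes can be shown with a deliberately simple stand-in. The sketch below is not a trained ML model; it is a rolling z-score check, a basic statistical technique, and the class name, window size and threshold are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev

class RollingAnomalyDetector:
    """Flag values that deviate strongly from a moving baseline.

    The baseline is the last `window` observations, so it shifts
    as the underlying data shifts.
    """

    def __init__(self, window=5, threshold=3.0):
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value):
        """Return True if `value` is unusual, then add it to the baseline."""
        is_anomaly = False
        if len(self.values) == self.values.maxlen:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            # Only judge when the baseline has some spread to compare with.
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        self.values.append(value)
        return is_anomaly

detector = RollingAnomalyDetector(window=5, threshold=3.0)
daily_orders = [100, 102, 98, 101, 99, 100, 150]  # made-up daily order counts
flags = [detector.observe(v) for v in daily_orders]
print(flags)
```

With these made-up numbers, only the final jump to 150 is flagged. Because each new value joins the baseline, a level that persists stops looking unusual after a while, which is a small-scale version of an engine revising its conclusions as trends change.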

We all think about sales and marketing organizations applying ML to their data to ensure they pitch the right products at the right time to the right consumer. However, many industries now use ML and AI to solve very specific problems. The financial services industry applies ML to prevent fraud and reduce the expenses related to it. Investment and stock brokerage firms also use AI to buy and sell, to suggest stock trends and to offer traders insights about when to enter and exit certain holdings. The healthcare industry has seen an explosion of data collection from wearable devices and is using ML to provide more accurate, more targeted healthcare services to specific individuals. Even the oil and gas industry is applying ML and AI to data sets collected from mineral analysis to predict refinery failures or service degradations before they happen.

Ultimately, each industry and business strives to accomplish two things: identify profitable opportunities for growth and reduce or avoid risk. It is always a risk when businesses ask their data teams to display only the metrics that they “feel” are important for making educated business decisions. Doing so can mean missing significant trends and insights living deep within the data, and even glaring trends staring data analysts right in the face. By implementing ML on top of data lakes and existing analytics engines, you can gain insights from transactional data to help your business grow.