One of the fulfilled promises of artificial intelligence and machine learning is that these technologies make Big Data manageable. Big Data – the billions or trillions of records published daily – is beyond the scope of what humans can process. We would run out of lifetime before we read all the tweets, texts, and posts we create – and that’s before considering rich media like audio and video. AI/ML helps us to understand these massive datasets, digest them down to comprehensible summaries.
What if, however, we don’t face massive data in our daily work? Is there still benefit to using machine learning and artificial intelligence technologies if we have pedestrian datasets? I’ve heard marketers say things like:
- I’m not a big business. I don’t have Big Data. Why would I need AI?
- I’m not a tech company. I don’t see how AI would help.
- I don’t have a big IT budget. I can’t justify the cost of AI.
Are these statements true? Does AI have a role outside of Big Tech and Big Data?
Let’s consider what Big Data is, since AI and ML are designed to solve it.
The Four Vs of Big Data
Big Data is defined by many, including IBM, by four Vs:
Volume. Big Data is big, measured in terms like petabyte, exabyte, zettabyte, and brontobyte. The entire Library of Congress is approximately ten terabytes; one exabyte is 100,000 Libraries of Congress. We need tools like machine learning technologies to analyze vast amounts of data.
Velocity. Big Data happens fast. Data streams in at blistering speed. If you’ve ever watched a raw Twitter or Instagram feed, you’ve seen the velocity of Big Data – faster than the eye can see or read. We need tools like machine learning technologies to process data as it happens, no matter how fast it is.
Variety. Big Data encompasses many formats, from structured datasets like large SQL databases to unstructured data such as handwritten notes, scanned pages, audio files, and more. We need tools like machine learning technologies to process data in whatever format it’s in.
Veracity. Big Data is often of questionable quality. How reliable is the data? How clean is the data? How well-sourced is the data? We need tools like machine learning technologies to identify and clean anomalies in massive datasets.
AI, ML, and Small Data
While the four Vs define Big Data, they are not exclusive to Big Data.
Consider the volume of data in any small or midsize business. While eCommerce giants contend with millions of visitors and thousands of customers per day, a small business has the same scale problem – too much data per employee, especially if there’s only one employee.
Consider the velocity of data in any small or midsize business. Even a relatively slow trickle of data will still overwhelm a few people who have more work than time.
Consider the variety of data in any small or midsize business. A small business has little time to process and convert data in all its different formats, from XML to SQL to JSON.
Consider the veracity of data in any small or midsize business. The smaller the business, the smaller the datasets associated with it – and the greater the impact of anomalies or corrupted data. A few dozen incorrect records in a dataset of one million customers doesn’t matter much, statistically speaking. A few dozen incorrect records in a dataset of one hundred customers matters a great deal – but the small or midsize business may not have any way of detecting those errors except during infrequent audits.
The four Vs are problems for any business dealing with data, and that’s where machine learning and artificial intelligence shine. AI and ML – especially open-source technologies – accelerate the speed at which businesses of any size can solve the four Vs.
For example, suppose a business owner wants to understand the online reviews of their competitors. They could take a day to read through the reviews, but that’s a day spent not doing other work. By using topic modeling and text mining, they could have an answer in minutes, if not seconds, and change strategy in the same day.
Here’s a glimpse into a topic model for a competing company’s Glassdoor reviews (disclosure: not my employer). Instead of reading through every review, I can see the broad themes in the reviews and quickly ascertain what some of the issues at the company might be:
It might have taken hours or days to read through all the reviews, but instead machine learning technology reduced a few hundred reviews to an easy-to-understand table in seconds. A few hundred records is small data, but still larger than the amount of time I had to devote to the task.
The Power of AI and ML is Speed For Small Data
The true power of artificial intelligence and machine learning for small datasets is speed. We could handle small data manually, but if technology exists to process it at very high speed, why wouldn’t we use it? We might not win any high-tech innovation awards for reading customer or employee reviews faster, or managing social media more efficiently, but our real reward is more hours in the day to do work we enjoy.
No matter the size of your business, look into how AI and machine learning can help you convert hours of work into minutes, expanding the time you have available every day.
Listen to a machine read this post to you:
You might also enjoy:
- It's Okay to Not Be Okay Right Now
- What Content Marketing Analytics Really Measures
- Best Practices for Public Speaking Pages
- Is Social Listening Useful?
- What Is The Difference Between Analysis and Insight?
Want to read more like this from Christopher Penn? Get updates here:
Get your copy of AI For Marketers
What were the tools you used to create the glassdoor review demo?
I used Scrapy, a Python application, the Glassdoor API, some custom PHP code, and R/RStudio for the machine learning portion.