Big data is a hot keyword in the market. Recently, a press release on digitaljourna.com reported on the Big Data market size:
Big Data and Data Engineering Services Market Will Register a CAGR of +17% by 2023 — Technological Advancements, Global Innovations, Competitive Analysis by Focusing Top Companies.
This means that most big companies, as well as traditional, small, and medium-sized companies, will join the Big Data world or already plan to. As a decision-maker or an enterprise architect, when you start planning your company’s big data journey, you will probably be confused by all the big data keywords and technology buzz.
And you must remember: you are steering a big ship, and you are trying to turn it around. :)
So where to start? In this article, I want to share some of my experiences and thoughts.
1. Get the baseline, know where you are right now.
When we have a production incident, the first thing I ask is: please show me the error logs. Why? Because the key is to know where the problem is.
The same logic applies here. We have to know our current stage, which means, for your current data warehouse/technology:
- What percentage of reports run in batch?
- What percentage of reports are JIT (just in time) or real-time?
Depending on the team’s current technical skillset, you could go even deeper:
- How many reports does each Business Unit have? How frequently are those reports produced? Do they need to be real-time?
- What’s your integration/ingestion pattern?
- How many data pipelines do you currently have? What technology are those pipelines using? What does the support structure look like? How many people are supporting them now?
This information will set up a good baseline of your company’s current stage, and you will have a clearer picture of where you are.
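As a rough illustration of the baseline questions above, here is a minimal Python sketch that computes the batch vs. real-time split and the report count per Business Unit. The report inventory, its field names, and the helper functions are all hypothetical; in practice the inventory would come from whatever each business unit can actually provide.

```python
from collections import Counter

# Hypothetical report inventory; names and values are made up for illustration.
reports = [
    {"name": "daily_sales", "business_unit": "Sales", "mode": "batch"},
    {"name": "inventory_snapshot", "business_unit": "Supply Chain", "mode": "batch"},
    {"name": "fraud_alerts", "business_unit": "Risk", "mode": "real-time"},
    {"name": "monthly_close", "business_unit": "Finance", "mode": "batch"},
]


def mode_percentages(inventory):
    """Return the share of reports per delivery mode, as percentages."""
    total = len(inventory)
    if total == 0:
        return {}  # nothing collected yet, so no baseline to report
    counts = Counter(r["mode"] for r in inventory)
    return {mode: 100.0 * n / total for mode, n in counts.items()}


def reports_per_unit(inventory):
    """Return how many reports each Business Unit owns."""
    return dict(Counter(r["business_unit"] for r in inventory))


baseline = mode_percentages(reports)
print(baseline)                   # {'batch': 75.0, 'real-time': 25.0}
print(reports_per_unit(reports))
```

Even a simple list like this could be enough to start the conversation with each business unit; the numbers matter less than agreeing on what is counted.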
In this phase, the challenge will be getting the information from each business unit, because most of the time they don’t even know how many reports they really need and what they actually have right now. This makes it an excellent exercise for both IT and the Business. Please remember: it is a time-consuming task, but it is a must-have activity.
2. Have a high-level idea of where you want to go next.
Once you know where you are, it will be clearer where you want to go and what your company’s Data Strategy Target End State is (for example: 20% static reporting, 80% real-time/JIT reporting, and machine learning capability in 50% of the business units).
Why do we need to define this now? Because the Data Strategy Target End State will shape the direction of your IT/Data build roadmap, which includes the infrastructure build, the team build, the training program for the existing team (Development and Support), the HR hiring focus, etc. Without the Data Strategy Target End State, it will be difficult for the different functional teams to move forward in the same direction.
3. Make a plan and measure the outcome monthly.
Depending on your company’s current project delivery methodology (Agile, Lean, Waterfall, etc.), your planning will probably look different.
However, I don’t want to discuss or compare Agile, Lean, and Waterfall here. Whatever methodology you are using, my assumption is that you are using the one that best fits your organization.
Here, I would like to focus on measurement, because every methodology has the same goal in the end: a valuable outcome, meaning business value. So how we translate technical deliverables into business value outcomes is the key. It gives you the current progress metrics and shows what challenges your team is currently facing.
And the reason for monthly is that a one-month period is long enough to capture the progress and status of the transformation.
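One way to picture a monthly measurement is to track a single metric against the Target End State. The sketch below is only an assumption about how that could look: it uses the 80% real-time/JIT figure from the example in step 2 as the target, while the baseline value, the monthly snapshots, and the function names are hypothetical. The real-time share here is a stand-in for whatever business value metric your organization agrees on.

```python
TARGET_REALTIME_PCT = 80.0    # example target end state from step 2
BASELINE_REALTIME_PCT = 25.0  # hypothetical baseline from step 1

# Hypothetical month-end snapshots of the real-time/JIT report share.
monthly_realtime_pct = {"month-01": 25.0, "month-02": 30.0, "month-03": 36.0}


def progress_toward_target(current, baseline, target):
    """Return the fraction of the baseline-to-target gap closed so far."""
    gap = target - baseline
    if gap == 0:
        return 1.0  # baseline already equals the target
    return (current - baseline) / gap


def monthly_report(snapshots, baseline, target):
    """Build one row per month with the month-over-month change and progress."""
    rows = []
    previous = baseline
    for month in sorted(snapshots):
        current = snapshots[month]
        rows.append({
            "month": month,
            "realtime_pct": current,
            "change": current - previous,
            "progress": round(progress_toward_target(current, baseline, target), 3),
        })
        previous = current
    return rows


rows = monthly_report(monthly_realtime_pct, BASELINE_REALTIME_PCT, TARGET_REALTIME_PCT)
for row in rows:
    print(row)
```

A flat or negative "change" for a month is the signal to ask what challenge the team is currently facing.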
4. Build THE TEAM.
Today’s IT changes so fast. How can a company not only keep the lights on, but also make sure it does not fall behind on new technology? The key is the TEAM.
Every day, your email inbox has some vendor talking about how they or their product can help and speed up your data lake journey.
Regardless of whether this is true, and regardless of whether you decide to buy or build, the most important thing is NOT the TOOLS, it’s the TEAM.
Let’s say you decide to buy: you need a TEAM to validate the TOOLS with a POC. Or you decide to build on your own: you still need a TEAM. So what should this TEAM look like?
Passion, Dedication & Determination
5. Pick a Business Unit and build an end-to-end data pipeline using the Big Data stack.
One of my favorite children’s stories is How a Colt Crossed the River. It tells the story of a little colt who took a bag of wheat to the mill.
As he was running with the bag on his back, he came to a small river.
His friend Cow told him the river was not deep and that he could easily cross it. The squirrel stopped him and told him it was really dangerous to do so.
Not knowing what to do, the colt went home and told his mom what had happened on the way. His mother told him, “My son, don’t always listen to others. You’d better go and try yourself. Then you’ll know what to do.”
Finally, he crossed the river carefully and successfully.
Real knowledge comes from practice.
Pick a Business Unit and select a real business use case, then build an end-to-end data pipeline, and you will have a good idea of how big the scope of Big Data is in your organization.
These 5 steps could help you start the Big Data journey. I guarantee you the journey is NOT easy; however, the right people and the right team can always lead your company to a healthy and valuable Big Data world.
Disclaimer: This is only my personal opinion, and I will be more than happy to discuss different views.
Thinking and learning only make us smart; dedicated hard work is what can make us successful. :)
Thank you for reading.
My other articles related to Big Data are here: