Organisations increasingly realise the need to unlock the value of the large amounts of data at their disposal and transform it into business insights.
Next comes finding the data management and engineering skills to do so. This has led to the rise of the data scientist, a highly sought-after professional who has the data expertise to help companies reimagine their business processes.
A common misconception about the role is that we only need to find someone who has a specific skillset or data analysis methodology.
iTNews Asia speaks to Geoff Soon, Managing Director of South Asia at Snowflake, to find out what to look for in a data scientist, what prevents them from delivering the insights we want, and how we can extract maximum value from the function.
iTNews Asia: How important is the role of the data scientist now – especially in the current context of our post-pandemic landscape?
The function of data science is more critical now. As we move to digital transformation, data science becomes critical, because it breaks down more complex data and introduces more awareness around the information gathered.
Now, I want to say something a bit contentious. I think people get confused between the role of data science and the function of data science. When you look at the role of data science, you're trying to find an individual that has the perfect mix of industry expertise, a background in mathematics and statistics, and the ability to programme.
With that, there is a massive shortage of data scientists. They are expensive, and it is not realistic to expect all types of organisations to procure these people. Rather, what we see far more often is the function of data science, where an existing organisation builds a team of people that will deliver the necessary data science outcomes.
iTNews Asia: What are the key skill sets needed then in terms of the technical and soft skills for data scientists today?
Before looking at the skill sets needed, an organisation needs to have the necessary foundation in place before it can hire a data scientist or build the function. There are two parts to consider.
The first part is its policy and governance. With increasingly strict regulations, an organisation needs to clearly define what data is going to be made available before applying data science. Next, the organisation needs a technology platform that makes it super simple to aggregate all these different sources of data, and then to apply the policy and governance on top of it.
In terms of skill sets, the most important is business domain knowledge. Ultimately, data science should be about solving critical problems and creating new innovations and insights. It has to be business related. There's no point building a model that is 99.999% accurate but predicts nothing useful.
It is important to have the ability to define the problem statement and identify the benefits that will flow from solving that problem.
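One way a model can be highly accurate and still predict nothing useful is when the event of interest is rare. The sketch below is a minimal illustration in Python, not something described in the interview: the fraud scenario, the data, and the function names are all assumptions made for the example.

```python
# Hypothetical scenario: 1 fraudulent transaction in 100,000 (assumed data).
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of true positive cases that the model actually caught."""
    caught = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(caught) / len(caught) if caught else 0.0


y_true = [1] + [0] * 99_999   # one real fraud case
y_pred = [0] * 100_000        # a "model" that always predicts "no fraud"

acc = accuracy(y_true, y_pred)  # 99.999% of predictions are correct
rec = recall(y_true, y_pred)    # yet the single fraud case is missed

print(f"accuracy = {acc:.3%}, recall = {rec:.0%}")
```

The headline accuracy looks excellent, but the model never flags the one case the business cares about, which is why the problem statement has to be defined before the metric.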
iTNews Asia: How do you think the role of the data scientist has changed over the years, and what are the top challenges faced in terms of the workflow?
With any new field – and data science is a relatively new field – you go through a little bit of a hype cycle. At first, there's an incredible amount of investment and excitement. Then, you get a little bit of disillusionment.
Data science has moved from a nice-to-have to something that is critical to the business. But there's no longer the belief that data science is a silver bullet that can totally transform a business. Data science should be part of the business's overall transformation strategy.
The challenges faced in data science can be split into two different topics. One is around the ethics of data science, and the other is more of a technology challenge.
When it comes to the ethics, organisations are trying to create ethical frameworks. At the end of the day, we can now probably do more than we should by using data science to predict things that, to an average consumer, could seem creepy. This makes it important to have a very strong ethical framework for how we create and execute our data science models.
The second thing is the technology needed to create these effective models. Having the workflow to productionise these models is something that we see very little of, but thankfully, there are small steps being taken towards doing that with regulatory governance.
For organisations to adapt to decisions that are driven not by a person but through an AI or ML model is a very deep thing as well. How do we execute a framework that allows us to productionise our models and, once we productionise the models, how do we constantly monitor and manage them to make sure that they're delivering the right outcome?
iTNews Asia: Building on what you have mentioned about organisations needing the necessary foundation for their data science function to be effective, why are data scientists still stuck doing the data cleaning and preparation when the expertise they were brought in for was to uncover high-value business insights from the large amounts of data generated by companies, and to reimagine their business processes?
Data is still very fragmented across an organisation. When data is fragmented, there are nuances with that data that require cleaning.
For example, within an organisation, there may be five or six copies of me. There could be a copy of me from a savings perspective, and a copy of my data from a credit card perspective. I might even have an investment account. Each of these may store my name in a slightly different way.
There may also be different ways that the data has been normalised. The biggest problem any data scientist faces is, firstly, getting permission to use the data. Secondly, it is taking the data from all the different systems, each of which may store a slightly different version of me, and then collating and normalising it all. Only then can the data scientist finally analyse the data.
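The collate-and-normalise step described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the record fields, the system names, the sample customers, and the `normalise_name` and `collate` helpers are all hypothetical, and matching on name alone would not be safe in practice, where a stable identifier such as a customer ID or date of birth would be needed as well.

```python
import unicodedata
from collections import defaultdict


def normalise_name(raw: str) -> str:
    """Build a comparison key that ignores case, accents, punctuation,
    extra whitespace and word order ("TAN, ALEX" vs "Alex Tan")."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(sorted(cleaned.split()))


def collate(records):
    """Group the source systems that hold a record for the same person."""
    grouped = defaultdict(list)
    for record in records:
        grouped[normalise_name(record["name"])].append(record["system"])
    return dict(grouped)


# Hypothetical records held by three separate systems.
records = [
    {"system": "savings", "name": "Alex Tan"},
    {"system": "credit_card", "name": "TAN, ALEX"},
    {"system": "investments", "name": "alex  tan "},
    {"system": "savings", "name": "Priya Nair"},
]

customers = collate(records)
for key, systems in customers.items():
    print(key, "->", systems)
```

Even in this small case, three differently formatted names have to be reconciled before any analysis can start, which hints at why preparation consumes so much time at scale.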
The silos between data have been massive. Now we add the complexity that organisations no longer want to simply use the data within their organisation. They may also want to interact with data from third-party providers.
With additional complex layers of data going into the model, it is perhaps inevitable that so much time is spent prepping and bringing the information together before finally doing high-value work.
iTNews Asia: Whose responsibility would it be then to ensure that the data is cleaned up before the data scientist comes in?
What we're seeing emerge quite rapidly is the role of the Chief Data Officer. We've seen many organisations bringing in this role, and these individuals have a responsibility to ensure not only that the data platforms are in place but that the data is being governed in the right way.
When having conversations with the various regulatory bodies – in healthcare, finance, or government – we find that they are making sure that the way people within their organisation are using these datasets is consistent with the relevant policies out there.
iTNews Asia: Would you say that the role of a CDO is a new position?
It has been around for a while. I think it is evolving from just being a Chief Data Officer to a Chief Data and Analytics Officer, the CDAO. We are seeing organisations taking advantage of that, and COVID has increased the prominence of the role because of the need to have access to so many more diverse data sets to be able to run businesses effectively.
iTNews Asia: What can be done to help data scientists have more time to handle more complex data challenges and implement new technologies to bring the business forward?
The first thing that we are seeing is that many organisations are looking to upgrade, innovate, or implement organisation-wide data platforms. That has been a significant step in reducing the barriers to accessing the data.
The second thing is that the need to know programming languages has been lessened by many of the next-generation data science tools. That way, you take out one of the three pillars of a data scientist, which are coding, statistics, and business domain knowledge.
To an extent, we could be seeing a lot of innovation in the next 12 to 24 months in ML Ops or Machine Learning Operations. Just like there is IT operations that kind of defines how I can deploy an application to production and manage it, ML Ops is going to help in terms of how I deploy a machine learning model into production, and monitor and manage it across its lifecycle.
iTNews Asia: Given the time needed to clean up the data, would you say that data scientists right now would be able to help businesses gain more insights?
It varies significantly across industries. For example, with digital native companies, there was a time you would consistently receive spam email. But that isn’t the case now. That is a result of machine learning algorithms that are constantly being tuned to filter out spam messages.
For the digital native industry, we have all been benefiting from data science already. If you look at the online shopping experiences and the recommendations that we're getting, that's a testament to data science and machine learning models being live and in production.
It is up to organisations to truly identify the business problems and the business challenges they would like resolved.
iTNews Asia: Which industries or sectors would be lagging in terms of successfully incorporating data science? Would the government sector be the leading sector for the data scientist to excel in?
There's been a large uptake of data science because of the pandemic, and Singapore has been exceptional. When it comes to the planning of the island, data science has become part of the fabric of Singapore.
These digital native companies and the public sector have been able to integrate data science effectively. But for some, the most exciting thing is working at traditional organisations because there's so much to gain and there's so much to transform.
This could be very fulfilling, and we do see data scientists spread across all types of industries. It depends on what gets you excited. We definitely see data science being fairly vibrant across all different segments.