GovTech Singapore's Data Science and AI Division - DSAID - is building a ‘data infrastructure in a box’ to help unlock and enable agency-specific data projects.
The ‘data infrastructure in a box’ - or DIAB - was briefly featured during presentations at the STACK-X Data Science Connect 2022 conference run by GovTech earlier this month.
The idea behind DIAB is to capture smaller data use cases that can’t be enabled through the Government Data Architecture (GDA) - a whole-of-government secure data sharing platform launched back in 2019.
While datasets are being progressively added to the GDA - where data users can access them centrally, knowing that the data and its lineage are clean and up-to-date - the intent of the GDA was to pool “core data… frequently used by multiple public agencies”.
DSAID senior director Jason See said that smaller, single agency-specific datasets and use cases typically sat outside of what the GDA enabled.
“From a government data architecture perspective, there are use cases outside, especially for agency-specific ones, where we do need to pay attention to see how to help the agency for some of these smaller use cases, even as we are actually trying to grow more datasets within the GDA,” See told the conference.
“Data infra in a box [DIAB] tries to do exactly that. We’re trying to develop DIAB to help agencies, especially low maturity agencies, set up their secured data infrastructure, and our vision is to do it in days and not months.
“We do this by giving [DIAB] baked-in best practice security and compliance configurations and also secure integration with the central tools [that are available through the GDA].
“We do this by leveraging IAC [infrastructure-as-code] templates using Terraform, and this works within the GCC [Government Commercial Cloud] environment.
“We try to use cloud native tools where possible but we're also open to exploring external tools if required.”
See said DIAB is still a “work-in-progress” and that more details would be shared “when ready”.
DSAID’s other future directions
In addition to DIAB, See outlined several other near-term directions that DSAID is focused on.
He said that the division is in the process of developing a data sharing platform “to better enable private-public data sharing.”
“[We’re] very excited by this,” he said.
The platform will embed “necessary anonymisation and data security controls” and also meet “the necessary data governance processes and structures” previously set out by the government.
“We're already piloting trials with MOH [Ministry of Health] and IHIS [Integrated Health Information Systems], and we'll share more details when ready,” See said.
He also said that a data privacy protection capability centre is being set up. Through this effort, DSAID will “look at developing central tools to better help agencies with high risk data sets.”
In addition, See said that DSAID is starting to experiment with six “forward deployed teams” of data scientists that are being sent out to agencies.
“In a sense, it's almost like a DSAID hub-and-spoke kind of organising model,” See said.
“We look to progressively grow this capacity and capability over the next three-to-five years.”
Hitting scale
When DSAID first started six years ago, it comprised a team of around 20 data scientists.
Today, the number of data scientists alone is up to “60 or 70”, and with AI and data engineers as well as product development personnel, the division has at least 120 people, though no exact figure was put forward.
See said that DSAID has four major focus areas that help it achieve its broader role as “a capability centre to help uplift our government's data science and AI capabilities for public good.”
“As a capability centre there are four things that we do, in very simple terms,” he said.
“One is we partner agencies to deliver business value with impactful data science and AI projects.
“Second, we help to scale these efforts really with the implementation of central products and platforms.
“Third, we want to support whole-of-government capability building efforts with the upskilling of public officers as well as uplifting the capabilities of the public agencies themselves.
“Fourth, we also need to do well as a functional leader so we continue to deepen our expertise in core domain areas. We also push white space innovation just to stay abreast of tech developments.”
See added that future efforts would focus on how to expand partnerships and collaboration “with the external community”.
This was reinforced by government chief digital technology officer Chan Cheow Hoe, who also talked about DSAID’s scale-up challenges.
“Now, we reach a point whereby the government isn't the only source of truth - the government doesn't have everything,” he said.
“How about the ecosystem out there? How do we reach out to the rest of the industry to be able to embed data with the government so that we have enough data to make sense of everything that we do?
“This is really our journey of scaling - from growth, getting more and more data scientists and people within DSAID, to now leveraging very much on platforms and the ecosystem to allow us to do this in a much better way.”