Bridging the Data Gap in Policymaking: The Role of AI and Alternative Data

Governments are far behind in the race to leverage big data. Can generative AI help level the playing field?

By Hugo Z - Published on 10th January 2023

In the digital age, governments are no longer the only entities collecting data on citizens.

More alarmingly, private companies have become more efficient than administrations at gathering information. Traditional governmental methods – telephone surveys, forms and in-person interviews – continue to be the benchmark in accuracy, yet their inherent inefficiencies are becoming more apparent. In a typical OECD country, over 3,000 employees work in the National Statistics Department, undertaking the heavy lifting of government data collection.

Costly and inefficient cycles that span years from inception to completion stand in stark contrast to the real-time analytics deployed by tech giants. Their platforms are capable of monitoring millions of users daily, highlighting the growing need for faster and leaner techniques to complement traditional data collection approaches.

Meta, the company behind Facebook and Instagram, has collected data on 77% of all internet users, with about 3.9 billion people active on at least one of its platforms (Statista, as of Q3 2023).
Facebook alone had 3 billion monthly active users as of Q3 2023.

The issue is not limited to collection methodologies. National statistics departments struggle to provide the granular, timely data required for modern policymaking. In emerging countries, the data itself is often incomplete: the informal economy, which accounts for 15% of the OECD’s GDP, remains largely uncharted by conventional methods, highlighting a gap in government data.

The Middle East mirrors this global trend of data disparities, struggling to provide up-to-date, granular, high-quality data.

The availability of government data in the Middle East and North Africa (MENA) region is 44% lower than the global average, and 54% lower than that of the G20, signaling a significant lag in data transparency and accessibility. Only about half the countries in the MENA region are up to date with their health, labour force, and consumption surveys (Data & Policy, 2022).

In the Middle East, governments do not provide quality data...

Open Data Barometer Global Report, data availability score:

G-20 average    56.8
World average   32.5
UAE             26.2
MENA            18.2

> Data availability score of UAE is 54% lower than G-20 average

> Data availability score of MENA is 44% lower than world average

...and neither do businesses...

[Figure: Business Data Transparency Index, UAE rank]

At the same time, policymaking is increasingly granular (See Whiteshield’s Citizen Navigator). As governments create dynamic policies that are tailored to individual citizens, the need for timely and detailed insights becomes critical. To navigate this new terrain, governments must evolve by adopting big data and AI-driven analytics.

Whiteshield’s AI Economics Unit is at the forefront of integrating advanced AI capabilities into practical applications. Our latest research showcases the remarkable potential of new AI tools to support data collection for regional economic assessment.

Transforming Satellite Imagery into Economic Insights

In assessing a region’s economic health, the conventional barometer has been its Gross Regional Product (GRP), similar to GDP on a more localised scale. This measure is central to policymakers’ ability to formulate nuanced, region-specific strategies. But in many emerging countries, where robust data is still often scarce, Whiteshield has turned to innovative approaches, notably the use of satellite imagery, to measure regional economic performance.

The process starts by breaking satellite images down into smaller, manageable tiles, which are then analysed by artificial intelligence algorithms. The current state of the art is a type of machine learning model known as a Convolutional Neural Network (CNN). These networks require humans to first prepare extensive sets of example images, each one meticulously categorised by hand, to teach the system how to recognise various features. This process is both resource-intensive and technically challenging, particularly in developing countries with unique landscapes, such as those in the Middle East.
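As a rough illustration of the tiling step, the sketch below computes the pixel boxes that would cover an image. It is not Whiteshield's pipeline: the function name `tile_boxes`, the 256-pixel tile size, and the 1000 x 600 image dimensions are all assumptions chosen for the example.

```python
from typing import Iterator, Tuple

# (left, top, right, bottom) in pixels
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile_size: int) -> Iterator[Box]:
    """Yield pixel boxes that cover a width x height image, row by row."""
    if width <= 0 or height <= 0 or tile_size <= 0:
        raise ValueError("width, height and tile_size must be positive")
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            # Tiles on the right and bottom edges are clipped to the image.
            right = min(left + tile_size, width)
            bottom = min(top + tile_size, height)
            yield (left, top, right, bottom)


# Hypothetical image size and tile size, for illustration only.
boxes = list(tile_boxes(width=1000, height=600, tile_size=256))
print(len(boxes))   # 12 tiles: 4 columns x 3 rows
print(boxes[-1])    # (768, 512, 1000, 600)
```

Each box uses the (left, upper, right, lower) convention that image libraries such as Pillow accept for cropping, so the tiles can be cut out one by one and passed to whichever model performs the analysis.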

The Power of Language Models in Regional Analysis

Whiteshield is leading the way with an innovative solution in this field. The AI Economic Analysis Tool uses large language models (LLMs), such as OpenAI’s GPT-4, to analyse satellite images. This approach is gaining recognition for its potential versatility across different fields of socio-economic research, in contrast with current expert systems, which are tailored to specific tasks.

In a study examining 72 detailed satellite images of Dubai, two methods were compared for accuracy. The first method, a Convolutional Neural Network (CNN), was trained using images that had been pre-labelled by humans.

After several days of training, this method had an average error of 7.32%. In contrast, the newer approach using the AI Economic Analysis Tool and GPT-4, which did not rely on any pre-labelled images and took only minutes to run, had an average error of 11.9%.

While slightly less accurate overall, Whiteshield’s approach showed superior performance at identifying buildings and roads, which are key elements in assessing economic health.

The Whiteshield approach outperforms the current method in building and road identification.

[Figure: Dubai dataset experiment, model performance by category. Average error of the current method (CNN) versus the Whiteshield approach (LLM).]


Simplifying Complexity

What are the implications for governments and policymakers? Whiteshield's AI Economic Analysis Tool simplifies the intricate process of economic analysis. Our remote sensing capabilities offer an efficient means of generating high-level analysis of specific economic measures in a matter of days.

In the example shown, Whiteshield was asked to respond quickly to an agriculture intensity assessment task. With the traditional method, this would have required developing a model trained specifically on desert crops. With the AI Economic Analysis Tool, the dataset was loaded and a scoring formula was defined with relevant stakeholders, allowing us to deliver a precise sectoral estimate for an area of 4,000 km² in 48 hours.

High-definition satellite images are broken into smaller tiles.

The AI model segments what it sees on each tile:

Index        1      2      3      4
Longitude    53.75  53.76  53.77  53.78
Latitude     23.38  23.38  23.38  23.38
Crops        0.8    0.2    0.4    0.2
Cattle       0      0.4    0      0.6
Greenhouse   0.4    0.1    0.7    0.2

The scoring formula is defined with weights:

Agriculture intensity = w1 (40%) x Crops Score + w2 (40%) x Built-up Area Score + w3 (20%) x Greenhouse Presence
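To make the weighted formula concrete, here is a minimal sketch of how such a score could be computed for one tile. The function name, dictionary keys, and validation checks are assumptions made for illustration, not the tool's actual implementation. The crops and greenhouse values are those of tile 1 in the table. The table has no built-up area row, so that value is a hypothetical placeholder.

```python
from typing import Mapping

# Weights as stated in the scoring formula: 40% / 40% / 20%.
# The key names are hypothetical labels for the three terms.
WEIGHTS = {"crops": 0.4, "built_up_area": 0.4, "greenhouse": 0.2}


def agriculture_intensity(
    scores: Mapping[str, float],
    weights: Mapping[str, float] = WEIGHTS,
) -> float:
    """Return the weighted sum of one tile's category scores."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores: {sorted(missing)}")
    return sum(w * scores[name] for name, w in weights.items())


# Tile 1: crops and greenhouse come from the table;
# built_up_area = 0.1 is a hypothetical placeholder value.
tile = {"crops": 0.8, "built_up_area": 0.1, "greenhouse": 0.4}
score = agriculture_intensity(tile)
print(round(score, 2))  # 0.44
```

Keeping the weights in a separate mapping means stakeholders could adjust them without touching the scoring logic, which matches the idea of agreeing the formula with them up front.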

Continued research suggests an optimistic future where AI, especially LLMs like GPT-4, will transform the way policies are developed. Alternative sources will supplement, and perhaps replace, the current survey-based approach.


Whiteshield is providing a concrete example of the tools to come: the AI Economics Unit is currently developing new capabilities for the AI Economic Analysis Tool, drawing on sources such as cell tower activation, night light intensity and real estate prices, to provide policymakers with near real-time analysis specific to their sector and at an unprecedented level of detail.


For more information about our AI solutions, please contact us at Whiteshield.