Big Data Insights with Tableau + Apache Spark – The Aadhaar Story

Click here for an interactive version of the visualization on Tableau Public

CTR’s Big Data solutions help you analyze and visualize billions of records using Tableau and Apache Spark. Here’s a real-world big data visualization and data discovery case study by our team.

What is Aadhaar?

The Unique Identification Authority of India (UIDAI) is a central government agency of India. Its objective is to collect the biometric and demographic data of residents, store it in a centralized database, and issue a 12-digit unique identity number called Aadhaar to each resident.

About the Dataset

The Aadhaar data catalog is made publicly available by the UIDAI to support research and the building of applications on data collected at the national level. The raw datasets (to date) are available via an API. The past 30 days of data are also available directly from the UIDAI’s website.

The Technology

  • Python scripts were used to pull and concatenate the data directly from the UIDAI datastore using the APIs provided.
  • The raw data was written to HDFS, again via Python scripts, and then loaded into Apache Hive.
  • Because of the sheer volume of the data (more than 80 GB in raw format), a 5-node Spark cluster was set up on AWS to aggregate and summarize it.
  • Ad hoc analysis was done by connecting Tableau directly to the Spark cluster.
  • Finally, the aggregated data was connected to Tableau for visual analytics.
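
The "pull and concatenate" step in the list above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the scripts used in the project: the column names (`state`, `gender`, `age`, `aadhaar_generated`) and the sample rows are hypothetical, and in-memory strings stand in for the extracts an API call would return.

```python
import csv
import io


def concatenate_extracts(extracts):
    """Merge several CSV extracts (each with its own header row) into one list of dicts."""
    rows = []
    for text in extracts:
        # DictReader consumes the header of each extract, so headers are not repeated.
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def to_csv(rows, fieldnames):
    """Serialize the merged rows back to a single CSV string with one header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical daily extracts; real data would come from the UIDAI API.
DAY_1 = "state,gender,age,aadhaar_generated\nUttar Pradesh,F,30,120\nKerala,M,25,40\n"
DAY_2 = "state,gender,age,aadhaar_generated\nUttar Pradesh,M,30,150\nKerala,F,25,45\n"

combined = concatenate_extracts([DAY_1, DAY_2])
merged_csv = to_csv(combined, ["state", "gender", "age", "aadhaar_generated"])
```

A single merged file with one header row is the kind of output that could then be copied into HDFS and exposed as a Hive table.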

Data Insights

  • The northern state of Uttar Pradesh leads in the number of Aadhaar numbers generated, followed by Maharashtra, West Bengal and Bihar. Though these are not necessarily the largest states by area, they are the largest by population.
  • The 20-35 age group has the highest number of Aadhaar numbers generated, with 30-year-olds making up the largest share of Aadhaar holders across the country.
  • The gender composition among Aadhaar holders is mostly even, which is encouraging. Transgender residents make up a relatively negligible minority, which can probably be attributed to still-evolving social and economic conditions.
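
Insights like these come from summing enrolment counts along one dimension at a time (state, age, gender). The sketch below shows that kind of group-and-sum in plain Python as a stand-in for the Spark aggregation. The field names and the numbers in `SAMPLE` are hypothetical and are not taken from the UIDAI data.

```python
from collections import defaultdict


def total_by(rows, key):
    """Sum the (hypothetical) aadhaar_generated column, grouped by the given field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += int(row["aadhaar_generated"])
    return dict(totals)


def ranked(totals):
    """Return (group, total) pairs sorted from largest to smallest total."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Made-up rows for illustration only.
SAMPLE = [
    {"state": "Uttar Pradesh", "gender": "F", "age": "30", "aadhaar_generated": "120"},
    {"state": "Uttar Pradesh", "gender": "M", "age": "30", "aadhaar_generated": "150"},
    {"state": "Maharashtra", "gender": "F", "age": "22", "aadhaar_generated": "90"},
    {"state": "Maharashtra", "gender": "M", "age": "41", "aadhaar_generated": "80"},
    {"state": "Kerala", "gender": "F", "age": "25", "aadhaar_generated": "45"},
    {"state": "Kerala", "gender": "M", "age": "25", "aadhaar_generated": "40"},
]

by_state = total_by(SAMPLE, "state")
by_gender = total_by(SAMPLE, "gender")
by_age = total_by(SAMPLE, "age")
top_states = ranked(by_state)
```

The same function serves all three dimensions, so each chart in a dashboard can be fed from one small summary table rather than from the raw records.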

Some interesting Insights

  • Telangana, the state newly created in 2014, has the highest number of Aadhaar holders among infants aged 1 year and under.
  • Female Aadhaar holders outnumber males in the two southern states of Tamil Nadu and Kerala.
  • Aadhaar generation practically stopped during 2013 in the island territory of Andaman & Nicobar.
  • Similarly, the northeastern states of Mizoram and Arunachal Pradesh practically did not start Aadhaar generation until 2013.

Additional Insights

Within the UIDAI datasets are additional fields for mobile phone numbers and email IDs that residents have voluntarily included. A quick visualization of these fields tells you about mobile and internet penetration across India.

  • The top three states for mobile penetration are Uttar Pradesh, Maharashtra and Tamil Nadu.
  • The top three states for internet penetration are Bihar, Maharashtra and Gujarat. Maharashtra ranks in the top three for both mobile and internet usage, and Bihar is the only state where internet usage among females is significantly higher than among males.

The island territory of Lakshadweep has the lowest mobile and internet usage.


Connect with a consultant to discuss how we can customize a similar solution for your company's specific challenges. Please contact Dick Kenney at 714-912-9719 or fill out the contact form below.
