STEP 1: Cleaning the data
All files were checked for nulls and duplicates, and duplicates (which often carried ‘unknown’ values) were removed. The data was then restricted to a date range common to all files, and unnecessary columns were dropped.
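The cleaning steps above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the notebook's actual code: the field names, the placeholder ICB codes, and the date range are all assumptions made for the example.

```python
from datetime import date

# Hypothetical rows; field names and codes are placeholders, not the real schema.
rows = [
    {"icb_ons_code": "E54000001", "appointment_date": date(2022, 1, 3), "count": 120},
    {"icb_ons_code": "E54000001", "appointment_date": date(2022, 1, 3), "count": 120},   # exact duplicate
    {"icb_ons_code": "E54000002", "appointment_date": date(2021, 12, 30), "count": 95},  # outside common range
    {"icb_ons_code": "E54000002", "appointment_date": date(2022, 1, 4), "count": None},  # null value
]


def count_null_rows(rows):
    """Number of rows with at least one missing value."""
    return sum(any(value is None for value in row.values()) for row in rows)


def drop_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def restrict_dates(rows, start, end):
    """Keep rows whose date falls inside the common range (inclusive)."""
    return [row for row in rows if start <= row["appointment_date"] <= end]


null_rows = count_null_rows(rows)
# The common date range below is an assumed example, not the one used in the analysis.
cleaned = restrict_dates(drop_duplicates(rows), date(2022, 1, 1), date(2022, 6, 30))
```

Counting nulls before dropping anything keeps a record of how much of each file was affected, which feeds into the data-quality findings later on.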
NHS England is divided into 7 regions, made up of 42 Integrated Care Boards (ICBs), and 106 sub-ICBs. An additional file was sourced from NHS England online to show the number of registered patients at both ICB and sub-ICB levels. This ‘patient population’ data was added to the existing data files through appropriate merges.
Because some ICB and sub-ICB names had changed, a number of records failed to merge, and their population figures had to be added through a manual mapping process.
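One way to picture the merge and the manual mapping is the sketch below. The ICB names, populations, and the `name_fixes` dictionary are invented for illustration; the real mapping would be built by inspecting whichever rows fail to match.

```python
# Hypothetical population lookup keyed by current ICB name.
population_by_name = {
    "NHS Example North ICB": 1_200_000,
    "NHS Example South ICB": 800_000,
}

# Manual mapping from superseded names to current ones (illustrative entry only).
name_fixes = {"NHS Example Southern ICB": "NHS Example South ICB"}

appointments = [
    {"icb_name": "NHS Example North ICB", "count": 5000},
    {"icb_name": "NHS Example Southern ICB", "count": 3000},  # old name
    {"icb_name": "NHS Unmapped ICB", "count": 10},            # no match at all
]


def add_population(rows, lookup, fixes):
    """Attach population to each row; report names that still do not match."""
    merged, unmatched = [], []
    for row in rows:
        name = fixes.get(row["icb_name"], row["icb_name"])
        if name in lookup:
            merged.append({**row, "icb_name": name, "population": lookup[name]})
        else:
            unmatched.append(row["icb_name"])
    return merged, unmatched


merged, unmatched = add_population(appointments, population_by_name, name_fixes)
```

Returning the unmatched names explicitly makes it easy to see which entries still need a manual mapping, rather than letting them drop out of the merge silently.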
STEP 2: Creating a ‘fair-share’ parameter
A central construct in the analysis was the calculation of a fair share ratio, applied consistently across all three appointment datasets.
fair_share_ratio = actual_appointments / ((population / total_population) × national_threshold)
Where: national_threshold = 1,200,000 appointments/day (the stated NHS England daily capacity target); total_population = 61,469,262 (sum of all registered GP patients).
A ratio of 1.0 therefore indicates an ICB is delivering exactly its population-proportionate share of appointments.
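The formula translates directly into a small function. The two constants come from the text above; the example population of 1,500,000 is hypothetical and used only to show the ratio landing on 1.0 when an area delivers exactly its share.

```python
NATIONAL_THRESHOLD = 1_200_000  # appointments per day, as stated above
TOTAL_POPULATION = 61_469_262   # registered GP patients, as stated above


def fair_share_ratio(actual_appointments, population,
                     total_population=TOTAL_POPULATION,
                     national_threshold=NATIONAL_THRESHOLD):
    """Actual daily appointments divided by the population-proportionate share."""
    if population <= 0:
        raise ValueError("population must be positive")
    fair_share = (population / total_population) * national_threshold
    return actual_appointments / fair_share


# Hypothetical ICB with 1,500,000 registered patients.
example_population = 1_500_000
example_fair_share = (example_population / TOTAL_POPULATION) * NATIONAL_THRESHOLD

at_share = fair_share_ratio(example_fair_share, example_population)          # exactly its share
over_share = fair_share_ratio(example_fair_share * 1.1, example_population)  # 10% above its share
```

A ratio above 1.0 means the area is handling more appointments than its population share of the national threshold would imply; below 1.0 means fewer.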
This approach is justified over raw counts because ICBs and sub-ICBs vary significantly in population size (ranging from small rural sub-ICBs to large urban ones such as North East London). Without normalisation, high-volume ICBs would always appear ‘worst’ regardless of demand pressure.
Although three files shared an icb_ons_code and had overlapping time periods, joining them was rejected because they held very different data. While appointments_regional and national_categories showed all appointments, the actual_duration file only held details of appointments that actually took place. Dates and geographical data within the files also had different granularity. Hence, exploratory analysis was performed on each dataset individually.
The final file, from Twitter, was found to be too broad to be useful: its tweets covered international events rather than GP appointments in NHS England. A sentiment analysis was run, but the recommendation is to encourage the use of specific hashtags so that relevant data can be captured in future.
STEP 3: Investigating Capacity
At a national level, and over a weekly time period, there appeared to be spare capacity. However, when the data was examined at regional level, on weekdays only, significant geographical variations appeared, showing that some areas were already struggling with capacity.
For example, in Fig 1, the top sub-ICB, South Yorkshire, exceeds the fair share of appointments for its patient population on almost 9 out of every 10 working days. In comparison, the sub-ICBs at the bottom of the list never go above their fair share and probably do have capacity.
Fig 1: Top 10 sub-ICBs – % of days exceeding daily fair share
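The metric behind Fig 1 could be computed along these lines. This is a sketch under assumptions: the daily counts and the fair-share figure of 29,000 are made up, and "working days" is taken to mean Monday to Friday with no allowance for bank holidays.

```python
from datetime import date


def pct_days_over_fair_share(daily_counts, daily_fair_share):
    """Percentage of weekdays on which appointments exceed the daily fair share."""
    weekday_counts = [n for d, n in daily_counts.items() if d.weekday() < 5]
    if not weekday_counts:
        return 0.0
    days_over = sum(n > daily_fair_share for n in weekday_counts)
    return 100 * days_over / len(weekday_counts)


# Hypothetical daily appointment counts for one sub-ICB.
daily_counts = {
    date(2022, 1, 3): 31_000,  # Monday
    date(2022, 1, 4): 30_500,  # Tuesday
    date(2022, 1, 5): 28_000,  # Wednesday
    date(2022, 1, 6): 29_500,  # Thursday
    date(2022, 1, 7): 27_000,  # Friday
    date(2022, 1, 8): 40_000,  # Saturday, excluded from the calculation
}

pct_over = pct_days_over_fair_share(daily_counts, daily_fair_share=29_000)
```

Here three of the five weekdays exceed the assumed fair share, so the sub-ICB would score 60%.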

STEP 4: Insights
From the exploratory analysis, three elements stood out as worthy of presentation.
Firstly, the data captured was often of poor quality, with more than 40 million records missing duration information and 39 million records uncategorised. All analysis depends on the data held, so improving data quality going forward is important. The analysis was able to pinpoint where data quality was particularly poor, so that interventions can be targeted. For example, Fig 2 below shows that almost half of NHS Lancashire and South Cumbria records have data issues.
Fig 2: ICBs with Poor Data Quality
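A per-ICB data-quality share like the one plotted in Fig 2 might be derived as sketched below. The labels in `POOR_LABELS`, the ICB names, and the record counts are assumptions for illustration; the real files may flag missing or uncategorised records differently.

```python
from collections import defaultdict

# Labels treated as poor quality. These are assumed values, not confirmed ones.
POOR_LABELS = {"Unknown", "Unmapped"}

# Hypothetical (icb, label, record_count) tuples.
records = [
    ("ICB A", "General Consultation", 400),
    ("ICB A", "Unmapped", 100),
    ("ICB B", "General Consultation", 250),
    ("ICB B", "Unknown", 250),
]


def poor_quality_share(records, poor_labels=POOR_LABELS):
    """Fraction of each ICB's records that carry a poor-quality label."""
    total = defaultdict(int)
    poor = defaultdict(int)
    for icb, label, count in records:
        total[icb] += count
        if label in poor_labels:
            poor[icb] += count
    return {icb: poor[icb] / total[icb] for icb in total}


shares = poor_quality_share(records)
worst_first = sorted(shares, key=shares.get, reverse=True)
```

Sorting by share rather than by raw count avoids large ICBs dominating the list simply because they hold more records.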

Secondly, 13 million appointments are lost because patients did not attend (DNA). Finding ways to reduce missed appointments would increase capacity. Again, there are regional variations. The final report made two suggestions for reducing DNAs: encouraging telephone appointments, where the DNA rate is less than half the face-to-face rate; and sending reminders for appointments booked well in advance, as the DNA rate for appointments booked more than 28 days ahead is 9 times higher than for same-day appointments.
Fig 3: GP appointment DNA rates over time
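The DNA comparisons above rest on a simple rate per group. The sketch below assumes the rate is DNAs divided by attended plus DNA appointments (appointments with unknown status are left out), and the counts are invented to illustrate the comparison, not taken from the data.

```python
def dna_rate(attended, dna):
    """Share of appointments with a known outcome that were not attended."""
    total = attended + dna
    return dna / total if total else 0.0


# Hypothetical (attended, dna) counts by appointment mode.
by_mode = {
    "Face-to-Face": (9_000, 600),
    "Telephone": (9_700, 300),
}

rates = {mode: dna_rate(attended, dna) for mode, (attended, dna) in by_mode.items()}
telephone_is_under_half = rates["Telephone"] < rates["Face-to-Face"] / 2
```

The same function can be applied to groups defined by booking lead time (for example same-day versus more than 28 days) to compare rates in the way the text describes.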

Thirdly, there is scope to keep building on Extended Access Provision (EAP), that is, appointments in evenings and at weekends. Such appointments have been growing steadily (see Fig 4) but still account for less than 1% of total appointments, so there is considerable potential to increase capacity in targeted areas. However, regional differences are very significant (see Fig 5).
Fig 4: EAP appointments and share of all GP appointments over time

Fig 5: EAP capacity headroom – top 5 and bottom 4 ICBs

STEP 5: Impact
Arguably the most important part of the analysis – recommendations for the NHS:
- Improve data capture – especially in areas where this is a significant issue.
- Reduce DNAs by encouraging telephone appointments and sending reminders for appointments booked in advance. Reducing the DNA rate from 4.7% to 4.0% could free over 2 million appointments and 400,000 GP hours.
- Continue to develop Extended Access Provision. Bringing all ICBs to a level of 35 appointments per 1,000 patients (15 of the 42 ICBs are already at this level) would offer an additional 386,000 appointments.
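The Extended Access uplift in the last recommendation can be reasoned about as a shortfall calculation: for each ICB below the target rate, count the extra appointments needed to reach it. The sketch below uses the target of 35 per 1,000 patients from the text, but the two ICBs and their figures are hypothetical, and the period over which the rate is measured is whatever the underlying metric uses.

```python
TARGET_PER_1000 = 35  # target EAP appointments per 1,000 patients, from the text


def eap_uplift(icbs, target_per_1000=TARGET_PER_1000):
    """Extra EAP appointments if every ICB below the target is raised to it."""
    extra = 0.0
    for population, eap_appointments in icbs.values():
        target_appointments = target_per_1000 * population / 1000
        shortfall = target_appointments - eap_appointments
        if shortfall > 0:  # ICBs already at or above target contribute nothing
            extra += shortfall
    return extra


# Hypothetical (population, current EAP appointments) per ICB.
icbs = {
    "ICB A": (1_000_000, 20_000),  # 20 per 1,000, below target
    "ICB B": (500_000, 20_000),    # 40 per 1,000, already above target
}

additional_appointments = eap_uplift(icbs)
```

In this example only ICB A contributes, adding 15,000 appointments; ICBs already above the target are left unchanged rather than scaled down.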
To view the Jupyter notebook file – please click here. To view the presentation – please click here.

