Finally, in this R post I use the prepared HESA financial data to reproduce, and analyse, the 2020-21 “surplus/deficit for the year” figures for the top five Scottish universities.

Introduction

Having imported and inspected the data in the first post, and wrangled it into a form suitable for analysis in the second, it’s time to actually do something with the data.

The analysis in this post has two main parts. Initially I’ll reproduce the “surplus/deficit for the year” figures for 2020–21 for the five universities. This, though, highlights a challenge: how do we compare the surplus/deficit figures of universities which may be vastly different in scale? In 2020-21, for example, the University of Edinburgh had around 2.7 times the number of undergraduate students as the University of St. Andrews, and this was reflected in income from fees.

This isn’t a question I can properly address in this blog (I’m also not a higher education finance professional, which doesn’t help). To make a comparison possible, though, I normalise the financial statistics for the universities by dividing the surplus/deficit of each university for the year by its total income. This is similar to the “net profit margin”, or net profit percentage, metric used to compare companies and provide some indication of their “financial health”.

There are some issues with using this kind of metric here. A major one is that is that universities are charities, not companies. The denominator for net profit percentage (NPP) is “revenue”, but what should this mean for universities? To approximate this, I’ll count as “income” the usual, operating, income received by Universities (e.g. from fees, and research funding) plus any additional net gain/loss from other activities used to calculate the surplus/deficit for the year (e.g. gain/loss on investments). These are the activities in the categories I grouped as “Other” at the end of the second post.

Combining usual, operating, income with net “Other” income may not reflect “standard practice” for analysing university finances. As I’ll show, though, some universities receive substantial income from these additional activities (e.g. this approached 7% of total income, understood this way, for University of Edinburgh in the 2020-21 academic year).

A related issue with this approach is that universities don’t have the profit-making goals of companies. Universities usually aim to have a surplus at the end of a year, as this can be used for investment purposes (e.g. in new services or facilities), but a deficit in a particular year isn’t necessarily a bad thing.1 Similarly, a low or zero (if there is a deficit) NPP for a university in a particular year isn’t necessarily a bad thing.

I don’t attempt to cash out (no pun intended) the analysis below in the form of a discussion of university finance in this blog. My main interest is in the data, and in manipulating and analysing it with R. Writing this blog, though, has left me with lots of questions, and I realise I’ve dipped my toes into a large, mostly unfamiliar, discipline with its own language.

Analysis

Preliminaries

I’ll start by importing the same packages as before. I’m also using ggplot2 this time, loaded as one of the tidyverse packages, and have set the theme for all plots in this post with the theme_set() function:

library(tidyverse) # import most of the packages needed
library(kableExtra) # for HTML tables
library(here) # to find absolute path to dataset
theme_set(theme_minimal()) # set ggplot theme

My starting point is the data frame uniFinData from the second blog, which (after importing and manipulating the dataset again in the background) looks like this for University of Aberdeen:

University Category Subcategory Value000s
The University of Aberdeen Income Donations and endowments 236
The University of Aberdeen Income Funding body grants 87577
The University of Aberdeen Income Investment income 2012
The University of Aberdeen Income Other income 25764
The University of Aberdeen Income Research grants and contracts 45927
The University of Aberdeen Income Tuition fees and education contracts 74423
The University of Aberdeen Expenditure Depreciation and amortisation -16855
The University of Aberdeen Expenditure Interest and other finance costs -5840
The University of Aberdeen Expenditure Other operating expenses -59306
The University of Aberdeen Expenditure Restructuring costs 0
The University of Aberdeen Expenditure Staff costs -146956
The University of Aberdeen Other Gain/(loss) on disposal of tangible assets 4493
The University of Aberdeen Other Gain/(loss) on investment property 0
The University of Aberdeen Other Gain/(loss) on investments 5780
The University of Aberdeen Other Share of operating surplus/(deficit) in associate(s) 0
The University of Aberdeen Other Share of operating surplus/(deficit) in joint venture(s) -912
The University of Aberdeen Other Taxation 0

Methodology

As stated above, my goal is to calculate the net profit percentage (NPP) of the universities, by dividing any surplus by total income. So as not to over-complicate things in this analysis, if a university has a deficit for the year I’ll say that the NPP is zero for that university. For the purposes of the calculation, I’ll treat the following groups as income:

  1. Normal, operating, income - i.e. that fall under the “Income” category for HESA:

    • “Tuition fees and education contracts”
    • “Funding body grants”
    • “Research grants and contracts”
    • “Other income”
    • “Investment income”
    • “Donations and endowments”
  2. Any additional gains contributing to the surplus/deficit, i.e. within my “Other” category, from:

    • “Gain/(loss) on disposal of tangible assets”
    • “Gain/(loss) on investment property”
    • “Gain/(loss) on investments”
    • “Share of operating surplus/(deficit) in associate(s)”
    • “Share of operating surplus/(deficit) in joint venture(s)”

I’ll call this total “total realised income”, to distinguish it from the term “total income” used by HESA (where it only refers to operating income). As regards my use of the term “realised”, the “surplus/deficit for the year” statistic is a measure of realised income/loss, i.e. of financial gains/losses that have been “locked in” by a university within a particular academic year.2

The importance of this second, “Other”, group for universities can be visualised by calculating total gain/loss across this group as a percentage of total income (across both groups). To do this, I’ll initially create OtherGLProp, including a column with this calculation as a proportion.

Step 1 here is to calculate operating income (currently under “Income”) and additional income/loss (currently under “Other”) for each University:

OtherGLProp <- uniFinData %>%
  filter(Category %in% c("Income", "Other"),
         Subcategory != "Taxation") %>% # drop the "Taxation" rows
  group_by(University, Category) %>%
  summarise(Income000s = sum(Value000s))

# display
OtherGLProp
## # A tibble: 10 × 3
## # Groups:   University [5]
##    University                    Category Income000s
##    <chr>                         <fct>         <dbl>
##  1 The University of Aberdeen    Income       235939
##  2 The University of Aberdeen    Other          9361
##  3 The University of Edinburgh   Income      1187406
##  4 The University of Edinburgh   Other         87677
##  5 The University of Glasgow     Income       813077
##  6 The University of Glasgow     Other         44864
##  7 The University of St. Andrews Income       290378
##  8 The University of St. Andrews Other         19095
##  9 The University of Strathclyde Income       347396
## 10 The University of Strathclyde Other          7159

One thing to note from this is that all universities made a net gain from “Other” activities. Another thing to note, though, is that the data is not in a “tidy” format: the variables “Income” and “Other” are both in the Category column, and the goal is to perform a calculation involving both. The names are also potentially confusing, as “Income” only refers to operating income here. These issues can be addressed by using pivot_wider, and doing some renaming, prior to the calculation, as Step 2:

OtherGLProp <- OtherGLProp %>%
  # pivoting into wider, "tidy", format
  pivot_wider(names_from = "Category", values_from = Income000s) %>%
  # renaming for clarity
  rename(OperatInc = Income, OtherGL = Other) %>%
  # calculating the proportion
  mutate(Prop = OtherGL/(OperatInc+OtherGL))

# displaying the result
OtherGLProp
## # A tibble: 5 × 4
## # Groups:   University [5]
##   University                    OperatInc OtherGL   Prop
##   <chr>                             <dbl>   <dbl>  <dbl>
## 1 The University of Aberdeen       235939    9361 0.0382
## 2 The University of Edinburgh     1187406   87677 0.0688
## 3 The University of Glasgow        813077   44864 0.0523
## 4 The University of St. Andrews    290378   19095 0.0617
## 5 The University of Strathclyde    347396    7159 0.0202

Pivoting was useful here as it enabled the calculation to be conducted as a row-wise operation, one of the virtues of “tidy” data.

Visualising Prop as a horizontal bar chart (and converting the proportions to a percentage, for labelling purposes) helps to demonstrate the results:

OtherGLProp %>%
  ggplot(aes(x = fct_reorder(University, Prop), y = Prop)) + 
    geom_col(width=0.9, colour="black", fill="#39ACAC") +
    labs(title = "Additional Gains as % of Total Realised Income (2020-21)")+
    # add labels, convert to percentage
    geom_text(aes(label=paste(100*round(Prop, 3),"%",sep="")), hjust=-0.15) +
    ylim(0, 0.076) + # prevent cropping of column labels
    theme(axis.text = element_text(size = 12),
          plot.title = element_text(size=14, hjust = 0.7),
          axis.text.x = element_blank(),
          axis.title.y = element_blank(),
          axis.title.x = element_blank(),
          legend.position = "none",
          aspect.ratio = 1.3/2) + 
  coord_flip() 

The percentage of income attributable to “other” sources clearly varies between the universities, but is significant and highest for the University of Edinburgh (6.9%).

(For neatness, I’ll hide the ggplot code blocks from now on as the remaining plots are similar.)

Surplus/deficit for the year

Using the uniFinData data frame created earlier, it’s straightforward to recreate the annual surplus/deficit values for the five universities: this is just the sum of all the values in the Value000s column, after grouping by university. The results can be seen here, in a bar chart:3

It can be seen that University of Edinburgh has the highest surplus, and University of Strathclyde was in deficit for academic year 2020-21.

If, like me, you find adding three zeros onto numbers a bit unintuitive, this can also be represented in millions (by dividing the label values in geom_text by 1000 and rounding):

Net profit percentage for the year

An obvious issue with this analysis is that it’s difficult to compare the surplus/deficits of universities which are vastly different in scale. As noted earlier, University of Edinburgh has a much larger student population than smaller universities like St. Andrews, numbers which are reflected in income from fees.

To get started with calculating the NPP, along the lines described above, I’ll add an extra column to the OtherGLProp data frame from above including what I’ve defined as “total realised income”:

OtherGLProp <- OtherGLProp %>%
  mutate(TotRealisedInc = OperatInc+OtherGL) 
# printing
OtherGLProp
## # A tibble: 5 × 5
## # Groups:   University [5]
##   University                    OperatInc OtherGL   Prop TotRealisedInc
##   <chr>                             <dbl>   <dbl>  <dbl>          <dbl>
## 1 The University of Aberdeen       235939    9361 0.0382         245300
## 2 The University of Edinburgh     1187406   87677 0.0688        1275083
## 3 The University of Glasgow        813077   44864 0.0523         857941
## 4 The University of St. Andrews    290378   19095 0.0617         309473
## 5 The University of Strathclyde    347396    7159 0.0202         354555

The NPP can now be calculated by dividing the surplus for each university by its total realised income. To do this, I’ll join OtherGLProp to the uniFinData from earlier; I’ll also “zero” the NPP for University of Strathclyde, as it was in deficit, and will drop some unnecessary columns:

uniNPP <- uniFinData %>%
  group_by(University) %>%
  summarise(`Surplus/Deficit (000s)` = sum(Value000s)) %>%
  left_join(OtherGLProp, by = "University") %>%
  # calculate NPP 
  mutate(NPP = `Surplus/Deficit (000s)`/TotRealisedInc,
         # if university in deficit, NPP should be zero
         NPP = case_when(NPP <=0 ~ 0, TRUE ~ NPP)) %>%
  # drop unnecessary columns
  select(University, NPP)

# print
uniNPP
## # A tibble: 5 × 2
##   University                       NPP
##   <chr>                          <dbl>
## 1 The University of Aberdeen    0.0666
## 2 The University of Edinburgh   0.168 
## 3 The University of Glasgow     0.189 
## 4 The University of St. Andrews 0.192 
## 5 The University of Strathclyde 0

Visualising this, and also ordering the results:


  1. See the BUFDG’s “Understanding University Finance”, p.41. A deficit, though, clearly can be a bad thing if large: https://www.theguardian.com/education/2022/nov/13/dismay-at-threat-of-devastating-job-cuts-at-birkbeck-university-of-london↩︎

  2. See “Understanding University Finance”, p.41. ↩︎

  3. These values can also be confirmed via the interactive HESA table here: https://www.hesa.ac.uk/data-and-analysis/finances/table-1↩︎