Not sure what this post will end up covering or resolving. But let’s start by looking at data comparisons with the four UK member countries rather than the UK as a whole. May provide for a more meaningful comparison of the data by location.

By Country and Town

Need to group on two columns before generating the sum.

cntry_town = addrs.groupby(["country", "town"])["total_spend"].sum()
print(f"\nTotal spend by Country and All Towns")
cntry_town.info()

Total spend by Country and All Towns

<class 'pandas.core.series.Series'>
MultiIndex: 1339 entries, ('England', 'Abbots Langley') to ('Wales', 'Y Felinheli')
Series name: total_spend
Non-Null Count  Dtype
--------------  -----
1339 non-null   int64
dtypes: int64(1)
memory usage: 24.9+ KB

England

Okay, let’s start with England.

# Extract all "England" entries and sort
ct_eng_sorted = cntry_town["England"].sort_values(ascending=False)
print(f"\nEngland:")
print(ct_eng_sorted.head())
print(ct_eng_sorted.tail())

England:

town
London        102377000
Birmingham      9264400
Manchester      8948200
Bristol         5843900
Leeds           5172400
Name: total_spend, dtype: int64

town
Betchworth    3600
Lifton        3100
Merriott      3100
Seascale      3000
Badminton     1700
Name: total_spend, dtype: int64

Am thinking some overall statistics might be of use. So let’s get a few.

... ...

# New to define function for 25th Percentile to use with agg()
def p25(x):
  return x.quantile(0.25)

... ...

ct_eng_sorted = cntry_town["England"].sort_values(ascending=False)
eng_stat = ct_eng_sorted.agg(["count", "sum", "max", "mean", "median", p25, "min"])
print(f"\nEngland:")
print(eng_stat)
print(ct_eng_sorted.head())
print(ct_eng_sorted.tail())

England:

count     9.310000e+02  ->         931
sum       4.380498e+08  -> 438,049,800
max       1.023770e+08  -> 102,377,000
mean      4.705154e+05  ->     470,515
median    1.634000e+05  ->     163,400
p25       6.660000e+04  ->      66,600
min       1.700000e+03  ->       1,700
Name: total_spend, dtype: float64

town
London        102377000
Birmingham      9264400
Manchester      8948200
Bristol         5843900
Leeds           5172400
Name: total_spend, dtype: int64

town
Betchworth    3600
Lifton        3100
Merriott      3100
Seascale      3000
Badminton     1700
Name: total_spend, dtype: int64

Doing the same for the other three countries produces the following.

Scotland:

count     2.420000e+02  ->       242
sum       2.646100e+07  -> 26,461,000
max       6.514300e+06  ->  6,514,300
mean      1.093430e+05  ->    109,343
median    2.510000e+04  ->     25,100
p25       9.125000e+03  ->      9,125
min       5.000000e+02  ->        500
Name: total_spend, dtype: float64

town
Glasgow      6514300
Edinburgh    4937600
Aberdeen     1322800
Dundee        955100
Lanark        427100
Name: total_spend, dtype: int64

town
Port Glasgow     3300
Munlochy         2800
Isle Of Barra    2800
Lochearnhead     2700
Galston           500
Name: total_spend, dtype: int64

Wales:

count     1.180000e+02  ->        118
sum       1.358220e+07  -> 13,582,220
max       3.990300e+06  ->  3,990,300
mean      1.151034e+05  ->    115,103
median    3.995000e+04  ->     39,950
p25       1.237500e+04  ->     12,375
min       2.100000e+03  ->      2,100
Name: total_spend, dtype: float64

town
Cardiff     3990300
Swansea     1255700
Newport      745200
Wrexham      620400
Bridgend     468900
Name: total_spend, dtype: int64

town
Tyn-Y-Gongl    4400
Moelfre        4100
Dolgellau      4100
Rhayader       3800
Boncath        2100
Name: total_spend, dtype: int64

Northern Ireland:

count     4.300000e+01  ->        43
sum       7.462500e+06  -> 7,462,500
max       2.109800e+06  -> 2,109,800
mean      1.735465e+05  ->   173,547
median    8.920000e+04  ->    89,200
p25       3.040000e+04  ->    30,400
min       4.200000e+03  ->     4,200
Name: total_spend, dtype: float64

town
Belfast        2109800
Newry           605600
Londonderry     534200
Craigavon       364900
Lisburn         295800
Name: total_spend, dtype: int64

town
Augher        10700
Bushmills      9900
Clogher        5300
Aughnacloy     4700
Caledon        4200
Name: total_spend, dtype: int64

What I Am Thinking

I do believe doing the analysis by country makes some sense. Two things stick out for me. The difference in the number of towns in each country (a reflection of total country population I expect). And the difference between the maximum town spend in each country. As well as the mean and median values. I was a little surprised that the mean and median for Northern Ireland were greater than those for Wales when Wales total spend was significantly larger.

I also think that in each country it would likely not be financially beneficial to try and improve sales in any location below the median. Most certainly not for those below the 25th percentile.

Presentation

This really does baffle me. But let’s start looking at few possibilities.

Big Picture Pie Chart

I thought for starters that I would provide a pie chart showing total spend for London, the rest of England, Northern Ireland, Scotland and Wales. Perhaps even for the customers not in the UK for perspective. Let’s give that a shot.

Rather than use Matplotlib myself, I am going to try pandas plot.pie method. The bulk of the work is creating the data Series for the plot.

And, I did mess around a bit to get additional data on the plot. Knowing a very little about Matplotlib did help.

spend_pie = {"London": ct_eng_sorted["London"]}
spend_pie["Rest of England"] = eng_stat_2["sum"] - ct_eng_sorted["London"]
spend_pie["Scotland"] = sco_stat_2["sum"]
spend_pie["Wales"] = wal_stat_2["sum"]
spend_pie["Northern Ireland"] = ni_stat_2["sum"]
spend_pie["Foreign Customers"] = addrs[addrs['country'] == 'Not-UK']["total_spend"].sum()
print(spend_pie)

pie_data = pd.Series(spend_pie, dtype='UInt32')
print(pie_data)

labels = [f"{loc}\n{spend:,}" for loc, spend in pie_data.items()]

pie_data.plot.pie(legend=False, autopct='%1.1f%%',
                  figsize=(8, 8), textprops={'size': 'smaller'},
                  title="Share of Total Spend by Location",
                  labels=labels)
plt.show()

pie chart showing total spend by UK country, London and foreign customers

Now I suppose I could do something similar for each of the countries. But, really don’t see any value in that.

Looking at the above, perhaps business in England is saturated and future efforts to gain business should be expended elsewhere. Or perhaps, the other countries don’t offer much opportunity to find new or increased business and the company should focus on improving sales only in England. With a maybe for some effort in Scotland.

Done

Without any idea about the nature of the business and with no access to management I really don’t see where to go next. So I think I will read the rest of the chapter. Stopping to do some further analysis if I gain more information as I read.

Resources

pandas.DataFrame.describe
How to calculate summary statistics
pandas.Series
pandas.Series.agg
pandas.Series.quantile
pandas.DataFrame.plot
pandas.Series.plot.pie
Pie charts
Pass percentiles to pandas agg() method

Too Old To Code

Data Analysis: Identifying Customer Geographies, Part IV