Not sure what this post will end up covering or resolving. But let’s start by looking at data comparisons with the four UK member countries rather than the UK as a whole. May provide for a more meaningful comparison of the data by location.
By Country and Town
Need to group on two columns before generating the sum.
cntry_town = addrs.groupby(["country", "town"])["total_spend"].sum()
print(f"\nTotal spend by Country and All Towns")
cntry_town.info()
Total spend by Country and All Towns
<class 'pandas.core.series.Series'>
MultiIndex: 1339 entries, ('England', 'Abbots Langley') to ('Wales', 'Y Felinheli')
Series name: total_spend
Non-Null Count Dtype
-------------- -----
1339 non-null int64
dtypes: int64(1)
memory usage: 24.9+ KB
England
Okay, let’s start with England.
# Extract all "England" entries and sort
ct_eng_sorted = cntry_town["England"].sort_values(ascending=False)
print(f"\nEngland:")
print(ct_eng_sorted.head())
print(ct_eng_sorted.tail())
England:
town
London 102377000
Birmingham 9264400
Manchester 8948200
Bristol 5843900
Leeds 5172400
Name: total_spend, dtype: int64
town
Betchworth 3600
Lifton 3100
Merriott 3100
Seascale 3000
Badminton 1700
Name: total_spend, dtype: int64
Am thinking some overall statistics might be of use. So let’s get a few.
... ...
# New to define function for 25th Percentile to use with agg()
def p25(x):
return x.quantile(0.25)
... ...
ct_eng_sorted = cntry_town["England"].sort_values(ascending=False)
eng_stat = ct_eng_sorted.agg(["count", "sum", "max", "mean", "median", p25, "min"])
print(f"\nEngland:")
print(eng_stat)
print(ct_eng_sorted.head())
print(ct_eng_sorted.tail())
England:
count 9.310000e+02 -> 931
sum 4.380498e+08 -> 438,049,800
max 1.023770e+08 -> 102,377,000
mean 4.705154e+05 -> 470,515
median 1.634000e+05 -> 163,400
p25 6.660000e+04 -> 66,600
min 1.700000e+03 -> 1,700
Name: total_spend, dtype: float64
town
London 102377000
Birmingham 9264400
Manchester 8948200
Bristol 5843900
Leeds 5172400
Name: total_spend, dtype: int64
town
Betchworth 3600
Lifton 3100
Merriott 3100
Seascale 3000
Badminton 1700
Name: total_spend, dtype: int64
Doing the same for the other three countries produces the following.
Scotland:
count 2.420000e+02 -> 242
sum 2.646100e+07 -> 26,461,000
max 6.514300e+06 -> 6,514,300
mean 1.093430e+05 -> 109,343
median 2.510000e+04 -> 25,100
p25 9.125000e+03 -> 9,125
min 5.000000e+02 -> 500
Name: total_spend, dtype: float64
town
Glasgow 6514300
Edinburgh 4937600
Aberdeen 1322800
Dundee 955100
Lanark 427100
Name: total_spend, dtype: int64
town
Port Glasgow 3300
Munlochy 2800
Isle Of Barra 2800
Lochearnhead 2700
Galston 500
Name: total_spend, dtype: int64
Wales:
count 1.180000e+02 -> 118
sum 1.358220e+07 -> 13,582,220
max 3.990300e+06 -> 3,990,300
mean 1.151034e+05 -> 115,103
median 3.995000e+04 -> 39,950
p25 1.237500e+04 -> 12,375
min 2.100000e+03 -> 2,100
Name: total_spend, dtype: float64
town
Cardiff 3990300
Swansea 1255700
Newport 745200
Wrexham 620400
Bridgend 468900
Name: total_spend, dtype: int64
town
Tyn-Y-Gongl 4400
Moelfre 4100
Dolgellau 4100
Rhayader 3800
Boncath 2100
Name: total_spend, dtype: int64
Northern Ireland:
count 4.300000e+01 -> 43
sum 7.462500e+06 -> 7,462,500
max 2.109800e+06 -> 2,109,800
mean 1.735465e+05 -> 173,547
median 8.920000e+04 -> 89,200
p25 3.040000e+04 -> 30,400
min 4.200000e+03 -> 4,200
Name: total_spend, dtype: float64
town
Belfast 2109800
Newry 605600
Londonderry 534200
Craigavon 364900
Lisburn 295800
Name: total_spend, dtype: int64
town
Augher 10700
Bushmills 9900
Clogher 5300
Aughnacloy 4700
Caledon 4200
Name: total_spend, dtype: int64
What I Am Thinking
I do believe doing the analysis by country makes some sense. Two things stick out for me. The difference in the number of towns in each country (a reflection of total country population I expect). And the difference between the maximum town spend in each country. As well as the mean and median values. I was a little surprised that the mean and median for Northern Ireland were greater than those for Wales when Wales total spend was significantly larger.
I also think that in each country it would likely not be financially beneficial to try and improve sales in any location below the median. Most certainly not for those below the 25th percentile.
Presentation
This really does baffle me. But let’s start looking at few possibilities.
Big Picture Pie Chart
I thought for starters that I would provide a pie chart showing total spend for London, the rest of England, Northern Ireland, Scotland and Wales. Perhaps even for the customers not in the UK for perspective. Let’s give that a shot.
Rather than use Matplotlib myself, I am going to try pandas plot.pie
method. The bulk of the work is creating the data Series for the plot.
And, I did mess around a bit to get additional data on the plot. Knowing a very little about Matplotlib did help.
spend_pie = {"London": ct_eng_sorted["London"]}
spend_pie["Rest of England"] = eng_stat_2["sum"] - ct_eng_sorted["London"]
spend_pie["Scotland"] = sco_stat_2["sum"]
spend_pie["Wales"] = wal_stat_2["sum"]
spend_pie["Northern Ireland"] = ni_stat_2["sum"]
spend_pie["Foreign Customers"] = addrs[addrs['country'] == 'Not-UK']["total_spend"].sum()
print(spend_pie)
pie_data = pd.Series(spend_pie, dtype='UInt32')
print(pie_data)
labels = [f"{loc}\n{spend:,}" for loc, spend in pie_data.items()]
pie_data.plot.pie(legend=False, autopct='%1.1f%%',
figsize=(8, 8), textprops={'size': 'smaller'},
title="Share of Total Spend by Location",
labels=labels)
plt.show()
Now I suppose I could do something similar for each of the countries. But, really don’t see any value in that.
Looking at the above, perhaps business in England is saturated and future efforts to gain business should be expended elsewhere. Or perhaps, the other countries don’t offer much opportunity to find new or increased business and the company should focus on improving sales only in England. With a maybe for some effort in Scotland.
Done
Without any idea about the nature of the business and with no access to management I really don’t see where to go next. So I think I will read the rest of the chapter. Stopping to do some further analysis if I gain more information as I read.
Resources
- pandas.DataFrame.describe
- How to calculate summary statistics
- pandas.Series
- pandas.Series.agg
- pandas.Series.quantile
- pandas.DataFrame.plot
- pandas.Series.plot.pie
- Pie charts
- Pass percentiles to pandas agg() method