Why?
When a beginner first signs up with Kaggle, it is recommended they go through the process of building a model for the Titanic dataset and submitting a solution for “grading”. It is, apparently, a common way to get started.
The idea is that after that initial workthrough of the Kaggle process, you would go on to try and improve the success of your classification by changing models, using feature selection, creating new features, etc.
This, of course, involves submitting your predictions against the test dataset to Kaggle for scoring every time you make changes to your model.
As I never got around to trying to improve on my initial result, I plan to go through that whole process in a series of blog posts, measuring improvement, or the lack thereof, as I go along. Now, I don’t really want to make a submission on Kaggle with each post to get an updated score. So, I need/want to create my own test set with the appropriate targets.
Target Data
I still want to use the Kaggle datasets. That way I can always make a submission if so inclined, with a reasonable idea of what it will score. So, I am going to try and create a CSV file with the targets that match the entries in the Kaggle test dataset. Expect that will be easier said than done. I did a bit of searching and couldn’t find any such target data or a Kaggle test dataset with the targets included.
Data Sources
It’s easy enough to get the Kaggle datasets. On the Titanic - Machine Learning from Disaster competition page, select the Data tab and go from there.
I also found two more datasets that I thought would help me get the job done: a full Titanic dataset hosted on OSF and the phpMYEkMl.csv dataset (both linked in the Resources section at the end of this post).
Generate target.csv
Setup Notebook
Ok, let’s set up a Jupyter notebook. Start with the usuals.
from IPython.core.interactiveshell import InteractiveShell
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # for plotting
import seaborn as sns # for plotting
# set up some notebook display defaults
InteractiveShell.ast_node_interactivity = "all"
%matplotlib inline
plt.style.use('default')
sns.set()
pd.options.display.float_format = '{:,.2f}'.format
Now let’s define some variables with the paths to the dataset files of interest. (This cell, like the preceding ones, will usually be copied into future notebooks.)
# paths to datasets
kaggle_trn = "./data/titanic/train.csv"
kaggle_tst = "./data/titanic/test.csv"
kaggle_trg = "./data/titanic/target.csv"
osf_full = "./data/titanic/osf_titanic.csv"
MYEkMl_full = "./data/titanic/phpMYEkMl.csv"
And, load the three we are currently most interested in.
# load the three datasets of interest
k_tst = pd.read_csv(kaggle_tst)
osf_f = pd.read_csv(osf_full)
ekml_f = pd.read_csv(MYEkMl_full)
And, of course, it always pays to have a look at what we are dealing with. Notice the differences in column-name case between the datasets.
# have a quick look at each of them
k_tst.head()
osf_f.head()
ekml_f.head()
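As an aside, one way to avoid juggling “Name” versus “name” would be to normalize the column names up front. I don’t do that in this post (the remaining cells assume the original column names), but a minimal sketch would look like this:
# just a sketch, not used below: lower-case all column names so the
# three datasets use the same case (later cells here assume the originals)
for df in (k_tst, osf_f, ekml_f):
    df.columns = df.columns.str.lower()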
Working with the Data
Decided to do a little testing with the datasets before I got too carried away writing the code to produce the target data file.
To start, I took the first row in the Kaggle test dataset, got the traveller’s name, and searched the other two datasets to see if I could find that name.
# some testing
print(f"Test 1:\n______\n")
tst_nm = k_tst.loc[0, "Name"]
osf_srvv = osf_f[osf_f["Name"] == tst_nm]
print(f"osf matching entries ({tst_nm}):\n{osf_srvv}")
ekml_srvv = ekml_f[ekml_f["name"] == tst_nm]
print(f"\nekml matching entries ({tst_nm}): {ekml_srvv}")
That didn’t go so well. I didn’t notice earlier, but the OSF dataset doesn’t have the periods after salutations and the like. So, let’s create a second name variable for that dataset and try again.
print(f"Test 1 (cont):\n______\n")
tst_nm = k_tst.loc[0, "Name"]
tst_nm_2 = tst_nm.replace(".", "")
osf_srvv = osf_f[osf_f["Name"] == tst_nm_2]
print(f"osf matching entries ({tst_nm_2}):\n{osf_srvv}")
ekml_srvv = ekml_f[ekml_f["name"] == tst_nm]
print(f"\nekml matching entries ({tst_nm}): {ekml_srvv}")
And, something else I wasn’t expecting: two entries for each name found. And a number of discrepancies in the data. But at least we found the name in both of the datasets. As for the multiple entries, I decided to just take the mean of the survived values: if all are zeroes the mean will be zero, and if all are ones, the mean will be one. And, if both datasets agree, we are good to go. Let’s try that.
# some testing
print(f"Test 1 (cont):\n______\n")
tst_nm = k_tst.loc[0, "Name"]
tst_nm_2 = tst_nm.replace(".", "")
# print(f"tst_nm: {tst_nm}, tst_nm_2: {tst_nm_2}")
osf_srvv = osf_f[osf_f["Name"] == tst_nm_2]
print(f"osf matching entries ({tst_nm_2}):\n{osf_srvv}")
osf_srvv = osf_f[osf_f["Name"] == tst_nm_2].Survived
print(f"osf survived values for matching names\n{osf_srvv}")
osf_srvv = osf_f[osf_f["Name"] == tst_nm_2].Survived.mean()
print(f"osf_srvv mean: {osf_srvv}")
ekml_srvv = ekml_f[ekml_f["name"] == tst_nm].survived.mean()
print(f"ekml_srvv mean ({tst_nm}): {osf_srvv}")
That seems to work. Okay, let’s try the next name in test.csv.
print(f"Test 2:\n______\n")
tst_nm = k_tst.loc[1, "Name"]
tst_nm_2 = tst_nm.replace(".", "")
# print(f"tst_nm: {tst_nm}, tst_nm_2: {tst_nm_2}")
osf_srvv = osf_f[osf_f["Name"] == tst_nm_2]
#print(f"osf matching entries:\n{osf_srvv}")
osf_srvv = osf_f[osf_f["Name"] == tst_nm_2].Survived
print(f"osf survived values for matching names ({tst_nm_2}):\n{osf_srvv}")
ekml_srvv = ekml_f[ekml_f["name"] == tst_nm]
# print(f"ekml matching entries:\n{osf_srvv}")
ekml_srvv = ekml_f[ekml_f["name"] == tst_nm].survived
print(f"ekml survived values for matching names ({tst_nm}):\n{osf_srvv}")
Neither name was found. I did some looking; the person is in all three datasets, there are just some naming variations. I am too lazy (and don’t have enough time for this post) to try to sort out all the possible variations (there were a number of others, as you will see). Maybe at some later date.
I will simply identify these cases in the output file and deal with them manually.
There is also the possibility that a passenger will be found in only one of the two non-Kaggle datasets. I will have to deal with that as well; likely I will just use the one value we get.
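For what it’s worth, here is a rough idea of what sorting out those naming variations might eventually look like. It is only a sketch, not something used in this post: difflib.get_close_matches suggests a near-match when the exact lookup fails, the suggest_name helper is my own invention, and the cutoff value is a guess rather than anything tuned. Depending on how different the spellings are, it may or may not find the variant.
from difflib import get_close_matches

def suggest_name(name, candidates, cutoff=0.85):
    """Return the closest candidate name, if any, when an exact match fails."""
    matches = get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# hypothetical usage with the second test passenger, who failed the exact match above
tst_nm = k_tst.loc[1, "Name"]
print(suggest_name(tst_nm.replace(".", ""), osf_f["Name"].tolist()))
print(suggest_name(tst_nm, ekml_f["name"].tolist()))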
Okay, on to the next traveller.
print(f"Test 3:\n______\n")
tst_nm = k_tst.loc[2, "Name"]
tst_nm_2 = tst_nm.replace(".", "")
# print(f"tst_nm: {tst_nm}, tst_nm_2: {tst_nm_2}")
osf_srvv = osf_f[osf_f["Name"] == tst_nm_2].Survived
print(f"osf survival value ({tst_nm_2}):\n{osf_srvv}")
ekml_srvv = ekml_f[ekml_f["name"] == tst_nm].survived
print(f"\nekml survival value ({tst_nm}):\n{ekml_srvv}")
Okay, when there is only one matching entry found, we are getting a Series, not the actual survival value. Series.item() should fix that for us. (It only works when the Series has exactly one element; more than one raises a ValueError, which is why the final code further down checks the length first.)
print(f"Test 3 (cont):\n______\n")
tst_nm = k_tst.loc[2, "Name"]
tst_nm_2 = tst_nm.replace(".", "")
# print(f"tst_nm: {tst_nm}, tst_nm_2: {tst_nm_2}")
osf_srvv = osf_f[osf_f["Name"] == tst_nm_2].Survived.item()
print(f"osf survival value ({tst_nm_2}): {osf_srvv}")
ekml_srvv = ekml_f[ekml_f["name"] == tst_nm].survived.item()
print(f"\nekml survival value ({tst_nm}): {ekml_srvv}")
Okay, that seems to work. But I dug around and found another potential case I will need to deal with. Let’s have a look.
print(f"Test 4:\n______\n")
tst_nm = k_tst.loc[39, "Name"]
tst_nm_2 = tst_nm.replace(".", "")
# print(f"tst_nm: {tst_nm}, tst_nm_2: {tst_nm_2}")
osf_srvv = osf_f[osf_f["Name"] == tst_nm_2].Survived.item()
print(f"osf survival value as number ({tst_nm_2}): {osf_srvv}")
ekml_srvv = ekml_f[ekml_f["name"] == tst_nm].survived.item()
print(f"\nekml survival value as number ({tst_nm}): {ekml_srvv}")
Guess I should have expected mismatches in the Survived values between the datasets, given there were mismatches in other data items. I am not going to try to resolve these in my code either; I expect that would be quite difficult. I will resolve them manually as well, by checking Encyclopedia Titanica.
Plan
Based on the above, the plan of attack is to go through the Kaggle test dataset line by line:
- locate the individual in the other two datasets
- get the survival value from both files for that individual
- if name not found in either dataset, write name to target.csv
- if found in both and survival values same, write to target.csv
- if one missing but other found, write the one result to file (this may be a risk)
- if survival values different, write name and note to file
The Code
Sorry, it is very sloppy. But, I am sure you can tidy it up if so desired.
# okay, not going to go through all the steps I went through to get a semblance of success
# create target.csv matching entries in kaggle test.csv
i = 0 # for testing only
with open(kaggle_trg, 'w') as trg_fh:
    for _, rw in k_tst.iterrows():
        # i += 1
        osf_fnd = True
        ekml_fnd = True
        tst_nm = rw.Name
        tst_nm = tst_nm.replace('"', '')
        tst_nm_2 = tst_nm.replace(".", "")
        # print(f"tst_nm: {tst_nm}, tst_nm_2: {tst_nm_2}")
        osf_srvv = osf_f[osf_f["Name"] == tst_nm_2].Survived
        # if i == 2:
        #     print(f"osf_srvv series ({tst_nm_2}):")
        #     display(osf_srvv)
        #     print(f"osf_srvv raw ({tst_nm_2}): {osf_srvv}")
        if len(osf_srvv) == 0:
            osf_fnd = False
            print(f"osf: {tst_nm_2} not found!")
        elif len(osf_srvv) == 1:
            osf_srvv = osf_srvv.item()
            # if i == 2:
            #     print(f"\tosf_srvv: {osf_srvv}")
        else:
            osf_srvv = osf_srvv.mean()
        ekml_srvv = ekml_f[ekml_f["name"] == tst_nm].survived
        # if i == 2:
        #     print(f"ekml_srvv series ({tst_nm}):")
        #     display(ekml_srvv)
        if len(ekml_srvv) == 0:
            ekml_fnd = False
            print(f"ekml: {tst_nm} not found!")
        elif len(ekml_srvv) == 1:
            ekml_srvv = ekml_srvv.item()
            # if i == 2:
            #     print(f"\tekml_srvv: {ekml_srvv}")
        else:
            ekml_srvv = ekml_srvv.mean()
        # print(f"osf_srvv ({tst_nm_2}): {osf_srvv}, ekml_srvv ({tst_nm}): {ekml_srvv}")
        if osf_fnd and ekml_fnd and osf_srvv == ekml_srvv:
            t_out = trg_fh.write(f"{int(ekml_srvv)}\n")
        elif osf_fnd and ekml_fnd and osf_srvv != ekml_srvv:
            t_out = trg_fh.write(f"{tst_nm} -> {osf_srvv} != {ekml_srvv}\n")
        elif osf_fnd and not ekml_fnd:
            t_out = trg_fh.write(f"{int(osf_srvv)}\n")
        elif ekml_fnd and not osf_fnd:
            t_out = trg_fh.write(f"{int(ekml_srvv)}\n")
        else:
            t_out = trg_fh.write(f"{tst_nm}\n")
        # if i == 5:
        #     break
Done, m’thinks
I had, before my testing, hoped to have a fully functional target.csv once things were coded and working. No such luck. And, given the list above, it is clearly going to take me a while to manually resolve the errors in target.csv. So, sorry, I can’t make the file available with this post. But I will do my best to make sure it is available to you when needed.
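If you are curious how much manual clean-up remains, a quick tally like the one below (just a sketch, reusing the kaggle_trg path defined earlier) counts the lines in target.csv that are not a plain 0 or 1:
# count how many lines in target.csv still need manual attention
with open(kaggle_trg) as trg_fh:
    lines = [ln.strip() for ln in trg_fh]
todo = [ln for ln in lines if ln not in ("0", "1")]
print(f"{len(todo)} of {len(lines)} entries need manual resolution")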
Feel free to download and play with my version of this post’s related notebook.
This was a bit of a rush. I am still deeply embroiled in that course I am working on. And still over a month to go. That said, I encountered something working on one of the projects that sort of left me flabbergasted. So, the next post will take a step sideways and look at what I encountered and the lesson I learned. Until then…
Resources
- Titanic.csv (Version: 1), uploaded 2018-02-08 09:17 AM by JASP
- Titanic.csv, uploaded 16-10-2017 by Joaquin Vanschoren
- Titanic: phpMYEkMl.csv
- Encyclopedia Titanica
- pandas.Series.item