Titanic: What can Einstein Discovery tell us about the disaster?

Ever wanted to learn more about the Titanic disaster? And this time not from movies but from data? Why not use Einstein Discovery for this, so we have some time left to actually watch the movie afterwards?

Let’s use this basic machine learning exercise to get a better sense of the power and ease of use of Einstein Discovery, and of how it compares to other ML tools. Our basic question is: “What sort of people were more likely to survive the Titanic sinking?” You might already have an idea about the outcome, but let’s pretend we don’t know anything about this historic event and haven’t watched the movie (yet).

First, I uploaded the Titanic training dataset from the Kaggle Titanic competition, which you can find here (https://www.kaggle.com/c/titanic/data?select=train.csv); please download “train.csv”. To load it, simply create a new dataset in Einstein Analytics based on a CSV file.

Give your dataset a descriptive name and assign it to one of your apps.

In the next step you want to “Edit Field Attributes”, as Einstein Analytics ingests the data on its best guess, without knowing the business case. You will see that a few fields have the field type “Measure”, but we already know that some of them are actually dimensions. In this step we can change those fields to “Dimension” (Passenger ID, Survived, Class). Now you can upload the file.
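
To make the Measure vs. Dimension distinction concrete outside of Einstein Analytics, here is a minimal Python sketch. It assumes the Kaggle column names (PassengerId, Survived, Pclass) and uses a few made-up rows rather than the real file; the `DIMENSIONS` set and the `load_rows` helper are hypothetical names for illustration. Dimensions stay as text labels, while true measures are parsed as numbers.

```python
import csv
import io

# Made-up toy rows in the train.csv column layout (not real passengers).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare
101,0,3,male,30,8.00
102,1,1,female,40,70.00
103,1,2,female,25,15.50
"""

# Columns that look numeric but are really categories (dimensions).
DIMENSIONS = {"PassengerId", "Survived", "Pclass", "Sex"}

def load_rows(text):
    """Parse CSV text, keeping dimensions as strings and measures as floats."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in raw.items():
            if name in DIMENSIONS:
                row[name] = value  # category label, never averaged
            else:
                row[name] = float(value) if value else None  # numeric measure
        rows.append(row)
    return rows

rows = load_rows(SAMPLE_CSV)
print(rows[0])
```

Averaging a passenger class of 1, 2 and 3 would be meaningless, which is why those fields belong on the dimension side.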

After Einstein Analytics has finished the upload, we can create a story directly from the dataset: just click on “Create Story”.

With this, Einstein Discovery opens the Story wizard, which lets you define the model in 3 easy steps.

1. Select field Survival(1)
2. Select Insights & Predictions
3. Select Automated

After you hit “Create Story”, Einstein Discovery creates the model, which takes a couple of minutes.

When it has finished, Discovery presents the narrative story it created based on the data. Let’s see what we can find out.

On the very first chart we can already see that the “Sex” of the passenger was the most significant indicator of survival. Not surprisingly, the survival rate is high for the category “female” and low for “male”.
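
The number behind a chart like this is essentially a survival rate per category. As a rough sketch (not how Einstein Discovery computes it internally), here is that calculation in plain Python. The records are made-up toy values and `survival_rate_by` is a hypothetical helper, so the resulting rates say nothing about the real dataset.

```python
from collections import defaultdict

# Made-up toy records of (Sex, Survived); not real data.
records = [
    ("female", 1), ("female", 1), ("female", 0),
    ("male", 0), ("male", 0), ("male", 1), ("male", 0),
]

def survival_rate_by(records):
    """Return the share of survivors for each group label."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        survived[group] += outcome
    return {group: survived[group] / totals[group] for group in totals}

rates = survival_rate_by(records)
for group, rate in rates.items():
    print(f"{group}: {rate:.2f}")
```

Run against the real train.csv, the same grouping would give the actual rates that the first chart visualizes.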

If we scroll down we see several other charts (ordered descending by statistical power) breaking down the findings by pairwise combinations such as “Sex and Age”, “Passenger Class and Sex” and so on.
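
A pairwise breakdown like “Passenger Class and Sex” can be sketched as a survival rate per combination of two columns. The passengers below are made up for illustration and `rate_by` is a hypothetical helper. Note that this sketch simply sorts by rate; it does not reproduce the statistical ranking Einstein Discovery uses to order its charts.

```python
from collections import defaultdict

# Made-up toy passengers; values are illustrative, not real data.
passengers = [
    {"Sex": "female", "Pclass": "1", "Survived": 1},
    {"Sex": "female", "Pclass": "1", "Survived": 1},
    {"Sex": "female", "Pclass": "3", "Survived": 1},
    {"Sex": "female", "Pclass": "3", "Survived": 0},
    {"Sex": "male", "Pclass": "1", "Survived": 1},
    {"Sex": "male", "Pclass": "1", "Survived": 0},
    {"Sex": "male", "Pclass": "3", "Survived": 0},
    {"Sex": "male", "Pclass": "3", "Survived": 0},
]

def rate_by(rows, keys):
    """Survival rate per combination of the given columns, highest first."""
    counts = defaultdict(lambda: [0, 0])  # combo -> [survivors, total]
    for row in rows:
        combo = tuple(row[key] for key in keys)
        counts[combo][0] += row["Survived"]
        counts[combo][1] += 1
    rates = {combo: s / n for combo, (s, n) in counts.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

breakdown = rate_by(passengers, ["Sex", "Pclass"])
for combo, rate in breakdown:
    print(combo, f"{rate:.2f}")
```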

When we scroll further down we see that “Fare” also impacts survival. This is not surprising either, but it sounds a bit like a duplicate, as we already have the “Passenger Class” in our model.
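
One simple way to check such a hunch yourself is to see how strongly the two fields move together. The sketch below computes a plain Pearson correlation between class and fare on made-up toy pairs. This is only an illustration of the idea of redundant predictors; it is an assumption on my part and not necessarily the method Einstein Discovery uses to flag duplicates.

```python
from math import sqrt

# Made-up toy (Pclass, Fare) pairs; not real data.
pairs = [(1, 80.0), (1, 60.0), (2, 20.0), (2, 15.0), (3, 8.0), (3, 7.0)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

classes = [pclass for pclass, _ in pairs]
fares = [fare for _, fare in pairs]
r = pearson(classes, fares)
print(f"correlation between class and fare: {r:.2f}")
```

A value close to -1 or 1 suggests the two fields carry largely the same information, so keeping both adds little to the model.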

Let’s clean our model up a bit. Of course, Einstein Discovery helps us with this as well. Under “Model” we can validate the quality of our model. When we look at the model overview, Einstein Discovery already warns us that something is not correct yet: it highlights the duplicates as an error. I can now simply click the “Review Updates” button and make my choices based on the Einstein Discovery recommendation.

I’m retaining the Class and hit “Create Story” again. When I check the model again, I can see that all lights are now on “green”, which, simply put, means we don’t have any obvious errors in our model anymore.

Now we can dig deeper into the story under “Why It Happened” and validate what Einstein Discovery can tell us against what we have learned since 1912.

As you can see, Einstein Discovery enabled us to learn within a few minutes which factors influenced the survival of different passengers. You might say, “I knew all of this”, which is totally fair. But let’s pretend you didn’t know anything about the disaster: no deep-sea expeditions, no passenger stories, no movies... you would just have data.

Yep... and this is exactly the power of Einstein Discovery. You just need data, and within a short time it will surface patterns and insights for you in a narrative, easy-to-digest form.

And the great thing is we can use it for any data, regardless of whether it comes from core Salesforce or any external data source!

What to do now with all the time left thanks to Einstein Discovery?
