This is how you should use Data Science to plan your trip

Crdealme
4 min read · Jan 9, 2021

Before Covid-19, travelling was a simple task: you picked a holiday and a nice city to spend your time in. Nowadays, a lot of caution is needed to avoid spreading the virus (stay safe!), but the day will come when you will be able to explore new horizons and places.

This post poses four basic questions to a public dataset in order to better understand and plan a trip. If you are a data scientist/nerd like me, the odds are that using Exploratory Data Analysis and Statistics to plan a trip is part of the fun of travelling.

Before we start…

The data used in the following analysis comes from a public AirBnB dataset for Seattle, available on the Kaggle platform. The idea is that you can use similar data to pose questions and plan your trip to other places.

Let’s dive in!

What are the busiest times of the year to visit Seattle?

Almost everyone wants to avoid crowds, especially during the pandemic. Seasonality analysis of time series helps us understand when our destination city is full of tourists.

The diagram below presents the unavailable listings in 2016. It is clear that January (winter break) is a pretty busy month, but unavailability decreases from January to March. August also shows considerably high occupancy levels, but they tend to decrease as we approach December.

Therefore, the busiest times of the year to visit Seattle are the beginning of January, April and July.
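If you want to reproduce this kind of seasonality check on your own data, here is a minimal sketch. It assumes a calendar table with one row per listing per day, a "date" column, and an "available" column that holds "t" or "f". These column names and the tiny sample are my assumptions for illustration, not necessarily what the repository code uses.

```python
import pandas as pd


def monthly_unavailability(calendar: pd.DataFrame) -> pd.Series:
    """Share of listing-days marked unavailable, grouped by calendar month."""
    dates = pd.to_datetime(calendar["date"])
    # Assumption: "f" in the "available" column means the listing is not bookable.
    unavailable = calendar["available"].eq("f")
    return unavailable.groupby(dates.dt.month).mean().rename("unavailable_share")


# Made-up sample data, only to show the shape of the input.
sample_calendar = pd.DataFrame({
    "listing_id": [1, 1, 2, 2, 1, 2],
    "date": ["2016-01-04", "2016-01-05", "2016-01-04",
             "2016-01-05", "2016-03-01", "2016-03-01"],
    "available": ["f", "f", "f", "t", "t", "t"],
})

monthly_shares = monthly_unavailability(sample_calendar)
print(monthly_shares)
```

Months with the highest share are the busiest ones. Keep in mind that "unavailable" can also mean the host blocked the date, so this is a proxy for occupancy rather than a direct measurement.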

Do the holidays influence the AirBnB availability?

Consider the following US Holidays in 2016:

If we identify the availability levels in the mentioned holidays:

We can see that Independence Day shows a high level of unavailability. The other holidays don’t seem to have much influence on occupancy.
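A holiday lookup like this can be sketched in a few lines. The example assumes the same kind of calendar table (a "date" column and an "available" column with "t"/"f"). The two holidays in the dictionary are only illustrative picks of mine, not the full list used in the analysis, and the sample rows are made up.

```python
import pandas as pd

# Illustrative subset of 2016 US holidays (an assumption, not the post's full list).
HOLIDAYS_2016 = {
    "Independence Day": "2016-07-04",
    "Labor Day": "2016-09-05",
}


def holiday_unavailability(calendar: pd.DataFrame, holidays: dict) -> pd.Series:
    """Share of listings marked unavailable on each holiday date."""
    daily = (
        calendar.assign(
            date=pd.to_datetime(calendar["date"]),
            unavailable=calendar["available"].eq("f"),
        )
        .groupby("date")["unavailable"]
        .mean()
    )
    # Dates missing from the calendar come back as NaN.
    shares = daily.reindex(pd.to_datetime(list(holidays.values())))
    shares.index = list(holidays.keys())
    return shares.rename("unavailable_share")


# Made-up sample data with two listings.
sample_calendar = pd.DataFrame({
    "listing_id": [1, 2, 1, 2, 1, 2],
    "date": ["2016-07-03", "2016-07-03", "2016-07-04",
             "2016-07-04", "2016-09-05", "2016-09-05"],
    "available": ["t", "f", "f", "f", "t", "t"],
})

holiday_shares = holiday_unavailability(sample_calendar, HOLIDAYS_2016)
print(holiday_shares)
```

To judge whether a holiday stands out, compare each value with the average daily share over the whole year or over the surrounding weeks.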

What is the relationship between prices and availability?

Is it right to say that a city full of tourists is also a city where the hosting prices are the highest?

Let’s check!

Here we have the prices in 2016:

And here we look at the availability levels in 2016:

Surprisingly, the data suggests that prices rise as AirBnB availability increases! However, this is not a perfect correlation (see the code in my GitHub, linked below, for technical details).
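One way to put a number on this relationship is the Pearson correlation between the daily mean price and the daily share of available listings. The sketch below assumes a calendar table with "date", "available" ("t"/"f") and a "price" column stored as text such as "$85.00", with no price on unavailable days. Those details and the sample rows are assumptions for illustration, and the repository code may do this differently.

```python
import pandas as pd


def price_availability_corr(calendar: pd.DataFrame) -> float:
    """Pearson correlation between daily mean price and daily availability share."""
    df = calendar.assign(
        date=pd.to_datetime(calendar["date"]),
        available=calendar["available"].eq("t"),
        # Strip "$" and thousands separators; missing prices become NaN.
        price=pd.to_numeric(
            calendar["price"].str.replace(r"[$,]", "", regex=True),
            errors="coerce",
        ),
    )
    daily = df.groupby("date").agg(
        mean_price=("price", "mean"),
        available_share=("available", "mean"),
    )
    return daily["mean_price"].corr(daily["available_share"])


# Made-up sample data: two listings over three days.
sample_calendar = pd.DataFrame({
    "listing_id": [1, 2, 1, 2, 1, 2],
    "date": ["2016-06-01", "2016-06-01", "2016-06-02",
             "2016-06-02", "2016-06-03", "2016-06-03"],
    "available": ["t", "f", "t", "t", "f", "t"],
    "price": ["$100.00", None, "$120.00", "$140.00", None, "$90.00"],
})

corr = price_availability_corr(sample_calendar)
print(f"price vs. availability correlation: {corr:.2f}")
```

If prices only exist for available listings, as assumed here, the daily mean price depends on which listings happen to be free that day. That selection effect is one possible reason for a positive correlation, so the number is a hint and not proof of cause.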

Do the review variables influence price?

A simple way to understand the relationship between the review variables and price is to look at the Correlation Matrices (again, take a look at my GitHub code!).

In general terms, each review variable is only weakly correlated with price. Based on the data, we can’t say that the review metrics influence prices.
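As a rough sketch of that check, you can pull the price column out of the correlation matrix. The example assumes a listings table with a text "price" column (for example "$100.00") and numeric review columns. The column names "review_scores_rating" and "number_of_reviews" and the sample values are assumptions for illustration only.

```python
import pandas as pd


def review_price_correlations(listings: pd.DataFrame, review_cols: list) -> pd.Series:
    """Pearson correlation of each review column with the listing price."""
    price = pd.to_numeric(
        listings["price"].str.replace(r"[$,]", "", regex=True),
        errors="coerce",
    )
    numeric = listings[review_cols].assign(price=price)
    # Take the "price" column of the full correlation matrix, minus price itself.
    return numeric.corr()["price"].drop("price")


# Made-up sample data with four listings.
sample_listings = pd.DataFrame({
    "price": ["$100.00", "$100.00", "$200.00", "$200.00"],
    "review_scores_rating": [90, 95, 90, 95],
    "number_of_reviews": [10, 20, 30, 40],
})

review_corr = review_price_correlations(
    sample_listings, ["review_scores_rating", "number_of_reviews"]
)
print(review_corr)
```

Values close to zero point to a weak linear relationship. Pearson correlation only captures linear effects, so a scatter plot per variable is a useful companion.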

Recap

A simple Exploratory Data Analysis and a few statistical tools give us a straightforward way to plan a trip based on public data. This post showed how important questions can be answered quickly from data using an analytical approach.

Your Turn!

Now it is your turn! Choose your next travel destination, gather public data and answer the questions above for yourself. Feel free to take a look at my GitHub repository, shared below. Have a fun (and Covid-free) trip!

Code

Find the code here: https://github.com/crdealme/Seattle
