Sasidharan Sarvoy Sathiyamoorthy

Seeking entry level data analyst roles. 2 year United Kingdom work authorization available , via the High Potential Individual (HPI) visa category.

Pizza Runner SQL Case Study



Case study solutions were implemented on DB Fiddle platform using PostgreSQL v13. 

Introduction

Did you know that over 115 million kilograms of pizza is consumed daily worldwide??? (Well according to Wikipedia anyway…)

Danny was scrolling through his Instagram feed when something really caught his eye - “80s Retro Styling and Pizza Is The Future!”

Danny was sold on the idea, but he knew that pizza alone was not going to help him get seed funding to expand his new Pizza Empire - so he had one more genius idea to combine with it - he was going to Uberize it - and so Pizza Runner was launched!

Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.

Available Data

Because Danny had a few years of experience as a data scientist - he was very aware that data collection was going to be critical for his business’ growth.

He has prepared for us an entity relationship diagram of his database design but requires further assistance to clean his data and apply some basic calculations so he can better direct his runners and optimise Pizza Runner’s operations.

All datasets exist within the pizza_runner database schema - be sure to include this reference within your SQL scripts as you start exploring the data and answering the case study questions.

Case Study Questions

This case study has LOTS of questions - they are broken up by area of focus including:

  1. Pizza Metrics
  2. Runner and Customer Experience
  3. Ingredient Optimization
  4. Pricing and Ratings

Each of the following case study questions can be answered using a single SQL statement.

Again, there are many questions in this case study - please feel free to pick and choose which ones you’d like to try!

Before you start writing your SQL queries however - you might want to investigate the data, you may want to do something with some of those null values and data types in the customer_orders and runner_orders tables!

A. Pizza Metrics

  1. How many pizzas were ordered?
  2. How many unique customer orders were made?
  3. How many successful orders were delivered by each runner?
  4. How many of each type of pizza was delivered?
  5. How many Vegetarian and Meatlovers were ordered by each customer?
  6. What was the maximum number of pizzas delivered in a single order?
  7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
  8. How many pizzas were delivered that had both exclusions and extras?
  9. What was the total volume of pizzas ordered for each hour of the day?
  10. What was the volume of orders for each day of the week?

B. Runner and Customer Experience

  1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)
  2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
  3. Is there any relationship between the number of pizzas and how long the order takes to prepare?
  4. What was the average distance travelled for each customer?
  5. What was the difference between the longest and shortest delivery times for all orders?
  6. What was the average speed for each runner for each delivery and do you notice any trend for these values?
  7. What is the successful delivery percentage for each runner?
C. Ingredient Optimisation

1. What are the standard ingredients for each pizza?
2. What was the most commonly added extra?
3. What was the most common exclusion?
4. Generate an order item for each record in the customers_orders table in the format of one of the following:
  • Meat Lovers
  •  Meat Lovers - Exclude Beef
  • Meat Lovers - Extra Bacon
  • Meat Lovers - Exclude Cheese, Bacon- Extra Mushroom, Pepper
5. Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients
  •  For example: "Meat Lovers: 2xBacon, Beef, ... , Salami"
6. What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?
D. Pricing and Ratings

1 .If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?
2.  What if there was an additional $1 charge for any pizza extras?
Add cheese is $1 extra
3.  The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset - generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.
4.  Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?
  • customer_id
  • order_id
  • runner_id
  • rating
  • order_time
  • pickup_time
  • Time between order and pickup
  • Delivery duration
  • Average speed
  • Total number of pizzas

5.  If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometer traveled - how much money does Pizza Runner have left over after these deliveries?

Solutions

Creating the schema and data tables
CREATE SCHEMA pizza_runner;
SET search_path = pizza_runner;

DROP TABLE IF EXISTS runners;

CREATE TABLE runners (
  "runner_id" INTEGER,
  "registration_date" DATE
);

INSERT INTO runners
  ("runner_id", "registration_date")
VALUES
  (1, '2021-01-01'),
  (2, '2021-01-03'),
  (3, '2021-01-08'),
  (4, '2021-01-15');


DROP TABLE IF EXISTS customer_orders;

CREATE TABLE customer_orders (
  "order_id" INTEGER,
  "customer_id" INTEGER,
  "pizza_id" INTEGER,
  "exclusions" VARCHAR(4),
  "extras" VARCHAR(4),
  "order_time" TIMESTAMP
);

INSERT INTO customer_orders
  ("order_id", "customer_id", "pizza_id", "exclusions", "extras", "order_time")
VALUES
  ('1', '101', '1', '', '', '2020-01-01 18:05:02'),
  ('2', '101', '1', '', '', '2020-01-01 19:00:52'),
  ('3', '102', '1', '', '', '2020-01-02 23:51:23'),
  ('3', '102', '2', '', NULL, '2020-01-02 23:51:23'),
  ('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
  ('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
  ('4', '103', '2', '4', '', '2020-01-04 13:23:46'),
  ('5', '104', '1', 'null', '1', '2020-01-08 21:00:29'),
  ('6', '101', '2', 'null', 'null', '2020-01-08 21:03:13'),
  ('7', '105', '2', 'null', '1', '2020-01-08 21:20:29'),
  ('8', '102', '1', 'null', 'null', '2020-01-09 23:54:33'),
  ('9', '103', '1', '4', '1, 5', '2020-01-10 11:22:59'),
  ('10', '104', '1', 'null', 'null', '2020-01-11 18:34:49'),
  ('10', '104', '1', '2, 6', '1, 4', '2020-01-11 18:34:49');


DROP TABLE IF EXISTS runner_orders;

CREATE TABLE runner_orders (
  "order_id" INTEGER,
  "runner_id" INTEGER,
  "pickup_time" VARCHAR(19),
  "distance" VARCHAR(7),
  "duration" VARCHAR(10),
  "cancellation" VARCHAR(23)
);

INSERT INTO runner_orders
  ("order_id", "runner_id", "pickup_time", "distance", "duration", "cancellation")
VALUES
  ('1', '1', '2020-01-01 18:15:34', '20km', '32 minutes', ''),
  ('2', '1', '2020-01-01 19:10:54', '20km', '27 minutes', ''),
  ('3', '1', '2020-01-03 00:12:37', '13.4km', '20 mins', NULL),
  ('4', '2', '2020-01-04 13:53:03', '23.4', '40', NULL),
  ('5', '3', '2020-01-08 21:10:57', '10', '15', NULL),
  ('6', '3', 'null', 'null', 'null', 'Restaurant Cancellation'),
  ('7', '2', '2020-01-08 21:30:45', '25km', '25mins', 'null'),
  ('8', '2', '2020-01-10 00:15:02', '23.4 km', '15 minute', 'null'),
  ('9', '2', 'null', 'null', 'null', 'Customer Cancellation'),
  ('10', '1', '2020-01-11 18:50:20', '10km', '10minutes', 'null');


DROP TABLE IF EXISTS pizza_names;

CREATE TABLE pizza_names (
  "pizza_id" INTEGER,
  "pizza_name" TEXT
);

INSERT INTO pizza_names
  ("pizza_id", "pizza_name")
VALUES
  (1, 'Meatlovers'),
  (2, 'Vegetarian');


DROP TABLE IF EXISTS pizza_recipes;

CREATE TABLE pizza_recipes (
  "pizza_id" INTEGER,
  "toppings" TEXT
);

INSERT INTO pizza_recipes
  ("pizza_id", "toppings")
VALUES
  (1, '1, 2, 3, 4, 5, 6, 8, 10'),
  (2, '4, 6, 7, 9, 11, 12');


DROP TABLE IF EXISTS pizza_toppings;

CREATE TABLE pizza_toppings (
  "topping_id" INTEGER,
  "topping_name" TEXT
);

INSERT INTO pizza_toppings
  ("topping_id", "topping_name")
VALUES
  (1, 'Bacon'),
  (2, 'BBQ Sauce'),
  (3, 'Beef'),
  (4, 'Cheese'),
  (5, 'Chicken'),
  (6, 'Mushrooms'),
  (7, 'Onions'),
  (8, 'Pepperoni'),
  (9, 'Peppers'),
  (10, 'Salami'),
  (11, 'Tomatoes'),
  (12, 'Tomato Sauce');
Data Cleaning
Updating customer_orders tables. Assuming that missing entries in exclusions and extras indicates that customer did not choose this option, we will fill these will 'null' (varchar).
update customer_orders
set exclusions  = 'null'
where exclusions = '' or exclusions is null;
  
update customer_orders
set extras = 'null'
where extras = '' or extras is null;
  
Updating runner_order tables. We will extract only the numeric digits of the distance (km) and duration (mins) columns. Additionally we will also fill missing entries in cancellation column with 'null' (which indicates that the order was not cancelled).
update runner_orders
set duration = regexp_replace(duration, '[[:alpha:]]','', 'g')
where duration != 'null';
  
  
update runner_orders
set distance = regexp_replace(distance, '[[:alpha:]]','', 'g')
where distance != 'null';
  
update runner_orders
set cancellation = 'null'
where cancellation = '' or cancellation is null;
 
A. PIZZA METRICS
1. How many pizzas were ordered?
select count(pizza_id) as pizzas_ordered
from pizza_runner.customer_orders;
In total, 14 pizzas were ordered.
2. How many unique customer orders were made?
select count(distinct order_id) as unique_orders
from pizza_runner.customer_orders;
10 unique customer orders were placed in total.
3. How many successful orders were delivered by each runner?
select runner_id, count(*) as successful_orders
from pizza_runner.runner_orders
where cancellation = 'null'
group by runner_id;
Runner_id 1 was most successful with four orders, while runner_id 3 had just one successful order.
4. How many of each type was delivered?
select p.pizza_name, count(*) as delivered
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
inner join pizza_runner.pizza_names p
on c.pizza_id = p.pizza_id
where r.cancellation = 'null'
group by p.pizza_name;
Out of 12 successful deliveries, 9 are meatlovers and 3 are vegetarian.
5. How many Vegetarian and Meatlovers were ordered by each customer?
select customer_id, pizza_name, count(pizza_name) as number_of_orders
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
inner join pizza_runner.pizza_names p
on c.pizza_id = p.pizza_id
group by customer_id, pizza_name
order by customer_id, number_of_orders desc;
Customers 101,102,103 tend to order both meat and vegetarian pizzas. Customers 104 exclusively prefers meatlovers pizza, while customer 105 exclusively prefers vegetarian pizza.
6. What was the maximum number of pizzas delivered in a single order?
select c.order_id, count(*) as num_pizzas_delivered
from pizza_runner.customer_orders c
inner join pizza_runner. runner_orders r
on c.order_id = r.order_id
where r.cancellation = 'null'
group by c.order_id
order by num_pizzas_delivered desc
limit 1;
The maximum number of pizzas delivered in a single order is 3, via order id 4.
7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
with toppings as

(select customer_id, case when c.exclusions = 'null' and c.extras = 'null' then 'no_change'
else 'atleast_one_change' end as pizza_topping_changes
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
where r.cancellation = 'null')

select customer_id, pizza_topping_changes, count(pizza_topping_changes) as num
from toppings
group by customer_id, pizza_topping_changes
order by customer_id, pizza_topping_changes;
Out of the pizzas delivered, except for customers 101 and 104, all other customers had made at least one change to their pizzas.
8. How many pizzas were delivered that had both exclusions and extras?
with exl_extr as

(select case when c.exclusions != 'null' and c.extras != 'null' then 'yes'
else 'no' end as exclusions_extras
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
where r.cancellation = 'null')

select count(exclusions_Extras) as num_pizzas_with_both
from exl_extr
where exclusions_extras = 'yes';
Out of the delivered orders, only one pizza had both exclusions as well as extra topping modifications.
9. What was the total volume of pizzas ordered for each hour of the day?
select  extract(hour from order_time) as hour_of_day, count(*)
from pizza_runner.customer_orders
group by hour_of_day
order by hour_of_day;
Volume of pizzas ordered after 6 pm is greater than earlier time frames within the day.
10. What was the volume of orders for each day of the week?
select  to_char(order_time,'Day') as day_of_week, count(*)
from pizza_runner.customer_orders
group by day_of_week
order by day_of_week;
Volume by day of week
B. RUNNER AND CUSTOMER EXPERIENCE
1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)
select to_char(registration_date, 'W') as  week_num, count(runner_id)
from pizza_runner.runners
group by week_num
order by week_num;
2 riders signed up in the first week and 1 each rider each in the subsequent 2 weeks.
2. What was the average time in minutes it took for each runner to arrive at the Pizza runner HQ to pickup the order?
Since we do not have a time for when rider starts their journey to the Pizza runner HQ, we will make an assumption that they started the ride when the order was placed by the customer (order_time). The difference between the pickup time and order time is used as the ride duration of the runner.
with pickup_times as

(select *, round(cast(extract(epoch from cast(r.pickup_time as timestamp) - c.order_time)/60 as numeric),2) as mins_to_pickup
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
where r.cancellation = 'null')

select runner_id,round(avg(mins_to_pickup),2) as avg_order_pickup_time
from pickup_times
group by runner_id
order by runner_id;
On average, Rider 3 is the quickest to pick up the order, while rider 2 takes the longest.
3. Is there any relationship between the number of pizzas and how long the order takes to prepare?
Since we do not have when the order is completed, we can make an assumption that the order is picked up right when it is completed. So the pickup_time can be considered as the time when the order was prepared.
with prep as

(select c.order_id,c.pizza_id, round(cast(extract(epoch from cast(r.pickup_time as timestamp) - c.order_time)/60 as numeric),2) as prep_duration
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
where r.cancellation = 'null'
order by c.order_id)

select order_id, count(pizza_id) as num_pizzas, round(avg(prep_duration),2) as order_prep_duration_mins
from prep
group by order_id
order by order_prep_duration_mins desc;
The order that took the longest time to prepare was also the one with most number of pizzas.
4. What was the average distance travelled for each customer?
with dist as 

(select c.customer_id, r.order_id, round(avg(cast(r.distance as numeric)),2) as avg_dist_order_km
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
where r.cancellation = 'null'
group by c.customer_id, r.order_id)

select customer_id, round(avg(avg_dist_order_km),2) as avg_dist_customer_km
from dist
group by customer_id
order by customer_id;
The longest average trip distance was made for customer 105 and the shortest trip was for customer 104.
5. What was the difference between the longest and shortest delivery times for all orders?
select max(cast(duration as float)) - min(cast(duration as float)) as diff
from pizza_runner.runner_orders
where cancellation = 'null';
The difference between the longest and shortest delivery times is 30 mins.
6. What was the average speed for each runner for each delivery and do you notice any trend for these values?
select runner_id, order_id, distance, pickup_time, round(cast(distance as numeric)/(cast(duration as numeric)/60),2) as average_speed_kmph
from pizza_runner.runner_orders
where cancellation = 'null'
order by runner_id, average_speed_kmph desc;
Runners tend to deliver orders at shorter distances quicker.
7. What is the successful delivery percentage for each runner?
with cte_1 as
(select runner_id,count(order_id) as total_orders
from pizza_runner.runner_orders
group by runner_id), 

cte_2 as
(select runner_id, count(order_id) as successful_orders
from pizza_runner.runner_orders
where cancellation = 'null'
group by runner_id)

select cte_1.runner_id, (cast(cte_2.successful_orders as float) /  cast(cte_1.total_orders as float))*100 as success_percent
from cte_1
inner join cte_2
on cte_1.runner_id = cte_2.runner_id
order by cte_1.runner_id;
Rider 1 has 100 % successful delivery rate, while rider 3 has the least with 50 % delivery success rate.
C. INGREDIENT OPTIMISATION
1. What are the standard ingredients for each pizza ? 
with topping_info as 

(select n.pizza_name, unnest(string_to_array(r.toppings,',')) as topping_id
from pizza_runner.pizza_names n
inner join pizza_runner.pizza_recipes r
on n.pizza_id = r.pizza_id) 


select pizza_name, string_agg(t.topping_name, ',' order by t.topping_name) as all_toppings
from topping_info
inner join pizza_runner.pizza_toppings t
on t.topping_id = cast(topping_info.topping_id as integer)
group by pizza_name;
2. What was the most commonly added extra ? 
with extras as 

(select order_id, unnest(string_to_array(extras,',')) as topping_id
from pizza_runner.customer_orders
where extras != 'null')

select t.topping_name, count(t.topping_name) as num_times_added
from extras
inner join pizza_runner.pizza_toppings t
on t.topping_id = cast(extras.topping_id as integer)
group by t.topping_name 
order by num_times_added desc;
The most commonly added extra is Bacon, followed by Chicken and Cheese.
3. What was the most common exclusion ?
with exclusions as 

(select order_id, unnest(string_to_array(exclusions,',')) as topping_id
from pizza_runner.customer_orders
where exclusions != 'null')

select t.topping_name, count(t.topping_name) as num_times_excluded
from exclusions
inner join pizza_runner.pizza_toppings t
on t.topping_id = cast(exclusions.topping_id as integer)
group by t.topping_name 
order by num_times_excluded desc;
Cheese was the most common exclusion from orders, followed by Mushroom and BBQ Sauce.
4. Generate an order item for each record in the customers_orders table in the format of one of the following :

a. Meat Lovers
b. Meat Lovers - Exclude Beef
c. Meat Lovers - Extra Bacon
d. Meat Lovers - Exclude Cheese, Bacon- Extra Mushroom, Peppers
To Be Continued
5. Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a '2x' in front of any relevant ingredients. For example : "Meat Lovers : 2xBacon, Beef . . ., Salami"
To Be Continued
6. What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first ?
To Be Continued
Share

Tools
Translate to