Introduction
Did you know that over 115 million kilograms of pizza is consumed daily worldwide??? (Well according to Wikipedia anyway…)
Danny was scrolling through his Instagram feed when something really caught his eye - “80s Retro Styling and Pizza Is The Future!”
Danny was sold on the idea, but he knew that pizza alone was not going to help him get seed funding to expand his new Pizza Empire - so he had one more genius idea to combine with it - he was going to Uberize it - and so Pizza Runner was launched!
Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.
Available Data
Because Danny had a few years of experience as a data scientist - he was very aware that data collection was going to be critical for his business’ growth.
He has prepared for us an entity relationship diagram of his database design but requires further assistance to clean his data and apply some basic calculations so he can better direct his runners and optimise Pizza Runner’s operations.
All datasets exist within the pizza_runner database schema - be sure to include this reference within your SQL scripts as you start exploring the data and answering the case study questions.
Case Study Questions
This case study has LOTS of questions - they are broken up by area of focus including:
- Pizza Metrics
- Runner and Customer Experience
- Ingredient Optimization
- Pricing and Ratings
Each of the following case study questions can be answered using a single SQL statement.
Again, there are many questions in this case study - please feel free to pick and choose which ones you’d like to try!
Before you start writing your SQL queries however - you might want to investigate the data, you may want to do something with some of those null values and data types in the customer_orders and runner_orders tables!
A. Pizza Metrics
- How many pizzas were ordered?
- How many unique customer orders were made?
- How many successful orders were delivered by each runner?
- How many of each type of pizza was delivered?
- How many Vegetarian and Meatlovers were ordered by each customer?
- What was the maximum number of pizzas delivered in a single order?
- For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
- How many pizzas were delivered that had both exclusions and extras?
- What was the total volume of pizzas ordered for each hour of the day?
- What was the volume of orders for each day of the week?
B. Runner and Customer Experience
- How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)
- What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
- Is there any relationship between the number of pizzas and how long the order takes to prepare?
- What was the average distance travelled for each customer?
- What was the difference between the longest and shortest delivery times for all orders?
- What was the average speed for each runner for each delivery and do you notice any trend for these values?
- What is the successful delivery percentage for each runner?
1. What are the standard ingredients for each pizza?
2. What was the most commonly added extra?
3. What was the most common exclusion?
4. Generate an order item for each record in the customers_orders table in the format of one of the following:
- Meat Lovers
- Meat Lovers - Exclude Beef
- Meat Lovers - Extra Bacon
- Meat Lovers - Exclude Cheese, Bacon- Extra Mushroom, Pepper
- For example: "Meat Lovers: 2xBacon, Beef, ... , Salami"
1 .If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?
2. What if there was an additional $1 charge for any pizza extras?
Add cheese is $1 extra
3. The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset - generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.
4. Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?
- customer_id
- order_id
- runner_id
- rating
- order_time
- pickup_time
- Time between order and pickup
- Delivery duration
- Average speed
- Total number of pizzas
Solutions
CREATE SCHEMA pizza_runner;
SET search_path = pizza_runner;
DROP TABLE IF EXISTS runners;
CREATE TABLE runners (
"runner_id" INTEGER,
"registration_date" DATE
);
INSERT INTO runners
("runner_id", "registration_date")
VALUES
(1, '2021-01-01'),
(2, '2021-01-03'),
(3, '2021-01-08'),
(4, '2021-01-15');
DROP TABLE IF EXISTS customer_orders;
CREATE TABLE customer_orders (
"order_id" INTEGER,
"customer_id" INTEGER,
"pizza_id" INTEGER,
"exclusions" VARCHAR(4),
"extras" VARCHAR(4),
"order_time" TIMESTAMP
);
INSERT INTO customer_orders
("order_id", "customer_id", "pizza_id", "exclusions", "extras", "order_time")
VALUES
('1', '101', '1', '', '', '2020-01-01 18:05:02'),
('2', '101', '1', '', '', '2020-01-01 19:00:52'),
('3', '102', '1', '', '', '2020-01-02 23:51:23'),
('3', '102', '2', '', NULL, '2020-01-02 23:51:23'),
('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
('4', '103', '2', '4', '', '2020-01-04 13:23:46'),
('5', '104', '1', 'null', '1', '2020-01-08 21:00:29'),
('6', '101', '2', 'null', 'null', '2020-01-08 21:03:13'),
('7', '105', '2', 'null', '1', '2020-01-08 21:20:29'),
('8', '102', '1', 'null', 'null', '2020-01-09 23:54:33'),
('9', '103', '1', '4', '1, 5', '2020-01-10 11:22:59'),
('10', '104', '1', 'null', 'null', '2020-01-11 18:34:49'),
('10', '104', '1', '2, 6', '1, 4', '2020-01-11 18:34:49');
DROP TABLE IF EXISTS runner_orders;
CREATE TABLE runner_orders (
"order_id" INTEGER,
"runner_id" INTEGER,
"pickup_time" VARCHAR(19),
"distance" VARCHAR(7),
"duration" VARCHAR(10),
"cancellation" VARCHAR(23)
);
INSERT INTO runner_orders
("order_id", "runner_id", "pickup_time", "distance", "duration", "cancellation")
VALUES
('1', '1', '2020-01-01 18:15:34', '20km', '32 minutes', ''),
('2', '1', '2020-01-01 19:10:54', '20km', '27 minutes', ''),
('3', '1', '2020-01-03 00:12:37', '13.4km', '20 mins', NULL),
('4', '2', '2020-01-04 13:53:03', '23.4', '40', NULL),
('5', '3', '2020-01-08 21:10:57', '10', '15', NULL),
('6', '3', 'null', 'null', 'null', 'Restaurant Cancellation'),
('7', '2', '2020-01-08 21:30:45', '25km', '25mins', 'null'),
('8', '2', '2020-01-10 00:15:02', '23.4 km', '15 minute', 'null'),
('9', '2', 'null', 'null', 'null', 'Customer Cancellation'),
('10', '1', '2020-01-11 18:50:20', '10km', '10minutes', 'null');
DROP TABLE IF EXISTS pizza_names;
CREATE TABLE pizza_names (
"pizza_id" INTEGER,
"pizza_name" TEXT
);
INSERT INTO pizza_names
("pizza_id", "pizza_name")
VALUES
(1, 'Meatlovers'),
(2, 'Vegetarian');
DROP TABLE IF EXISTS pizza_recipes;
CREATE TABLE pizza_recipes (
"pizza_id" INTEGER,
"toppings" TEXT
);
INSERT INTO pizza_recipes
("pizza_id", "toppings")
VALUES
(1, '1, 2, 3, 4, 5, 6, 8, 10'),
(2, '4, 6, 7, 9, 11, 12');
DROP TABLE IF EXISTS pizza_toppings;
CREATE TABLE pizza_toppings (
"topping_id" INTEGER,
"topping_name" TEXT
);
INSERT INTO pizza_toppings
("topping_id", "topping_name")
VALUES
(1, 'Bacon'),
(2, 'BBQ Sauce'),
(3, 'Beef'),
(4, 'Cheese'),
(5, 'Chicken'),
(6, 'Mushrooms'),
(7, 'Onions'),
(8, 'Pepperoni'),
(9, 'Peppers'),
(10, 'Salami'),
(11, 'Tomatoes'),
(12, 'Tomato Sauce');
update customer_orders
set exclusions = 'null'
where exclusions = '' or exclusions is null;
update customer_orders
set extras = 'null'
where extras = '' or extras is null;
update runner_orders
set duration = regexp_replace(duration, '[[:alpha:]]','', 'g')
where duration != 'null';
update runner_orders
set distance = regexp_replace(distance, '[[:alpha:]]','', 'g')
where distance != 'null';
update runner_orders
set cancellation = 'null'
where cancellation = '' or cancellation is null;
select count(pizza_id) as pizzas_ordered
from pizza_runner.customer_orders;
select count(distinct order_id) as unique_orders
from pizza_runner.customer_orders;
select runner_id, count(*) as successful_orders
from pizza_runner.runner_orders
where cancellation = 'null'
group by runner_id;
select p.pizza_name, count(*) as delivered
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
inner join pizza_runner.pizza_names p
on c.pizza_id = p.pizza_id
where r.cancellation = 'null'
group by p.pizza_name;
select customer_id, pizza_name, count(pizza_name) as number_of_orders
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
inner join pizza_runner.pizza_names p
on c.pizza_id = p.pizza_id
group by customer_id, pizza_name
order by customer_id, number_of_orders desc;
select c.order_id, count(*) as num_pizzas_delivered
from pizza_runner.customer_orders c
inner join pizza_runner. runner_orders r
on c.order_id = r.order_id
where r.cancellation = 'null'
group by c.order_id
order by num_pizzas_delivered desc
limit 1;
with toppings as
(select customer_id, case when c.exclusions = 'null' and c.extras = 'null' then 'no_change'
else 'atleast_one_change' end as pizza_topping_changes
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
where r.cancellation = 'null')
select customer_id, pizza_topping_changes, count(pizza_topping_changes) as num
from toppings
group by customer_id, pizza_topping_changes
order by customer_id, pizza_topping_changes;
with exl_extr as
(select case when c.exclusions != 'null' and c.extras != 'null' then 'yes'
else 'no' end as exclusions_extras
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
where r.cancellation = 'null')
select count(exclusions_Extras) as num_pizzas_with_both
from exl_extr
where exclusions_extras = 'yes';
select extract(hour from order_time) as hour_of_day, count(*)
from pizza_runner.customer_orders
group by hour_of_day
order by hour_of_day;
select to_char(order_time,'Day') as day_of_week, count(*)
from pizza_runner.customer_orders
group by day_of_week
order by day_of_week;
select to_char(registration_date, 'W') as week_num, count(runner_id)
from pizza_runner.runners
group by week_num
order by week_num;
with pickup_times as
(select *, round(cast(extract(epoch from cast(r.pickup_time as timestamp) - c.order_time)/60 as numeric),2) as mins_to_pickup
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
where r.cancellation = 'null')
select runner_id,round(avg(mins_to_pickup),2) as avg_order_pickup_time
from pickup_times
group by runner_id
order by runner_id;
with prep as
(select c.order_id,c.pizza_id, round(cast(extract(epoch from cast(r.pickup_time as timestamp) - c.order_time)/60 as numeric),2) as prep_duration
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
where r.cancellation = 'null'
order by c.order_id)
select order_id, count(pizza_id) as num_pizzas, round(avg(prep_duration),2) as order_prep_duration_mins
from prep
group by order_id
order by order_prep_duration_mins desc;
with dist as
(select c.customer_id, r.order_id, round(avg(cast(r.distance as numeric)),2) as avg_dist_order_km
from pizza_runner.customer_orders c
inner join pizza_runner.runner_orders r
on c.order_id = r.order_id
where r.cancellation = 'null'
group by c.customer_id, r.order_id)
select customer_id, round(avg(avg_dist_order_km),2) as avg_dist_customer_km
from dist
group by customer_id
order by customer_id;
select max(cast(duration as float)) - min(cast(duration as float)) as diff
from pizza_runner.runner_orders
where cancellation = 'null';
select runner_id, order_id, distance, pickup_time, round(cast(distance as numeric)/(cast(duration as numeric)/60),2) as average_speed_kmph
from pizza_runner.runner_orders
where cancellation = 'null'
order by runner_id, average_speed_kmph desc;
with cte_1 as
(select runner_id,count(order_id) as total_orders
from pizza_runner.runner_orders
group by runner_id),
cte_2 as
(select runner_id, count(order_id) as successful_orders
from pizza_runner.runner_orders
where cancellation = 'null'
group by runner_id)
select cte_1.runner_id, (cast(cte_2.successful_orders as float) / cast(cte_1.total_orders as float))*100 as success_percent
from cte_1
inner join cte_2
on cte_1.runner_id = cte_2.runner_id
order by cte_1.runner_id;
with topping_info as
(select n.pizza_name, unnest(string_to_array(r.toppings,',')) as topping_id
from pizza_runner.pizza_names n
inner join pizza_runner.pizza_recipes r
on n.pizza_id = r.pizza_id)
select pizza_name, string_agg(t.topping_name, ',' order by t.topping_name) as all_toppings
from topping_info
inner join pizza_runner.pizza_toppings t
on t.topping_id = cast(topping_info.topping_id as integer)
group by pizza_name;
with extras as
(select order_id, unnest(string_to_array(extras,',')) as topping_id
from pizza_runner.customer_orders
where extras != 'null')
select t.topping_name, count(t.topping_name) as num_times_added
from extras
inner join pizza_runner.pizza_toppings t
on t.topping_id = cast(extras.topping_id as integer)
group by t.topping_name
order by num_times_added desc;
with exclusions as
(select order_id, unnest(string_to_array(exclusions,',')) as topping_id
from pizza_runner.customer_orders
where exclusions != 'null')
select t.topping_name, count(t.topping_name) as num_times_excluded
from exclusions
inner join pizza_runner.pizza_toppings t
on t.topping_id = cast(exclusions.topping_id as integer)
group by t.topping_name
order by num_times_excluded desc;
a. Meat Lovers
b. Meat Lovers - Exclude Beef
c. Meat Lovers - Extra Bacon
d. Meat Lovers - Exclude Cheese, Bacon- Extra Mushroom, Peppers