Univariate and Multivariate Analysis of Food Delivery Service


Note: This is a project I completed for U.T. Austin’s Post Graduate Program in AI & Machine Learning: Business Applications. The data and business scenarios are part of a simulated case study and do not represent the actual operations of any real-world entity.


Executive Summary

Context

In the fast-paced environment of New York City, online food delivery services have become a staple for students and busy professionals. This analysis focuses on data from a major food aggregator that connects customers with multiple restaurants through a centralized smartphone app. The aggregator handles the end-to-end process: order placement, restaurant confirmation, delivery person assignment, and final drop-off.

The company generates revenue by collecting a fixed margin from restaurants on every order. To improve customer experience and business efficiency, this study analyzes order patterns, restaurant demand, and delivery logistics.

Objective

Data Dictionary


1. Environment Setup & Data Overview

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv('foodhub_order.csv')

# Initial Data Inspection
print(f"Dataset Shape: {df.shape}")
print(df.info())

# Checking for missing values
print("Missing Values:\n", df.isnull().sum())

Initial Observations


2. Univariate Analysis: Key Distributions

We explored individual variables to understand the underlying patterns in customer behavior and operational performance.

# Statistical summary of preparation time (select the column first, then summarize)
print(df['food_preparation_time'].describe())

# Distribution of ratings
plt.figure(figsize=(6,5))
sns.countplot(data=df, x='rating', order=['3', '4', '5', 'Not given'])
plt.title('Distribution of Customer Ratings')
plt.show()

# Orders by Cuisine Type
plt.figure(figsize=(8,5))
sns.countplot(data=df, x='cuisine_type')
plt.title('Orders Per Cuisine Type')
plt.xticks(rotation=90)
plt.show()
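Because the rating column mixes numeric scores with the 'Not given' label, it is stored as text and cannot be averaged directly. The sketch below shows one way to handle this, using a small toy frame in place of the real dataset; the values and the helper column name rating_num are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for the 'rating' column (illustrative values, not the real data)
toy = pd.DataFrame({"rating": ["5", "Not given", "4", "3", "Not given", "5"]})

# Coerce to numbers; 'Not given' cannot be parsed and becomes NaN
toy["rating_num"] = pd.to_numeric(toy["rating"], errors="coerce")

# Fraction of orders without a rating
share_unrated = toy["rating_num"].isna().mean()

# Mean rating over rated orders only (NaN values are skipped by default)
mean_rating = toy["rating_num"].mean()

print(f"Unrated share: {share_unrated:.0%}, mean rating: {mean_rating:.2f}")
```

Keeping the original text column alongside the numeric one means the count plot above still works unchanged.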

Key Findings from Univariate Exploration


3. Multivariate Analysis: Understanding Correlations

To understand the drivers of cost and customer satisfaction, we analyzed the relationships between time, cost, and cuisine.

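# Correlation Heatmap
# corr() needs numeric input; pandas 2.x raises an error on text columns such as cuisine_type
plt.figure(figsize=(10,5))
sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, cmap='Spectral')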
plt.title('Variable Correlation Matrix')
plt.show()

# Total Time Calculation
df['total_time'] = df['food_preparation_time'] + df['delivery_time']

# Cost by Cuisine Type
plt.figure(figsize=(10,5))
sns.boxplot(data=df, x='cuisine_type', y='cost_of_the_order')
plt.title('Order Cost Distribution by Cuisine')
plt.xticks(rotation=90)
plt.show()
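A box plot is easier to read next to a small numeric table. As a sketch, the per-cuisine medians behind such a plot could be tabulated as follows; the toy values are made up for illustration and only the column names come from the code above.

```python
import pandas as pd

# Toy orders (illustrative values, not the real data)
toy = pd.DataFrame({
    "cuisine_type": ["American", "Japanese", "American", "Italian", "Japanese", "American"],
    "cost_of_the_order": [12.0, 25.0, 16.0, 14.0, 29.0, 20.0],
})

# Order count and median cost per cuisine, most expensive cuisine first
cost_by_cuisine = (
    toy.groupby("cuisine_type")["cost_of_the_order"]
       .agg(["count", "median"])
       .sort_values("median", ascending=False)
)

print(cost_by_cuisine)
```

Including the count column helps flag cuisines whose median rests on only a handful of orders.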

Insights on Time and Cost


4. Strategic Business Questions

High-Value Restaurant Partners

The top 5 restaurants by order volume are:

  1. Shake Shack
  2. The Meatball Shop
  3. Blue Ribbon Sushi
  4. Blue Ribbon Fried Chicken
  5. Parm

Note: These 5 restaurants alone account for nearly one-third of the total order volume.
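A ranking like this, and the share of total volume it represents, could be computed roughly as sketched below. The column name restaurant_name and the helper top_restaurants are assumptions made for illustration, and the toy data stands in for the real orders.

```python
import pandas as pd

def top_restaurants(orders, n=5, col="restaurant_name"):
    """Return the n most-ordered restaurants and their combined share of all orders."""
    counts = orders[col].value_counts()   # one row per order, so counts = order volume
    top = counts.head(n)
    share = top.sum() / len(orders)
    return top, share

# Toy orders (illustrative, not the real data); 'restaurant_name' is an assumed column name
toy = pd.DataFrame({"restaurant_name": ["A", "A", "A", "B", "B", "C", "D", "E", "F", "G"]})

top, share = top_restaurants(toy, n=2)
print(top)
print(f"Share of all orders: {share:.0%}")
```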

Promotional Eligibility

To qualify for the premium promotional offer, a restaurant must have more than 50 ratings and an average rating above 4. The qualifying restaurants are:
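A minimal sketch of how such a list could be derived, assuming a restaurant_name column (an assumed name) and a text rating column in which unrated orders are marked 'Not given'. The function eligible_restaurants and the toy data are hypothetical; the thresholds are lowered in the usage example only so the toy frame produces a result.

```python
import pandas as pd

def eligible_restaurants(orders, min_ratings=50, min_mean=4.0):
    """Restaurants with more than min_ratings ratings and a mean rating above min_mean."""
    # Convert ratings to numbers; 'Not given' becomes NaN and is dropped
    rated = orders.assign(rating_num=pd.to_numeric(orders["rating"], errors="coerce"))
    rated = rated.dropna(subset=["rating_num"])

    # Rating count and mean rating per restaurant
    stats = rated.groupby("restaurant_name")["rating_num"].agg(
        n_ratings="count", mean_rating="mean"
    )

    # Both thresholds are strict, matching ">50" and ">4" in the rule
    mask = (stats["n_ratings"] > min_ratings) & (stats["mean_rating"] > min_mean)
    return stats[mask].sort_values("n_ratings", ascending=False)

# Toy orders (illustrative, not the real data)
toy = pd.DataFrame({
    "restaurant_name": ["A"] * 4 + ["B"] * 3 + ["C"] * 2,
    "rating": ["5", "4", "5", "Not given", "4", "4", "4", "5", "5"],
})

result = eligible_restaurants(toy, min_ratings=2, min_mean=4.0)
print(result)
```

In the toy data, B fails because its mean is exactly 4 and C fails because it has only two ratings, which shows why the strictness of each comparison matters.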


5. Conclusions & Strategic Recommendations

Conclusions

Recommendations

  1. Workplace Marketing: Implement a “Weekday Lunch” campaign targeting office buildings and co-working spaces to balance the weekend-heavy order load.
  2. Growth Support for Mid-Tier Restaurants: Create “How to Grow” playbooks for restaurants with high ratings but low order volume, sharing best practices from top performers like Shake Shack.
  3. Incentivized Ratings: Offer small delivery discounts or “loyalty points” to customers who provide ratings, specifically aiming to reduce the “Not given” segment.
  4. Expand Family-Sized Options: Since most orders currently fall between $10 and $25, there is an opportunity to market “Family Bundles” or “Party Platters” to increase the average order value (AOV) above the $40 mark.
  5. Data Enhancement: Future iterations of this analysis should include order dates and geographic data (Zip Codes) to perform seasonal trend analysis and identify neighborhood-specific demand clusters.
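To track the order-value recommendation above, the average order value and the share of orders in each price band can be monitored over time. The following is a rough sketch on toy values (illustrative, not the real data), using the cost_of_the_order column from the earlier code.

```python
import pandas as pd

# Toy order costs in dollars (illustrative values, not the real data)
toy = pd.DataFrame({"cost_of_the_order": [8.0, 12.0, 15.0, 22.0, 24.0, 31.0, 12.0, 36.0]})
cost = toy["cost_of_the_order"]

aov = cost.mean()                      # average order value
in_band = cost.between(10, 25).mean()  # share of orders from $10 to $25, inclusive
above_40 = (cost > 40).mean()          # share of orders above $40

print(f"AOV: ${aov:.2f}, $10-$25 share: {in_band:.1%}, above $40: {above_40:.1%}")
```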