{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example Usage\n",
"\n",
"Welcome to the `salesanalyzer_mds` package! This package is designed to help small-sized businesses analyze their retail sales data efficiently, without needing extensive data analytics expertise. If you've ever felt overwhelmed by tools like Pandas or Scikit-learn, or wished for more retail-specific functions, you're in the right place.\n",
"\n",
"In this notebook, we'll walk through how to use the `salesanalyzer_mds` package to extract valuable insights from your sales data. We’ll demonstrate key functionalities using real-world examples, so you can start improving your business decisions right away!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports\n",
"\n",
"Let us begin by setting up all our imports for this demonstration, which includes all 3 `salesanalyzer_mds` functions:\n",
"- `sales_summary_statistics`: Calculates a variety of summary statistics that provide insights into overall sales performance, customer behavior, and product performance.\n",
"- `segment_revenue_share`: Segments products into three categories: cheap, medium, expensive, based on price, and calculates their respective share in total revenue.\n",
"- `predict_sales`: Predicts future sales based on the provided historical data and the target.\n",
"sales_summary_statistics: Calculates a variety of summary statistics that provide insights into overall sales performance, customer behavior, and product performance."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"from salesanalyzer_mds.sales_summary_statistics import sales_summary_statistics\n",
"from salesanalyzer_mds.segment_revenue_share import segment_revenue_share\n",
"from salesanalyzer_mds.predict_sales import predict_sales"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a sample data\n",
"\n",
"Next, let us create a sample data to work with. \n",
"> Note:\n",
"> `salesanalyzer_mds` package is not limited to the sample data columns and can be customized to suit your specific requirements."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" InvoiceNo | \n",
" Description | \n",
" Quantity | \n",
" InvoiceDate | \n",
" UnitPrice | \n",
" CustomerID | \n",
" Country | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" INV-240891 | \n",
" Laptop | \n",
" 2 | \n",
" 2023-06-09 | \n",
" 1500 | \n",
" 85732 | \n",
" USA | \n",
"
\n",
" \n",
" | 1 | \n",
" INV-240892 | \n",
" Headphones | \n",
" 3 | \n",
" 2023-07-11 | \n",
" 300 | \n",
" 70179 | \n",
" Singapoore | \n",
"
\n",
" \n",
" | 2 | \n",
" INV-240893 | \n",
" Headphones | \n",
" 1 | \n",
" 2023-08-21 | \n",
" 250 | \n",
" 85673 | \n",
" Germany | \n",
"
\n",
" \n",
" | 3 | \n",
" INV-240894 | \n",
" Monitor | \n",
" 3 | \n",
" 2023-08-25 | \n",
" 500 | \n",
" 22367 | \n",
" USA | \n",
"
\n",
" \n",
" | 4 | \n",
" INV-240895 | \n",
" Headphones | \n",
" 5 | \n",
" 2023-09-10 | \n",
" 420 | \n",
" 57682 | \n",
" Geramny | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" InvoiceNo Description Quantity InvoiceDate UnitPrice CustomerID \\\n",
"0 INV-240891 Laptop 2 2023-06-09 1500 85732 \n",
"1 INV-240892 Headphones 3 2023-07-11 300 70179 \n",
"2 INV-240893 Headphones 1 2023-08-21 250 85673 \n",
"3 INV-240894 Monitor 3 2023-08-25 500 22367 \n",
"4 INV-240895 Headphones 5 2023-09-10 420 57682 \n",
"\n",
" Country \n",
"0 USA \n",
"1 Singapoore \n",
"2 Germany \n",
"3 USA \n",
"4 Geramny "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sample_data = pd.DataFrame({\n",
" 'InvoiceNo' : ['INV-240891','INV-240892', 'INV-240893', 'INV-240894', 'INV-240895', 'INV-240896', 'INV-240898'],\n",
" 'Description': ['Laptop', 'Headphones', 'Headphones', 'Monitor', 'Headphones', 'Laptop', 'Monitor'],\n",
" 'Quantity' : [2, 3, 1, 3, 5, 2, 1],\n",
" 'InvoiceDate' : ['2023-06-09', '2023-07-11', '2023-08-21', '2023-08-25', '2023-09-10', '2023-10-30', '2023-10-30'],\n",
" 'UnitPrice' : [1500, 300, 250, 500, 420, 2000, 700],\n",
" 'CustomerID' : [85732, 70179, 85673, 22367, 57682, 99123, 45612],\n",
" 'Country' : ['USA', 'Singapoore', 'Germany', 'USA', 'Geramny', 'Singapoore', 'USA']\n",
"})\n",
"\n",
"sample_data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Summary Statistics\n",
"\n",
"One of the key features of `salesanalyzer_mds` is its ability to quickly generate sales summary. Use the `analyze_sales_trends()` function to generate insights like total revenue, average order value, and top selling products.\n",
"> Use help(sales_summary_statistics) for more information about the function"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Value | \n",
"
\n",
" \n",
" \n",
" \n",
" | total_revenue | \n",
" 12450.0 | \n",
"
\n",
" \n",
" | unique_customers | \n",
" 7 | \n",
"
\n",
" \n",
" | average_order_value | \n",
" 1778.571429 | \n",
"
\n",
" \n",
" | top_selling_product_quantity | \n",
" Headphones | \n",
"
\n",
" \n",
" | top_selling_product_revenue | \n",
" Laptop | \n",
"
\n",
" \n",
" | average_revenue_per_customer | \n",
" 1778.571429 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Value\n",
"total_revenue 12450.0\n",
"unique_customers 7\n",
"average_order_value 1778.571429\n",
"top_selling_product_quantity Headphones\n",
"top_selling_product_revenue Laptop\n",
"average_revenue_per_customer 1778.571429"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales_summary_statistics(sample_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Revenue Share for each Product Category\n",
"\n",
"Another feature of `saleanalyzer_mds`, the `segment_revenue_share()` function, segments products into three categories (cheap < medium < expensive) — based on their price, and calculates the respective share of total revenue contributed by each segment. By default, the price thresholds are set automatically, but users can define custom thresholds to categorize products according to their specific business needs. This function is particularly useful for analyzing product sales data and understanding revenue distribution across different pricing tiers.\n",
"> Use help(sales_summary_statistics) for more information about the function"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PriceSegment | \n",
" TotalRevenue | \n",
" RevenueShare (%) | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" cheap | \n",
" 1150 | \n",
" 9.24 | \n",
"
\n",
" \n",
" | 1 | \n",
" medium | \n",
" 4300 | \n",
" 34.54 | \n",
"
\n",
" \n",
" | 2 | \n",
" expensive | \n",
" 7000 | \n",
" 56.22 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PriceSegment TotalRevenue RevenueShare (%)\n",
"0 cheap 1150 9.24\n",
"1 medium 4300 34.54\n",
"2 expensive 7000 56.22"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Using default price thresholds\n",
"revenue_share = segment_revenue_share(sample_data, price_col='UnitPrice', quantity_col='Quantity')\n",
"revenue_share"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PriceSegment | \n",
" TotalRevenue | \n",
" RevenueShare (%) | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" cheap | \n",
" 1150 | \n",
" 9.24 | \n",
"
\n",
" \n",
" | 1 | \n",
" medium | \n",
" 3600 | \n",
" 28.92 | \n",
"
\n",
" \n",
" | 2 | \n",
" expensive | \n",
" 7700 | \n",
" 61.85 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PriceSegment TotalRevenue RevenueShare (%)\n",
"0 cheap 1150 9.24\n",
"1 medium 3600 28.92\n",
"2 expensive 7700 61.85"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Using user-defined price thresholds\n",
"revenue_share = segment_revenue_share(sample_data, price_col='UnitPrice', quantity_col='Quantity', price_thresholds=(300, 500))\n",
"revenue_share"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Predict Future Sales\n",
"\n",
"Now that you have a good summary of your **past** sales, say, you want to peek into the **future** and predict how your products will sell in a month, 2 months or even a year? You can do this with `predict_sales()` function. This function uses a Random Forest machine learning model to make predictions on your specified target (e.g. quantity sold). The output will be a data frame with predicted values, and the model's performance score (Mean Squared Error and R Squared).\n",
"\n",
"> **Important**
\n",
"> `predict_sales()` checks for duplicate entries, and only considers unique data points
\n",
"> By default the function uses 70% data for training and 30% for testing, to change that you can pass test_size = 0.2 increase the ratio, if your data size is small \n",
"
\n",
"> **Model Performance Scores:**
\n",
"> - Mean Squared Error: average squared difference between predicted values and the actual values \n",
"> - Coefficient of Determination $(R^2)$: how well-observed results are reproduced by the model, depending on the ratio of total deviation of results described by the model.\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MSE of the model: 6.7\n",
"R_squared of the model: -6.54\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Predicted values | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1.77 | \n",
"
\n",
" \n",
" | 1 | \n",
" 1.33 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Predicted values\n",
"0 1.77\n",
"1 1.33"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_data = pd.DataFrame({\n",
" 'InvoiceNo' : ['INV-250891','INV-250892'],\n",
" 'Description': ['Laptop', 'Headphones'],\n",
" 'InvoiceDate' : ['2025-01-30', '2025-02-01'],\n",
" 'UnitPrice' : [2000, 300],\n",
" 'CustomerID' : [85732, 70179],\n",
" 'Country' : ['USA', 'Singapoore']\n",
"})\n",
"\n",
"predict_sales(sample_data, \n",
" new_data, \n",
" numeric_features = ['UnitPrice'], \n",
" categorical_features = ['Description', 'Country'], \n",
" target = 'Quantity', \n",
" date_feature = 'InvoiceDate')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't want to include a date feature into your analysis, you can omit it from your arguments.\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MSE of the model: 1.72\n",
"R_squared of the model: 0.0\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Predicted values | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1.89 | \n",
"
\n",
" \n",
" | 1 | \n",
" 1.88 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Predicted values\n",
"0 1.89\n",
"1 1.88"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predict_sales(sample_data, new_data, ['UnitPrice'], ['Description', 'Country'], 'Quantity', test_size=0.2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the end of the tutorial, where you have seen how to get sales data insights using our package."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "salesanalyzser",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}