Example Usage

Welcome to the salesanalyzer_mds package! This package is designed to help small businesses analyze their retail sales data efficiently, without needing extensive data analytics expertise. If you’ve ever felt overwhelmed by tools like Pandas or Scikit-learn, or wished for more retail-specific functions, you’re in the right place.

In this notebook, we’ll walk through how to use the salesanalyzer_mds package to extract valuable insights from your sales data. We’ll demonstrate key functionalities using real-world examples, so you can start improving your business decisions right away!

Imports

Let us begin by setting up the imports for this demonstration, which include all three salesanalyzer_mds functions:

  • sales_summary_statistics: Calculates a variety of summary statistics that provide insights into overall sales performance, customer behavior, and product performance.

  • segment_revenue_share: Segments products into three categories (cheap, medium, expensive) based on price, and calculates each category's share of total revenue.

  • predict_sales: Predicts future sales based on the provided historical data and the target.

import pandas as pd

from salesanalyzer_mds.sales_summary_statistics import sales_summary_statistics
from salesanalyzer_mds.segment_revenue_share import segment_revenue_share
from salesanalyzer_mds.predict_sales import predict_sales

Create sample data

Next, let us create some sample data to work with.

Note: The salesanalyzer_mds package is not limited to the columns in this sample data and can be customized to suit your specific requirements.

sample_data = pd.DataFrame({
    'InvoiceNo' : ['INV-240891','INV-240892', 'INV-240893', 'INV-240894', 'INV-240895', 'INV-240896', 'INV-240898'],
    'Description': ['Laptop', 'Headphones', 'Headphones', 'Monitor', 'Headphones', 'Laptop', 'Monitor'],
    'Quantity' : [2, 3, 1, 3, 5, 2, 1],
    'InvoiceDate' : ['2023-06-09', '2023-07-11', '2023-08-21', '2023-08-25', '2023-09-10', '2023-10-30', '2023-10-30'],
    'UnitPrice' : [1500, 300, 250, 500, 420, 2000, 700],
    'CustomerID' : [85732, 70179, 85673, 22367, 57682, 99123, 45612],
    'Country' : ['USA', 'Singapoore', 'Germany', 'USA', 'Geramny', 'Singapoore', 'USA']
})

sample_data.head()
    InvoiceNo Description  Quantity InvoiceDate  UnitPrice  CustomerID     Country
0  INV-240891      Laptop         2  2023-06-09       1500       85732         USA
1  INV-240892  Headphones         3  2023-07-11        300       70179  Singapoore
2  INV-240893  Headphones         1  2023-08-21        250       85673     Germany
3  INV-240894     Monitor         3  2023-08-25        500       22367         USA
4  INV-240895  Headphones         5  2023-09-10        420       57682     Geramny

Get Summary Statistics

One of the key features of salesanalyzer_mds is its ability to quickly generate a sales summary. Use the sales_summary_statistics() function to generate insights like total revenue, average order value, and top-selling products.

Use help(sales_summary_statistics) for more information about the function.

sales_summary_statistics(sample_data)
                                    Value
total_revenue                     12450.0
unique_customers                        7
average_order_value           1778.571429
top_selling_product_quantity   Headphones
top_selling_product_revenue        Laptop
average_revenue_per_customer  1778.571429
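
To see where these numbers come from, the following sketch recomputes them directly with pandas on the same sample values. It is not the package's implementation: the definitions used here (revenue as Quantity times UnitPrice, order value as revenue per invoice) are assumptions chosen because they reproduce the output above, and the `sales` variable is just a local copy of the relevant columns.

```python
import pandas as pd

# Local copy of the columns needed for the summary (same values as sample_data).
sales = pd.DataFrame({
    'InvoiceNo': ['INV-240891', 'INV-240892', 'INV-240893', 'INV-240894',
                  'INV-240895', 'INV-240896', 'INV-240898'],
    'Description': ['Laptop', 'Headphones', 'Headphones', 'Monitor',
                    'Headphones', 'Laptop', 'Monitor'],
    'Quantity': [2, 3, 1, 3, 5, 2, 1],
    'UnitPrice': [1500, 300, 250, 500, 420, 2000, 700],
    'CustomerID': [85732, 70179, 85673, 22367, 57682, 99123, 45612],
})

# Assumption: revenue of a row is quantity sold times unit price.
sales['Revenue'] = sales['Quantity'] * sales['UnitPrice']

total_revenue = sales['Revenue'].sum()
unique_customers = sales['CustomerID'].nunique()

# Assumption: order value is revenue per invoice; customer value is revenue per customer.
average_order_value = sales.groupby('InvoiceNo')['Revenue'].sum().mean()
average_revenue_per_customer = sales.groupby('CustomerID')['Revenue'].sum().mean()

# Product with the largest total quantity and the largest total revenue.
top_by_quantity = sales.groupby('Description')['Quantity'].sum().idxmax()
top_by_revenue = sales.groupby('Description')['Revenue'].sum().idxmax()

print(total_revenue, unique_customers, round(average_order_value, 2))
print(top_by_quantity, top_by_revenue)
```

In this sample every invoice belongs to a different customer, which is why the average order value and the average revenue per customer are identical.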

Get Revenue Share for each Product Category

Another feature of salesanalyzer_mds, the segment_revenue_share() function, segments products into three categories (cheap < medium < expensive) based on their price, and calculates the share of total revenue contributed by each segment. By default, the price thresholds are set automatically, but users can define custom thresholds to categorize products according to their specific business needs. This function is particularly useful for analyzing product sales data and understanding revenue distribution across different pricing tiers.

Use help(segment_revenue_share) for more information about the function.

# Using default price thresholds
revenue_share = segment_revenue_share(sample_data, price_col='UnitPrice', quantity_col='Quantity')
revenue_share
  PriceSegment  TotalRevenue  RevenueShare (%)
0        cheap          1150              9.24
1       medium          4300             34.54
2    expensive          7000             56.22

# Using user-defined price thresholds
revenue_share = segment_revenue_share(sample_data, price_col='UnitPrice', quantity_col='Quantity', price_thresholds=(300, 500))
revenue_share
  PriceSegment  TotalRevenue  RevenueShare (%)
0        cheap          1150              9.24
1       medium          3600             28.92
2    expensive          7700             61.85
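
The user-defined result can be reproduced by hand, which helps when checking that your thresholds do what you expect. The sketch below uses pandas' `pd.cut` with the thresholds (300, 500). The boundary rule (a price equal to a threshold falls into the lower segment) is inferred from the output above rather than taken from the package source, so treat it as an assumption. How the default thresholds are chosen is not covered here.

```python
import pandas as pd

# Same quantities and prices as sample_data.
sales = pd.DataFrame({
    'Quantity': [2, 3, 1, 3, 5, 2, 1],
    'UnitPrice': [1500, 300, 250, 500, 420, 2000, 700],
})
sales['Revenue'] = sales['Quantity'] * sales['UnitPrice']

low, high = 300, 500  # the user-defined price thresholds

# Assumed rule: price <= low is cheap, low < price <= high is medium, the rest is expensive.
sales['PriceSegment'] = pd.cut(
    sales['UnitPrice'],
    bins=[float('-inf'), low, high, float('inf')],
    labels=['cheap', 'medium', 'expensive'],
)

segment_revenue = sales.groupby('PriceSegment', observed=False)['Revenue'].sum()
revenue_share_pct = (segment_revenue / segment_revenue.sum() * 100).round(2)

print(segment_revenue)
print(revenue_share_pct)
```

With these thresholds the 700 monitor moves from the medium segment to the expensive one, which accounts for the difference between the two tables above.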

Predict Future Sales

Now that you have a good summary of your past sales, suppose you want to look ahead and predict how your products will sell in a month, two months, or even a year. You can do this with the predict_sales() function. It uses a Random Forest machine learning model to make predictions for your specified target (e.g. quantity sold). The output is a data frame of predicted values, along with the model’s performance scores (Mean Squared Error and R Squared).

Important
predict_sales() checks for duplicate entries and only considers unique data points.
By default, the function uses 70% of the data for training and 30% for testing. To change the split, pass test_size (for example, test_size=0.2); a smaller test share leaves more rows for training, which helps when your dataset is small.
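
To make the split concrete, the sketch below counts how many rows land in each part for a seven-row dataset like the sample data. It assumes the test share is rounded up to a whole row, which is how scikit-learn's `train_test_split` behaves. Whether predict_sales() splits in exactly this way is an assumption, and `split_sizes` is a hypothetical helper, not part of the package.

```python
import math

def split_sizes(n_rows, test_size):
    """Return (train_rows, test_rows), rounding the test share up to a whole row."""
    n_test = math.ceil(n_rows * test_size)
    return n_rows - n_test, n_test

default_split = split_sizes(7, 0.3)   # (4, 3)
smaller_test = split_sizes(7, 0.2)    # (5, 2)
print(default_split, smaller_test)
```

With so few rows, each score is computed on only two or three test points, so the reported scores can swing widely from one split to another.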

Model Performance Scores:

  • Mean Squared Error: the average squared difference between the predicted values and the actual values.

  • Coefficient of Determination $(R^2)$: how well the observed results are reproduced by the model, measured as the proportion of the total variation in the observed values that the model explains.
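
As a worked example of both scores, the sketch below computes them by hand in plain Python. The three actual and predicted values are made up for illustration and do not come from the package. The example is chosen so that the predictions are worse than simply predicting the mean, which makes $R^2$ negative.

```python
# Made-up values, for illustration only.
actual = [2.0, 3.0, 1.0]
predicted = [1.5, 1.5, 1.5]

n = len(actual)
mean_actual = sum(actual) / n

# Residual sum of squares: squared errors of the model's predictions.
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
# Total sum of squares: squared deviations of the actual values from their mean.
ss_tot = sum((a - mean_actual) ** 2 for a in actual)

mse = ss_res / n                 # about 0.917
r_squared = 1 - ss_res / ss_tot  # -0.375

print(round(mse, 3), r_squared)
```

An $R^2$ of 1 means perfect predictions, 0 means the model does no better than always predicting the mean, and negative values mean it does worse, which is common when the test set is very small.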

new_data = pd.DataFrame({
    'InvoiceNo' : ['INV-250891','INV-250892'],
    'Description': ['Laptop', 'Headphones'],
    'InvoiceDate' : ['2025-01-30', '2025-02-01'],
    'UnitPrice' : [2000, 300],
    'CustomerID' : [85732, 70179],
    'Country' : ['USA', 'Singapoore']
})

predict_sales(sample_data, 
              new_data, 
              numeric_features = ['UnitPrice'], 
              categorical_features = ['Description', 'Country'], 
              target = 'Quantity', 
              date_feature = 'InvoiceDate')
MSE of the model: 6.7
R_squared of the model: -6.54
Predicted values
0 1.77
1 1.33
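
This tutorial does not show how predict_sales() turns date_feature into model inputs. One common approach, given here purely as an assumption about what such a step might look like, is to split each date into numeric parts with pandas:

```python
import pandas as pd

# The two invoice dates from new_data.
dates = pd.DataFrame({'InvoiceDate': ['2025-01-30', '2025-02-01']})

# Parse the strings into datetimes, then pull out numeric components
# that a tree-based model can use as ordinary features.
parsed = pd.to_datetime(dates['InvoiceDate'])
date_parts = pd.DataFrame({
    'year': parsed.dt.year,
    'month': parsed.dt.month,
    'day': parsed.dt.day,
})

print(date_parts)
```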

If you don’t want to include a date feature in your analysis, you can omit the date_feature argument.

predict_sales(sample_data, new_data, ['UnitPrice'], ['Description', 'Country'], 'Quantity', test_size=0.2)
MSE of the model: 1.72
R_squared of the model: 0.0
Predicted values
0 1.89
1 1.88
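
For readers curious about what a Random Forest prediction of this kind involves, here is a minimal sketch built directly on scikit-learn. It is not the implementation of predict_sales(): the pipeline layout, the one-hot encoding of the product description, and the names `history` and `upcoming` are all assumptions made for illustration. It also skips the train/test split and scoring.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Historical rows (a subset of the sample_data columns) and the rows to predict.
history = pd.DataFrame({
    'Description': ['Laptop', 'Headphones', 'Headphones', 'Monitor',
                    'Headphones', 'Laptop', 'Monitor'],
    'UnitPrice': [1500, 300, 250, 500, 420, 2000, 700],
    'Quantity': [2, 3, 1, 3, 5, 2, 1],
})
upcoming = pd.DataFrame({
    'Description': ['Laptop', 'Headphones'],
    'UnitPrice': [2000, 300],
})

# One-hot encode the categorical column; pass the numeric column through unchanged.
preprocess = ColumnTransformer(
    transformers=[('categories', OneHotEncoder(handle_unknown='ignore'), ['Description'])],
    remainder='passthrough',
)
model = Pipeline([
    ('preprocess', preprocess),
    ('forest', RandomForestRegressor(n_estimators=100, random_state=0)),
])

# Fit on the history, then predict the quantity for each upcoming row.
model.fit(history[['Description', 'UnitPrice']], history['Quantity'])
predictions = model.predict(upcoming)

print(predictions)
```

A Random Forest averages quantities seen in training, so its predictions always fall within the range of the historical target values (here, between 1 and 5).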

This is the end of the tutorial, where you have seen how to get sales data insights using our package.