Example Usage

Welcome to the salesanalyzer_mds package! This package is designed to help small businesses analyze their retail sales data efficiently, without needing extensive data analytics expertise. If you’ve ever felt overwhelmed by tools like Pandas or Scikit-learn, or wished for more retail-specific functions, you’re in the right place.

In this notebook, we’ll walk through how to use the salesanalyzer_mds package to extract valuable insights from your sales data. We’ll demonstrate each function on a small sample dataset, so you can start applying them to your own data right away!

Imports

Let us begin by setting up the imports for this demonstration, which include all three salesanalyzer_mds functions:

sales_summary_statistics: Calculates a variety of summary statistics that provide insights into overall sales performance, customer behavior, and product performance.
segment_revenue_share: Segments products into three categories: cheap, medium, expensive, based on price, and calculates their respective share in total revenue.
predict_sales: Predicts future sales based on the provided historical data and the target.

import pandas as pd

from salesanalyzer_mds.sales_summary_statistics import sales_summary_statistics
from salesanalyzer_mds.segment_revenue_share import segment_revenue_share
from salesanalyzer_mds.predict_sales import predict_sales

Create sample data

Next, let us create some sample data to work with.

Note: The salesanalyzer_mds package is not limited to the columns in this sample data and can be adapted to suit your own dataset.

sample_data = pd.DataFrame({
    'InvoiceNo' : ['INV-240891','INV-240892', 'INV-240893', 'INV-240894', 'INV-240895', 'INV-240896', 'INV-240898'],
    'Description': ['Laptop', 'Headphones', 'Headphones', 'Monitor', 'Headphones', 'Laptop', 'Monitor'],
    'Quantity' : [2, 3, 1, 3, 5, 2, 1],
    'InvoiceDate' : ['2023-06-09', '2023-07-11', '2023-08-21', '2023-08-25', '2023-09-10', '2023-10-30', '2023-10-30'],
    'UnitPrice' : [1500, 300, 250, 500, 420, 2000, 700],
    'CustomerID' : [85732, 70179, 85673, 22367, 57682, 99123, 45612],
    'Country' : ['USA', 'Singapoore', 'Germany', 'USA', 'Geramny', 'Singapoore', 'USA']
})

sample_data.head()

	InvoiceNo	Description	Quantity	InvoiceDate	UnitPrice	CustomerID	Country
0	INV-240891	Laptop	2	2023-06-09	1500	85732	USA
1	INV-240892	Headphones	3	2023-07-11	300	70179	Singapoore
2	INV-240893	Headphones	1	2023-08-21	250	85673	Germany
3	INV-240894	Monitor	3	2023-08-25	500	22367	USA
4	INV-240895	Headphones	5	2023-09-10	420	57682	Geramny

Get Summary Statistics

One of the key features of salesanalyzer_mds is its ability to quickly generate a sales summary. Use the sales_summary_statistics() function to generate insights like total revenue, average order value, and top-selling products.

Use help(sales_summary_statistics) for more information about the function.

sales_summary_statistics(sample_data)

	Value
total_revenue	12450.0
unique_customers	7
average_order_value	1778.571429
top_selling_product_quantity	Headphones
top_selling_product_revenue	Laptop
average_revenue_per_customer	1778.571429
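
To see where these figures come from, here is a minimal plain-Python cross-check using the sample data. It is only a sketch, not the package's implementation, and it assumes that average order value means total revenue divided by the number of invoices, which is consistent with the table above.

```python
# Plain-Python cross-check of the summary table (not the package's code).
descriptions = ['Laptop', 'Headphones', 'Headphones', 'Monitor',
                'Headphones', 'Laptop', 'Monitor']
quantities = [2, 3, 1, 3, 5, 2, 1]
unit_prices = [1500, 300, 250, 500, 420, 2000, 700]

# Revenue per invoice line = quantity * unit price
line_revenue = [q * p for q, p in zip(quantities, unit_prices)]
total_revenue = sum(line_revenue)

# Assumption: average order value = total revenue / number of invoices
average_order_value = total_revenue / len(line_revenue)

# Aggregate quantity and revenue per product
quantity_by_product = {}
revenue_by_product = {}
for name, q, r in zip(descriptions, quantities, line_revenue):
    quantity_by_product[name] = quantity_by_product.get(name, 0) + q
    revenue_by_product[name] = revenue_by_product.get(name, 0) + r

top_by_quantity = max(quantity_by_product, key=quantity_by_product.get)
top_by_revenue = max(revenue_by_product, key=revenue_by_product.get)

print(total_revenue)                  # 12450
print(round(average_order_value, 6))  # 1778.571429
print(top_by_quantity)                # Headphones
print(top_by_revenue)                 # Laptop
```

Because every invoice in the sample data belongs to a different customer, average revenue per customer equals average order value here.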

Get Revenue Share for each Product Category

Another feature of salesanalyzer_mds, the segment_revenue_share() function, segments products into three categories (cheap < medium < expensive) based on their price and calculates the share of total revenue contributed by each segment. By default, the price thresholds are set automatically, but you can define custom thresholds to categorize products according to your specific business needs. This function is particularly useful for analyzing product sales data and understanding revenue distribution across different pricing tiers.

Use help(segment_revenue_share) for more information about the function.

# Using default price thresholds
revenue_share = segment_revenue_share(sample_data, price_col='UnitPrice', quantity_col='Quantity')
revenue_share

	PriceSegment	TotalRevenue	RevenueShare (%)
0	cheap	1150	9.24
1	medium	4300	34.54
2	expensive	7000	56.22

# Using user-defined price thresholds
revenue_share = segment_revenue_share(sample_data, price_col='UnitPrice', quantity_col='Quantity', price_thresholds=(300, 500))
revenue_share

	PriceSegment	TotalRevenue	RevenueShare (%)
0	cheap	1150	9.24
1	medium	3600	28.92
2	expensive	7700	61.85
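
The custom-threshold result can be reproduced by hand. The sketch below is plain Python, not the package's code. It assumes that a price equal to a threshold falls into the lower segment (for example, a unit price of 300 counts as cheap), which matches the output above but is not confirmed by the package documentation.

```python
quantities = [2, 3, 1, 3, 5, 2, 1]
unit_prices = [1500, 300, 250, 500, 420, 2000, 700]
low, high = 300, 500  # same values as price_thresholds=(300, 500)

def price_segment(price):
    # Assumption: a price equal to a threshold belongs to the lower segment
    if price <= low:
        return 'cheap'
    if price <= high:
        return 'medium'
    return 'expensive'

# Sum revenue (quantity * unit price) within each segment
revenue_by_segment = {'cheap': 0, 'medium': 0, 'expensive': 0}
for q, p in zip(quantities, unit_prices):
    revenue_by_segment[price_segment(p)] += q * p

# Convert each segment's revenue to a percentage of total revenue
total = sum(revenue_by_segment.values())
share_by_segment = {seg: round(100 * rev / total, 2)
                    for seg, rev in revenue_by_segment.items()}

print(revenue_by_segment)  # {'cheap': 1150, 'medium': 3600, 'expensive': 7700}
print(share_by_segment)    # {'cheap': 9.24, 'medium': 28.92, 'expensive': 61.85}
```

The rounded shares add up to 100.01 rather than 100 because each share is rounded separately.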

Predict Future Sales

Now that you have a good summary of your past sales, you may want to estimate how your products will sell in a month, two months, or even a year. You can do this with the predict_sales() function. It uses a Random Forest machine learning model to make predictions on your specified target (e.g. quantity sold). The output is a data frame of predicted values, along with the model’s performance scores (Mean Squared Error and R-squared).

Important
predict_sales() checks for duplicate entries and only considers unique data points.
By default, the function uses 70% of the data for training and 30% for testing. To change the split, pass test_size; for example, test_size=0.2 increases the training share to 80%, which can help if your dataset is small.
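
To get a feel for what these two points mean on a small dataset, here is a rough plain-Python sketch. It does not call the package. The rows are hypothetical, `split_sizes` is a helper made up for this illustration, and the rounding rule (test rows rounded up) is an assumption borrowed from scikit-learn's `train_test_split`, which the package may or may not use internally.

```python
import math

def split_sizes(n_rows, test_size=0.3):
    # Assumption: the number of test rows is rounded up,
    # as scikit-learn's train_test_split does
    n_test = math.ceil(n_rows * test_size)
    return n_rows - n_test, n_test

# Hypothetical rows containing one exact duplicate
rows = [
    ('INV-1', 'Laptop', 2),
    ('INV-2', 'Monitor', 1),
    ('INV-2', 'Monitor', 1),
]
unique_rows = list(dict.fromkeys(rows))  # drops the repeated row, keeps order

print(len(unique_rows))     # 2
print(split_sizes(7))       # (4, 3) -> 4 training rows, 3 test rows
print(split_sizes(7, 0.2))  # (5, 2) -> 5 training rows, 2 test rows
```

Under this assumption, a dataset of seven rows leaves only a handful of rows for testing, so the performance scores reported on such a small sample should be read with caution.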

Model Performance Scores:

Mean Squared Error: the average squared difference between the predicted values and the actual values. Lower is better.

Coefficient of Determination $(R^2)$: the proportion of the variation in the actual values that the model explains. A value of 1 is a perfect fit, 0 means the model does no better than always predicting the mean, and negative values mean it does worse than that.
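
Both scores follow standard definitions: $\text{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$ and $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$. The short sketch below computes them in plain Python on hypothetical toy values (not output from predict_sales()) and shows that $R^2$ can be negative when the predictions are worse than simply guessing the mean.

```python
# Hypothetical toy values, not output from predict_sales()
actual = [2, 3, 1]
predicted = [3, 1, 2]

n = len(actual)

# Mean Squared Error: average of the squared prediction errors
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

# R^2 = 1 - (residual sum of squares / total sum of squares)
mean_actual = sum(actual) / n
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r_squared = 1 - ss_res / ss_tot

print(mse)        # 2.0
print(r_squared)  # -2.0
```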

new_data = pd.DataFrame({
    'InvoiceNo' : ['INV-250891','INV-250892'],
    'Description': ['Laptop', 'Headphones'],
    'InvoiceDate' : ['2025-01-30', '2025-02-01'],
    'UnitPrice' : [2000, 300],
    'CustomerID' : [85732, 70179],
    'Country' : ['USA', 'Singapoore']
})

predict_sales(sample_data, 
              new_data, 
              numeric_features = ['UnitPrice'], 
              categorical_features = ['Description', 'Country'], 
              target = 'Quantity', 
              date_feature = 'InvoiceDate')

MSE of the model: 6.7
R_squared of the model: -6.54

	Predicted values
0	1.77
1	1.33

If you don’t want to include a date feature in your analysis, you can omit the date_feature argument. The call below also passes test_size=0.2 to use 80% of the data for training.

predict_sales(sample_data, new_data, ['UnitPrice'], ['Description', 'Country'], 'Quantity', test_size=0.2)

MSE of the model: 1.72
R_squared of the model: 0.0

	Predicted values
0	1.89
1	1.88

This concludes the tutorial. You have now seen how to get insights from your sales data using the salesanalyzer_mds package.