Example Usage
Welcome to the salesanalyzer_mds package! This package is designed to help small businesses analyze their retail sales data efficiently, without needing extensive data analytics expertise. If you’ve ever felt overwhelmed by tools like Pandas or Scikit-learn, or wished for more retail-specific functions, you’re in the right place.
In this notebook, we’ll walk through how to use the salesanalyzer_mds package to extract valuable insights from your sales data. We’ll demonstrate key functionalities using real-world examples, so you can start improving your business decisions right away!
Imports
Let us begin by setting up the imports for this demonstration, which include all three salesanalyzer_mds functions:
sales_summary_statistics: Calculates a variety of summary statistics that provide insights into overall sales performance, customer behavior, and product performance.
segment_revenue_share: Segments products into three categories (cheap, medium, expensive) based on price, and calculates their respective share of total revenue.
predict_sales: Predicts future sales based on the provided historical data and the target.
import pandas as pd
from salesanalyzer_mds.sales_summary_statistics import sales_summary_statistics
from salesanalyzer_mds.segment_revenue_share import segment_revenue_share
from salesanalyzer_mds.predict_sales import predict_sales
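To make the segment_revenue_share description above more concrete, here is a minimal pandas sketch of the underlying idea: bucket rows by price, then compute each bucket's share of total revenue. It does not call the package function, and the bucketing rule (three equal-width price bands via pd.cut) is an assumption made for illustration; the package may draw its boundaries differently.

```python
import pandas as pd

# Small illustrative dataset with the same column names as the tutorial's sample data.
sales = pd.DataFrame({
    "Description": ["Laptop", "Headphones", "Headphones", "Monitor",
                    "Headphones", "Laptop", "Monitor"],
    "Quantity": [2, 3, 1, 3, 5, 2, 1],
    "UnitPrice": [1500, 300, 250, 500, 420, 2000, 700],
})

# Revenue per row is quantity times unit price.
sales["Revenue"] = sales["Quantity"] * sales["UnitPrice"]

# Assumption: three equal-width price bands between the lowest and highest price.
sales["PriceSegment"] = pd.cut(
    sales["UnitPrice"], bins=3, labels=["cheap", "medium", "expensive"]
)

# Sum revenue per segment (keeping empty segments) and convert to shares of the total.
segment_revenue = sales.groupby("PriceSegment", observed=False)["Revenue"].sum()
segment_share = segment_revenue / segment_revenue.sum()

print(segment_share.round(3))
```

With equal-width bands, a segment can end up empty when prices are unevenly spread; a quantile-based rule would balance the segments differently.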
Create sample data
Next, let us create some sample data to work with.
Note:
The salesanalyzer_mds package is not limited to the sample data columns and can be customized to suit your specific requirements.
sample_data = pd.DataFrame({
'InvoiceNo' : ['INV-240891','INV-240892', 'INV-240893', 'INV-240894', 'INV-240895', 'INV-240896', 'INV-240898'],
'Description': ['Laptop', 'Headphones', 'Headphones', 'Monitor', 'Headphones', 'Laptop', 'Monitor'],
'Quantity' : [2, 3, 1, 3, 5, 2, 1],
'InvoiceDate' : ['2023-06-09', '2023-07-11', '2023-08-21', '2023-08-25', '2023-09-10', '2023-10-30', '2023-10-30'],
'UnitPrice' : [1500, 300, 250, 500, 420, 2000, 700],
'CustomerID' : [85732, 70179, 85673, 22367, 57682, 99123, 45612],
'Country' : ['USA', 'Singapoore', 'Germany', 'USA', 'Geramny', 'Singapoore', 'USA']
})
sample_data.head()
| | InvoiceNo | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|
| 0 | INV-240891 | Laptop | 2 | 2023-06-09 | 1500 | 85732 | USA |
| 1 | INV-240892 | Headphones | 3 | 2023-07-11 | 300 | 70179 | Singapoore |
| 2 | INV-240893 | Headphones | 1 | 2023-08-21 | 250 | 85673 | Germany |
| 3 | INV-240894 | Monitor | 3 | 2023-08-25 | 500 | 22367 | USA |
| 4 | INV-240895 | Headphones | 5 | 2023-09-10 | 420 | 57682 | Geramny |
Get Summary Statistics
One of the key features of salesanalyzer_mds is its ability to quickly generate a sales summary. Use the sales_summary_statistics() function to generate insights like total revenue, average order value, and top-selling products.
Use help(sales_summary_statistics) for more information about the function.
sales_summary_statistics(sample_data)
| | Value |
|---|---|
| total_revenue | 12450.0 |
| unique_customers | 7 |
| average_order_value | 1778.571429 |
| top_selling_product_quantity | Headphones |
| top_selling_product_revenue | Laptop |
| average_revenue_per_customer | 1778.571429 |
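As a sanity check, the figures in the table above can be reproduced by hand with plain pandas. This is only a sketch of what each statistic means, assuming revenue is Quantity times UnitPrice and that each InvoiceNo is one order; it is not the package's implementation.

```python
import pandas as pd

# The columns of the tutorial's sample data that the statistics depend on.
sample_data = pd.DataFrame({
    "InvoiceNo": ["INV-240891", "INV-240892", "INV-240893", "INV-240894",
                  "INV-240895", "INV-240896", "INV-240898"],
    "Description": ["Laptop", "Headphones", "Headphones", "Monitor",
                    "Headphones", "Laptop", "Monitor"],
    "Quantity": [2, 3, 1, 3, 5, 2, 1],
    "UnitPrice": [1500, 300, 250, 500, 420, 2000, 700],
    "CustomerID": [85732, 70179, 85673, 22367, 57682, 99123, 45612],
})

# Assumption: revenue per row is quantity times unit price.
revenue = sample_data["Quantity"] * sample_data["UnitPrice"]

total_revenue = revenue.sum()
unique_customers = sample_data["CustomerID"].nunique()

# Assumption: one invoice number corresponds to one order.
average_order_value = total_revenue / sample_data["InvoiceNo"].nunique()
average_revenue_per_customer = total_revenue / unique_customers

# Top products by units sold and by revenue.
top_by_quantity = sample_data.groupby("Description")["Quantity"].sum().idxmax()
top_by_revenue = revenue.groupby(sample_data["Description"]).sum().idxmax()

print(total_revenue, unique_customers, round(average_order_value, 6))
print(top_by_quantity, top_by_revenue)
```

Here the average order value and the average revenue per customer coincide because every invoice in the sample belongs to a different customer.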
Predict Future Sales
Now that you have a summary of your past sales, suppose you want to look ahead and predict how your products will sell in a month, two months, or even a year. You can do this with the predict_sales() function, which uses a Random Forest machine learning model to predict your specified target (e.g., quantity sold). The output is a data frame of predicted values, along with the model’s performance scores (Mean Squared Error and R-squared).
Important
predict_sales() checks for duplicate entries and only considers unique data points.
By default, the function uses 70% of the data for training and 30% for testing. To change the split, pass test_size (for example, test_size=0.2 keeps 80% of the data for training), which can help when your dataset is small.
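The two points above (dropping duplicates, then holding out a share of the rows for testing) can be pictured with a short pandas sketch. The helper split_train_test is hypothetical and written only for illustration; predict_sales() may deduplicate and split differently, so the exact row counts here should not be read as the package's behavior.

```python
import pandas as pd

def split_train_test(data, test_size=0.3, seed=0):
    """Hypothetical helper: drop duplicate rows, then split into train and test sets."""
    unique_rows = data.drop_duplicates().reset_index(drop=True)
    # Randomly hold out roughly `test_size` of the unique rows for testing.
    test_rows = unique_rows.sample(frac=test_size, random_state=seed)
    # Everything not held out is used for training.
    train_rows = unique_rows.drop(test_rows.index)
    return train_rows, test_rows

# Illustrative data: the last row is an exact duplicate of the one before it.
orders = pd.DataFrame({
    "Description": ["Laptop", "Headphones", "Headphones", "Monitor",
                    "Headphones", "Laptop", "Monitor", "Monitor"],
    "Quantity": [2, 3, 1, 3, 5, 2, 1, 1],
    "UnitPrice": [1500, 300, 250, 500, 420, 2000, 700, 700],
})

train_rows, test_rows = split_train_test(orders, test_size=0.3)
print(len(orders), "rows in,", len(train_rows), "for training,", len(test_rows), "for testing")
```

A smaller test_size leaves more rows for training, at the cost of a performance score computed on fewer held-out rows.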
Model Performance Scores:
Mean Squared Error (MSE): the average squared difference between the predicted values and the actual values. Lower is better.
Coefficient of Determination $(R^2)$: the proportion of the variance in the target that the model explains. A value of 1 is a perfect fit, 0 is no better than always predicting the mean of the target, and negative values are worse than that.
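The two scores can be computed by hand in a few lines. The sketch below uses the standard definitions with made-up actual and predicted values (not output from predict_sales()) to show how a model can end up with a negative $R^2$.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares).

    Undefined (division by zero) when all actual values are identical.
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical held-out quantities and three sets of predictions.
actual = [2, 3, 1]
perfect = [2, 3, 1]        # matches every value
always_mean = [2, 2, 2]    # always predicts the mean of `actual`
poor = [4, 1, 3]           # further off than the mean would be

for name, predicted in [("perfect", perfect), ("always_mean", always_mean), ("poor", poor)]:
    print(name, mean_squared_error(actual, predicted), r_squared(actual, predicted))
```

On very small test sets, a few poor predictions are enough to push $R^2$ well below zero, so scores from tiny samples should be read with caution.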
new_data = pd.DataFrame({
'InvoiceNo' : ['INV-250891','INV-250892'],
'Description': ['Laptop', 'Headphones'],
'InvoiceDate' : ['2025-01-30', '2025-02-01'],
'UnitPrice' : [2000, 300],
'CustomerID' : [85732, 70179],
'Country' : ['USA', 'Singapoore']
})
predict_sales(sample_data,
new_data,
numeric_features = ['UnitPrice'],
categorical_features = ['Description', 'Country'],
target = 'Quantity',
date_feature = 'InvoiceDate')
MSE of the model: 6.7
R_squared of the model: -6.54
| | Predicted values |
|---|---|
| 0 | 1.77 |
| 1 | 1.33 |
If you don’t want to include a date feature in your analysis, you can omit the date_feature argument.
predict_sales(sample_data, new_data, ['UnitPrice'], ['Description', 'Country'], 'Quantity', test_size=0.2)
MSE of the model: 1.72
R_squared of the model: 0.0
| | Predicted values |
|---|---|
| 0 | 1.89 |
| 1 | 1.88 |
This is the end of the tutorial, where you have seen how to get sales data insights using our package.