salesanalyzer_mds.sales_summary_statistics
Functions
sales_summary_statistics: Generate summary statistics for sales data, including total revenue, unique customers, average order value, top-selling products, and average revenue per customer.
Module Contents
- salesanalyzer_mds.sales_summary_statistics.sales_summary_statistics(sales_data: pandas.DataFrame, quantity_col: str = 'Quantity', price_col: str = 'UnitPrice', customer_col: str = 'CustomerID', invoice_col: str = 'InvoiceNo', description_col: str = 'Description') -> pandas.DataFrame
Generate summary statistics for sales data, including total revenue, unique customers, average order value, top-selling products by quantity and revenue, and average revenue per customer.
Args:
- sales_data (pandas.DataFrame): A DataFrame containing sales data with at least the following columns: quantity_col, price_col, customer_col, invoice_col, and description_col.
- quantity_col (str): The name of the column containing the quantity sold.
- price_col (str): The name of the column containing the unit price of the product.
- customer_col (str): The name of the column containing the customer ID.
- invoice_col (str): The name of the column containing the invoice number.
- description_col (str): The name of the column containing the product description.
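Because every column name is configurable, it can help to confirm that the frame actually contains the columns you intend to pass before calling the function. The sketch below shows one way to do that. `find_missing_columns` and the column names `qty`, `price`, `customer`, `invoice`, and `product` are hypothetical and are not part of salesanalyzer_mds.

```python
import pandas as pd


# Hypothetical helper for illustration; not part of salesanalyzer_mds.
def find_missing_columns(sales_data: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from sales_data."""
    return [col for col in required if col not in sales_data.columns]


# A frame whose column names differ from the documented defaults
# and which lacks a product description column.
orders = pd.DataFrame({
    "qty": [2, 1],
    "price": [9.5, 20.0],
    "customer": ["a", "b"],
    "invoice": ["I1", "I2"],
})

missing = find_missing_columns(
    orders, ["qty", "price", "customer", "invoice", "product"]
)
print(missing)  # ['product']
```

If the list comes back empty, the same names can be passed as quantity_col, price_col, customer_col, invoice_col, and description_col.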
Returns:
- pandas.DataFrame: A DataFrame containing the calculated summary statistics. If
no sales data is provided, returns an empty DataFrame.
The function computes the following statistics:
- 'total_revenue': The total revenue generated by all sales.
- 'unique_customers': The number of unique customers.
- 'average_order_value': The average value of an order, where an order's value is the total revenue of one invoice.
- 'top_selling_product_quantity': The product with the highest total quantity sold.
- 'top_selling_product_revenue': The product with the highest total revenue.
- 'average_revenue_per_customer': The total revenue divided by the number of unique customers.
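The statistics listed above can be reproduced with a few pandas group-by operations. The following is a minimal sketch under stated assumptions, not the package's actual implementation: `sketch_summary` is a hypothetical name, revenue is assumed to be quantity times unit price per row, and "no sales data" is assumed to mean an empty DataFrame.

```python
import pandas as pd


# Hypothetical re-implementation for illustration only;
# the real function in salesanalyzer_mds may differ in detail.
def sketch_summary(sales_data, quantity_col="Quantity", price_col="UnitPrice",
                   customer_col="CustomerID", invoice_col="InvoiceNo",
                   description_col="Description"):
    # Assumption: an empty input yields an empty result.
    if sales_data.empty:
        return pd.DataFrame()

    # Assumption: revenue of a row is quantity multiplied by unit price.
    revenue = sales_data[quantity_col] * sales_data[price_col]
    total_revenue = float(revenue.sum())
    unique_customers = sales_data[customer_col].nunique()

    # Sum revenue per invoice, then average those per-invoice totals.
    revenue_per_invoice = revenue.groupby(sales_data[invoice_col]).sum()

    # Aggregate per product to find the best sellers.
    quantity_per_product = (
        sales_data[quantity_col].groupby(sales_data[description_col]).sum()
    )
    revenue_per_product = revenue.groupby(sales_data[description_col]).sum()

    # One row, one column per statistic.
    return pd.DataFrame({
        "total_revenue": [total_revenue],
        "unique_customers": [unique_customers],
        "average_order_value": [revenue_per_invoice.mean()],
        "top_selling_product_quantity": [quantity_per_product.idxmax()],
        "top_selling_product_revenue": [revenue_per_product.idxmax()],
        "average_revenue_per_customer": [total_revenue / unique_customers],
    })
```

Grouping by invoice before averaging matters when an invoice spans several rows: averaging row revenues directly would understate the order value.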
Example:
>>> import pandas as pd
>>> from salesanalyzer_mds.sales_summary_statistics import sales_summary_statistics
>>> df = pd.DataFrame({
...     'Quantity': [10, 5, 3, 15],
...     'UnitPrice': [100, 200, 150, 100],
...     'CustomerID': [1, 2, 1, 3],
...     'InvoiceNo': ['INV001', 'INV002', 'INV003', 'INV004'],
...     'Description': ['Product A', 'Product B', 'Product A', 'Product C']
... })
>>> summary_df = sales_summary_statistics(df)
>>> print(summary_df)
   total_revenue  unique_customers  average_order_value
0         3950.0                 3                987.5
  top_selling_product_quantity top_selling_product_revenue
0                    Product C                   Product C
   average_revenue_per_customer
0                   1316.666667