Online store sales Analysis in Python with the creation of an Analytical Report

Online store sales Analysis in Python with the creation of an Analytical Report

Most online store owners are dreaming of high traffic to their ventures. But not many succeeds. According to statistics, only 2-3% of the owners of online marketplaces succeed in making their sites permanently popular. The rest are satisfied with sporadic visits by visitors and as a consequence go bankrupt.

However, high online store traffic is only half the success. In addition, it is extremely important to build the right business processes so that the business is profitable and brings profit rather than loss.

Recently I did an analysis of one online store with high traffic but low revenue. And that would be a fairly interesting experience. I have prepared for business onwers both a detailed analysis using Pandas and Python, and a detailed 14-page analytical report. Below you will find its text in English and 2 links to the original report and code in Russian:

Analytical Report on the results of transaction analysis of the online store

This report has been prepared to inform management, investors and other stakeholders of the results of the analysis of the online store operations.

This analytical report aims to identify trends and patterns that can help in making strategic decisions and improve the efficiency of the business, in terms of increasing its profitability and overall efficiency.

Description of the data

The analysis involved a transaction report for the dates 2019-12-01 through 2020-12-09. It contained 54214 transaction rows and 8 columns:

  • InvoiceNo (i.e., order number);
  • StockCode (product code);
  • Description (product name);
  • Quantity (number of items sold);
  • InvoiceDate (date of invoice/order);
  • UnitPrice (price per unit);
  • CustomerID (ID of customer);
  • Country (country of purchase).

The report was in Excel format, which was not very convenient for a complete analysis and the necessary transformations. That’s why I decided to make it into a more lightweight csv format and analyze it using more powerful analytical tools: Pandas and Python.

I should note that although the data was quite complete, it was still quite “dirty” for a qualitative analysis. According to the results of the analysis of data quality, it was found that:

  • The report has missing values in 134 lines of the product description (“Descrpition” column);
  • InvoiceDate, UnitPrice and CustomerID have incorrect data types;
  • In addition UnitPrice column values has an incorrect separator type, while CustomerID has it for some reason.

As a result of transformations using Python manipulation I fixed all these errors.

In addition, there were anomalies in the original data with negative values in the Quantity column (number of units sold). There was even one row with a shipment value of -9600 pieces! Upon research, it turned out that this was how the write-offs were displayed. For the purposes of analysis this data was useless, but it “distorted” the statistics. So I decided to replace all the negative values with “zeros”.

Description of the approach to data analysis

To analyze the online store’s transactions, I decided to first conduct a general exploratory EDA analysis, and then on identifying some insights / patterns, to dive deeper into understanding what can be improved in the business’ performance.

This method involves cleaning and pre-processing the data, conducting exploratory data analysis, and then using various analytical techniques such as aggregation, grouping and visualization to identify trends, patterns and insights to make management decisions.

In addition, this method can include the application of statistical techniques, such as regression analysis and hypothesis testing, to better understand the factors that contribute to an online store’s sales and revenue growth.

The limitations of the data and the corresponding interpretations of the assumptions

While tools like Python and Pandas provide a high degree of flexibility in analyzing data, allowing you to slice the data in different ways to answer complex business questions, there are some limitations to consider when interpreting the resulting data:

  • When aggregating and grouping data, you may lose some accuracy because some values are simply ignored by Python and Pandas. This is usually because the data is missing or has an incorrect appearance/format. Typically, this loss of accuracy is not critical and is only a few percent or fractions of a percent.
  • When generating tables or graphs, some functions simplify the data and make the results easier to interpret. However, sometimes these assumptions can oversimplify the data and not reflect its complexity.
  • Inaccurate or incomplete data can lead to incorrect conclusions and decisions, which can have serious consequences. Therefore, the analyst’s 1st task is to properly clean and preprocess the data.
  • Any assumptions made when analyzing data with these tools should be tested and validated in advance on a small sample to make sure they are valid and appropriate in general.

Description of the business customer, his hypothetical pains and goals of business development

The client owns an online store that specializes in selling various home goods. The target audience for this online store is young people looking for inexpensive home accessories.

The key hypothetical pains of this business are how to find traffic (target visitors) for the online store and how to recoup the costs (investments) to attract this traffic. The third pain is logistics.

There is a lot of competition in this business. It is difficult for small online stores to compete with the huge marketplaces, which have virtually unlimited assortments and advertising budgets. Consequently, each visitor to the site/app is valuable, so each visitor should (ideally) be converted into a lead, and then into a customer, and then bring in as much revenue as possible.

Unlike large marketplaces, the owner of an online store can not endlessly expand the assortment. On the contrary, he must strive to differentiate his products from other stores and focus as much as possible on the most profitable positions.

Another significant problem a client may face is a lack of data to help him make informed business decisions. An online store owner has limited knowledge about the needs of his target audience, and it’s difficult for him to determine which products are in the best demand and should be stocked at a premium.

This is why it is valuable to know which products have the best turnover, which are average, and which are just lying in the warehouse and “freezing” the proceeds, or worse – “eating up” the advertising budget. Analysis of transactions in terms of turnover and likely trends in future purchases in this case is extremely important.

Description of the report stakeholders with justification of the selected metrics (metric system)

This analytical report of online store transactions will be useful to both the owner of an online store, its management, as well as product and marketing managers.

The key metrics that have been selected to evaluate the effectiveness of online store transactions:

  • Revenue (both total and for each product);
  • ROI (percentage of revenue per item);
  • Turnover of goods.

These metrics were chosen as key because, firstly, they are filled in as much detail as possible in the original data file, and secondly: they are key to understanding the “bottlenecks” of the business and the opportunities to expand them for faster growth.

Recommendations for the business customer data model

I will start by saying that this document describes the general conclusions of the analysis and recommendations. More detailed information is provided in the comments in the attached file with the Python code (

Since there was not much data on transactions initially, I had to further enrich them with various calculations, additional columns, and summary tables. This is what the source looked like with the data from the customer:

That is, only general transactional data was provided for analysis. This is enough for general research, but not for in-depth analysis of the business model.

This is what I recommend that you add in order to accurately “keep your hand on the pulse” and react in a timely manner if something goes wrong:

  • have not only the product description, but also a column with the product category;
  • have not only the country but also the city of shipment;
  • have not only the revenues, but also a breakdown of all costs of the sale operation (purchase costs, logistics, storage, advertising).

With this data, business owner can dive deeper into business performance analysis and understand what can be improved. In particular, by collecting data on all of the costs of operating an online store and understanding these costs, the customer can make informed decisions about pricing and marketing strategies, identify areas of inefficiency and optimize resources to improve profitability. It can also help identify the most cost-effective channels for advertising and optimize logistics to reduce shipping, handling fees and other costs.

Overall, my recommendation is to collect as much data as possible. The more there is, the more you will be able to see the weaknesses and strengths of the business, which will help in creating a more effective and profitable online store.

Formulated conclusions and recommendations for the business customer

As mentioned above: this report will give only general recommendations. I have left more detailed comments in the Python code ( For customers in Russian it may be more convenient to read the original report in Russian, which is uploaded on Google Docs:

1. Increase the average bill

The first thing that caught my eye when analyzing the data was the low average bill of sale. It is only 10.25 pounds, and that is due to the fact that the store has a few items in the range with a price of 800 or more pounds. In fact, the average bill is even lower – its median value is only 3 pounds!

The low average bill dramatically affects the overall profitability of the business and reduces its growth potential to zero, as more and more money has to be invested in logistics, instead of being invested in advertising and improving the efficiency of processes.

2. Reduce the assortment

The assortment line of the online store has 3,400 items, but only a few of them generate a decent profit.

I grouped similar products and plotted them against the revenue leaders. Judging by the chart below, only about a dozen items generated more than 5,000 pounds of revenue in a year of sales. That’s not much.

The current business model of this store is clearly focused more on turnover than on revenue. During the analyzed period 555980 items were sold, which is very decent. However, this volume of sales generated only 1056168 pounds of revenue, or 1.899 pounds per piece of merchandise sold.

I don’t know about the expense side of this business, but having experience analyzing other online stores I dare say that £1.899 per piece is right on the edge of profitability (if not a loss).

It’s not a good idea to give up drastically on the revolving model right away. But it is possible to reconsider the assortment. And first of all you should pay special attention (and advertising) to the most “hottest” items.

It is also worth “keeping in mind” that only 526 items out of 3,400 brought in more than 500 pounds of revenue for the year. This is a serious reason to think about optimizing the assortment.

Of course, the current business model, focused on turnover rather than revenue, is still viable, but for how long?

From the chart above, we can see that it probably makes no sense to leave goods that were sold in quantities of less than 100 pieces in the warehouse and in the storefront of the online store. In fact, the main revenue (88.5%) comes from the rest of the goods.

Analysis of the profitability (ROI) of product sales revealed the following list of leaders:

These products generate 34% or more of the revenue for each unit sold. Alas, they are in the minority. Most merchandise items have much worse ROI.

We see that almost 70% of goods in the assortment are essentially unprofitable, because they generate only 1-2% ROI or less. Obviously, this is a reason to review the assortment and optimize the business, leaving only the high-margin items.

However, it’s not that simple. When you reduce your assortment, you will inevitably see a reduction in overall revenue. And here’s why.

Obviously, the core of this business today is the sale of low-margin goods. To abandon them is to lose most of the revenue.

So what can be done to optimize the profitability of the business? As an option – to reduce the range not dramatically, but to 2500 of the most profitable products in terms of ROI and revenue. Then the sales statistics would look like this:

We see a 21% increase in average revenue per item and a 28.7% increase in ROI.

Will it reduce total revenue? Yes. But not significantly.

Despite the loss of 11% of the total revenue this decision looks quite rational – losing a little, we increase the margin of the business by as much as 20-30%. Then we can improve the marketing of the remaining positions and reach higher revenues.

You can find a file with the optimized assortment optimal.xlsx in the appendix to this report.

3. Optimization of logistics

Another of my recommendations is to optimize logistics based on the size of the turnover of unique items. It is worth giving priority to goods with a high turnover, giving them the best places in the warehouse, on low shelves, near the loading and unloading area. But goods with low turnover can even begin to sell and ship only by pre-order.

As the analysis showed only half of all 3,400 goods were sold in quantities greater than 49 pieces. That is, the rest – it’s just “ballast”, which only takes up space in the warehouse and “eats” the advertising budget.

Why am I so sure of this? Here’s why:

The first line is the goods sold in the volume of more than 165 pcs. The second line is more than 49 pcs. Below – anything less than that.

Why keep these positions in sales? There’s probably a reason for it (maybe they drive traffic, or competitors do it, etc.). But as an analyst, I think it’s at least worth taking a closer look at these products and figuring out: are they really necessary for your business?

In the appendix to this report you will find 2 files: goodturnover.xlsx (goods with good turnover – more than 49 pieces) and badturnover.xlsx (all others). I recommend analyzing the last file and sifting out all unnecessary items.


In my opinion, there are two key pains of this online store: a low average bill and a bloated assortment. Because of this, it has extremely low revenues for a huge turnover.

As for logistics – this is a headache for any online store with a large turnover. And although there is nothing about logistics in the analyzed data, I am almost certain that this organization has such problems, and they are quite acute.

Even if we look at the top 10 of all the best invoices for the year we can clearly see these pains:

Notice the UnitPrice column. We see that these are mostly items with a small price per piece. The most expensive item is the SET 3 WICKER OVAL BASKETS W LIDS at just £12.48 (which is extremely cheap by UK standards).

And that’s only 1 item. Mostly it’s items priced at 2-3-4 pounds. And if you look at the turnover leaders, there are a lot of “penny” goods:

You can see from the table above that 7 of the top 10 items by turnover checks are items priced under 1 pound apiece.

Finally, I want to mention briefly about other recommendations (which you will meet in the Python code). In particular, to analyze the table (topcustomers.xlsx) of the most loyal customers and work with them more closely (mailings, calls, etc.).

One last recommendation: try to smooth out the seasonality of your sales. The graph of revenue growth clearly demonstrates the seasonal nature of the cashflow.

How can you smooth out seasonality? There are a lot of ways: for example, giving users promotions on purchases of summer goods in winter, and vice versa, stimulating them with mailings.

In general, we see that the client has managed to build quite a successful Internet business, albeit with unevenly growing revenues, a bloated assortment and low profitability. But everything can be fixed. If he will follow the above recommendations, he will be able to turn this business into a more profitable and effective one.

Leave a Reply

Your email address will not be published. Required fields are marked *