Almost every taxi company wonders whether it is worth spending more on higher-class cars. At first glance it seems like a bad idea, but the data from a business analysis suggests otherwise. In this project, I analyzed the business of a well-known cab company in Russia and found an interesting pattern: the higher the class of the car, the longer the trip and the higher the fare. To analyze the data, I downloaded all of the company's customer trips, divided them into classes, and carried out graphical and statistical analysis in Python.
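The per-class comparison described above can be sketched in a few lines of plain Python. This is only an illustration: the trip records, the class names, and the `summarize_by_class` helper are all invented here and are not the company's data or the code from the original notebook.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trips: (car_class, duration_minutes, fare).
# These values are made up for illustration only.
trips = [
    ("economy", 12, 250),
    ("economy", 15, 310),
    ("comfort", 18, 480),
    ("comfort", 22, 560),
    ("business", 30, 1200),
    ("business", 26, 1050),
]


def summarize_by_class(trips):
    """Return {car_class: (mean_duration, mean_fare)}."""
    grouped = defaultdict(list)
    for car_class, duration, fare in trips:
        grouped[car_class].append((duration, fare))
    return {
        car_class: (
            mean(duration for duration, _ in rows),
            mean(fare for _, fare in rows),
        )
        for car_class, rows in grouped.items()
    }


summary = summarize_by_class(trips)
for car_class, (avg_duration, avg_fare) in summary.items():
    print(f"{car_class}: {avg_duration:.1f} min on average, {avg_fare:.0f} per trip")
```

With real trip data, the same grouping would show whether average duration and fare actually rise with the car class.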

The similarity or difference between two datasets can be assessed in many ways, but statistical methods are probably the most rigorous. Statistics offers many criteria for comparison: different kinds of averages; percentiles and modes; different kinds of deviation and skewness; simple and confidence intervals; correlations of distributions; quantiles and quartiles; kurtosis, and so on. This is a huge area of data science, and each of these parameters is calculated with its own formula (often a very difficult one to understand). Fortunately, today's data analysis programs spare us from calculating them by hand. One of the most powerful and faithful assistants a data scientist has for this is Excel. At the link below you can see a study comparing two datasets (male and female weights) using Excel's statistical tools. Dozens of statistical parameters were analyzed in this file. In addition, several distribution…
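The study itself was done in Excel, but a few of the parameters listed above can be sketched with Python's standard `statistics` module. The two weight samples and the `describe` helper below are hypothetical and are not taken from the original file.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical weight samples in kg, invented for illustration only.
male = [72.5, 80.1, 68.4, 90.3, 77.0, 85.6, 74.2, 79.9]
female = [58.2, 63.7, 55.1, 70.4, 61.0, 66.3, 59.8, 62.5]


def describe(sample):
    """Return a small set of descriptive statistics for one sample."""
    q1, _, q3 = quantiles(sample, n=4)  # quartile cut points
    return {
        "mean": mean(sample),
        "median": median(sample),
        "stdev": stdev(sample),  # sample standard deviation
        "q1": q1,
        "q3": q3,
    }


stats_male = describe(male)
stats_female = describe(female)
for key in stats_male:
    print(f"{key:>6}: male={stats_male[key]:.2f}  female={stats_female[key]:.2f}")
```

Placing the two columns side by side makes differences in location (mean, median) and spread (standard deviation, quartiles) easy to see.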

This is one of my first machine learning projects, but also one of the most visual and easy to understand. In this study, I analyzed a simple dataset of Iris flowers, their properties and sizes. Then I built several prediction models for these parameters and chose the best-performing one. Instruments: Google Colaboratory; Python and its libraries (pandas, SciPy, NumPy, Matplotlib, scikit-learn).
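One way to compare several models on the Iris data is with cross-validation in scikit-learn, as sketched below. The choice of three classifiers, the assumption that the target is the species label, and the use of 5-fold accuracy are mine; the original notebook may have used different models or predicted other parameters.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Features are the four flower measurements; the target is the species.
X, y = load_iris(return_X_y=True)

# Candidate models (an assumed selection, not the original author's list).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best model:", best_name)
```

Cross-validation is used here instead of a single train/test split because the dataset has only 150 rows, so one split can give a noisy estimate.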

Today there are many metrics for assessing whether a site is healthy or sick: traffic, views per page, conversion rates, and so on. But what if all of these are good and stable, yet the profit of the online business is steadily declining? Below I share one very interesting case that I had the opportunity to analyze. It concerns a successful online business that had been selling online education services for many years. Then something happened, and its sales margin fell almost sixfold in one year! To analyze the situation, I first exported all the statistics from Google Analytics into Excel. Then I brought all the metrics into a consistent form, calculated additional metrics, and built the necessary summary tables. Already at the analysis stage, I understood the reason. But to be sure, I decided…

Visualizations can be built at different levels, with different goals and for different executives. For example, to a sales director it seems obvious that if shipments of goods increase, revenue and profit grow too. However, any growth has its pitfalls. The dashboard below contains some interesting observations. On the one hand, the company is doing well, because it has: a growing branch network all over Western Europe; steadily growing revenue; a steadily growing assortment. But… The first chart, ‘Profit by Products and Orders’, shows that some products are pulling business margins down: Tables, Labels, Fasteners, Envelopes, Paper, and Supplies. They sell in huge quantities but generate very little profit. Another interesting chart is ‘Profitability of Country Markets’. It clearly shows that the branches in the Netherlands, Denmark, Ireland, Sweden and Portugal are not profitable. Moreover, the…

Player performance analysis is one of the great examples of data science in sports analytics. Cristiano Ronaldo is one of the most famous soccer players in the world, so analyzing his performance in recent years makes a great data science project. First of all, I should say that I have stretched the concept of “recent years” a little. When I started collecting data, I saw that Cristiano, who is 37 years old, doesn’t play very often now, and such statistics are not very interesting to analyze. So I decided to analyze the last 12 years of his career, starting from 2010. In those 12 years Ronaldo played for several clubs: Real Madrid, Juventus, Manchester United. He played in 3 national championships and made many appearances for his country’s national team. Over twelve years, this great soccer player played…

How can you evaluate the effectiveness of money spent on advertising campaigns? The easiest way is with a comprehensive dashboard. A good visualization lets you understand in seconds whether an advertising campaign was successful or not. And in this case, the hypothesis really did prove to be true. Of course, the visualizations also revealed leaders and laggards, as well as bottlenecks and inefficient spending. But that’s the way it should be. That’s what analytics is all about. Power BI, the software used to prepare this dashboard, is a powerful tool. It can turn boring and lifeless data in Excel into something more lively and illustrative. Unfortunately, the rights to work with this tool in Russia are severely limited. Many functions are disabled, including geographic maps. Therefore, I can share with you here only a link to a PDF presentation of the work I have done. Links StoresUral.pdf:

The goal of this work was to build interactive and detailed charts showing how many cars there are in the regions of Russia. It was not an easy task, because much of the data is no longer published and what remains is scattered across thousands of sites. I had to spend several hours collecting and systematizing it. But the result was worth it. Moreover, unlike many similar charts on the Web, I was able to analyze not only the number of cars and the number per capita, but also such interesting parameters as: the impact of taxation on the number of cars; the impact of taxation on the choice of car brand; the impact of the region’s location on the choice of car brand. I’m sure you will find many of these charts extremely interesting. You certainly haven’t seen this in any of the major media articles. Instruments…

The purpose of this project was to explore what Python can do for researching stock prices. The premise of the study is that a certain investor wants to invest in Tesla shares because he sees a good upward trend on numerous stock websites. Since Python can build the charts itself rather than taking them from websites, my goal was to build these charts in that environment. I also wanted to understand whether the stock was really trending upward and what the prospects were for future growth. Links Google Colaboratory:
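A simple way to check whether a price series is trending upward is to fit a straight line through it and smooth it with a moving average. The sketch below uses only plain Python; the price list is made up and is not Tesla data, and `linear_trend` and `moving_average` are hypothetical helpers rather than the code from the notebook.

```python
def linear_trend(prices):
    """Least-squares line through (0, p0), (1, p1), ...; returns (slope, intercept)."""
    n = len(prices)
    x_mean = (n - 1) / 2
    y_mean = sum(prices) / n
    cov = sum((i - x_mean) * (p - y_mean) for i, p in enumerate(prices))
    var = sum((i - x_mean) ** 2 for i in range(n))
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return slope, intercept


def moving_average(prices, window):
    """Simple moving average; the result has len(prices) - window + 1 points."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


# Hypothetical daily closing prices, invented for illustration only.
closes = [101.0, 103.5, 102.0, 106.0, 108.5, 107.0, 111.0, 113.5]

slope, intercept = linear_trend(closes)
smoothed = moving_average(closes, window=3)
direction = "upward" if slope > 0 else "flat or downward"
print(f"fitted slope: {slope:.2f} per day ({direction})")
print("3-day moving average:", [round(v, 2) for v in smoothed])
```

A positive slope only describes the past window; it says nothing by itself about future growth.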

The purpose of this work was to understand how profitable it would be for a retail company to open a new store. Two Excel spreadsheets were provided for the analysis. The first contained historical cost data for the opening of existing stores. The second contained the likely costs of opening a new location. The client asked me to compare all the data both in absolute numbers and in percentage terms. He also asked me to analyze and compare the costs of opening stores and present them in an interactive Excel dashboard. I decided to make this analysis as visual as possible. For this purpose I used heat maps in tables, as well as a bar chart with a percentage distribution. Links Excel: Download the Excel spreadsheet

In this work, I carried out a detailed analysis of vacation spending, calculating correlations, deviations from the plan, and other statistical parameters. The tools used were Google Sheets, Colaboratory, Python, and pandas. A dashboard was created in Google Data Studio (GDS). First, the data were entered into tables through an interactive Google form; the required formulas were then calculated on them, and the results were sent to the GDS dashboard. Within the GDS environment I developed an interactive dashboard with all key metrics and sorting options. Next, I analyzed the statistical dependencies and built several graphs and charts, including a distribution chart of the deviation of actual costs from the plan. Finally, the remaining calculations and combinations were done in Python. Links Google Forms: Google Sheets: Data Studio: Collaboratory:
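The deviation-from-plan and correlation calculations mentioned above can be illustrated with a short plain-Python sketch. The planned and actual amounts are invented, and `deviations` and `pearson` are hypothetical helpers; the original work used Google Sheets formulas and pandas.

```python
from math import sqrt


def deviations(planned, actual):
    """Absolute deviation of actual spending from the plan, per category."""
    return [a - p for p, a in zip(planned, actual)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical vacation budget by category, invented for illustration only.
categories = ["flights", "hotel", "food", "transport", "souvenirs"]
planned = [500, 700, 300, 100, 50]
actual = [540, 690, 380, 120, 90]

for name, plan, delta in zip(categories, planned, deviations(planned, actual)):
    print(f"{name}: {delta:+d} ({delta / plan:+.0%} of plan)")

print(f"correlation between plan and actual: {pearson(planned, actual):.3f}")
```

A correlation close to 1 means actual spending followed the shape of the plan, even when individual categories went over budget.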