Analysis

BizStats.AI carries out Exploratory Data Analysis (EDA) for any search query. In general, data analysis means evaluating data with analytical and logical reasoning by examining each component of the given dataset. To provide the best possible analysis for any business-related query, BizStats.AI gathers data from the connected data sources, reviews it, and analyzes it to reach a finding or conclusion that makes data-driven decisions easier.

In business today, data analysis plays a vital role in making decisions more scientific. It helps the business operate effectively by discovering useful information, informing conclusions, and supporting decision-making.

Analysis Report Section

Any query asked by the user is answered with multiple report sections. Each section has a unique combination of data elements, and for each combination BizStats.AI provides possible analysis options. The combinations are:

  1. Single measure and a single attribute
  2. Multiple measures and a single attribute
  3. Single measure and multiple attributes
  4. Multiple measures and multiple attributes

Figure: Report section for given search query

A report section is a unique combination of measures and dimensions that is highly relevant to the query. After one of the listed report sections is selected, BizStats.AI provides the possible analysis options described below.

List of analysis

Currently, BizStats.AI provides the basic analyses listed below. This list is updated from time to time according to the industry, functional area, and knowledge base selected when the project was created. In the analysis section, the analysis drop down shows the available analyses. Choose any one of them to get the Exploratory Data Analysis (EDA) for that analysis.

Figure: Analysis list drop down in the report section

 

Statistical Analysis

To provide as much statistical information as possible about the selected report section, there are three reports, namely:

  1. Basic statistics
  2. Compare standard normal distributions with z scores
  3. Compare standard normal distributions

Basic statistics

This report shows the following basic statistic values for each column of the report section's data set, much like a statistical description:

  1. Number of rows
  2. Unique count
  3. Min
  4. Max
  5. Mean
  6. Standard deviation
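As a minimal sketch of what such a description involves, the values above can be computed for one numeric column with Python's standard library. The function name `basic_statistics` and the sample `sales` column are hypothetical, and the use of the sample standard deviation is an assumption, since the report does not state which variant it shows.

```python
import statistics

def basic_statistics(column):
    """Return the basic statistic values for one numeric column."""
    return {
        "rows": len(column),
        "unique": len(set(column)),
        "min": min(column),
        "max": max(column),
        "mean": statistics.mean(column),
        # Sample standard deviation (n - 1 denominator). This is an
        # assumption; the population variant would use statistics.pstdev.
        "std": statistics.stdev(column),
    }

# Hypothetical measure column, for illustration only.
sales = [120, 150, 150, 90, 200]
summary = basic_statistics(sales)
print(summary)
```

Running the same function over every column of a report section would give one row of the description per column.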

Compare standard normal distributions with z scores

This report shows the standard normal distribution curve of one or more measures in the report section against their Z-scores. When the selected report section contains more than one measure, the measures are compared with each other. The Z-score is placed on the x-axis and the probability density of the measure on the y-axis.

A value from any normal distribution can be transformed into its corresponding value on a standard normal distribution using the following formula:

Z = (X - μ)/σ
where
Z - value on the standard normal distribution,
X - value on the original distribution,
μ - mean of the original distribution,
σ - standard deviation of the original distribution.
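The transformation can be sketched in a few lines of Python. The function name `z_scores` and the sample `measure` values are hypothetical, and the population standard deviation is an assumption, since the formula does not say whether σ is the population or sample value.

```python
import statistics

def z_scores(values):
    """Transform each value X into Z = (X - mu) / sigma."""
    mu = statistics.mean(values)
    # Population standard deviation; an assumption, see the note above.
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # All values are identical, so Z is undefined.
        raise ValueError("standard deviation is zero")
    return [(x - mu) / sigma for x in values]

# Hypothetical measure values, for illustration only.
measure = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(measure)
print(z)
```

After the transformation the values have mean 0 and standard deviation 1, which is what allows measures on different scales to share one x-axis.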


Compare standard normal distributions

This report shows the standard normal distribution curve for the given measure. The measure's value range is placed on the x-axis and the probability density of that measure on the y-axis. When the selected report section contains more than one measure, the measures are compared with each other.
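One way to picture such a curve is to fit a normal distribution to each measure and evaluate its density across the measure's own range. The sketch below does this with `statistics.NormalDist` from Python's standard library. The function name `density_curve`, the number of points, and the sample measures are hypothetical, and fitting with the population standard deviation is an assumption rather than the documented behavior of the report.

```python
from statistics import NormalDist, mean, pstdev

def density_curve(values, points=5):
    """Return (x, density) pairs of a normal curve fitted to the values."""
    dist = NormalDist(mean(values), pstdev(values))
    lo, hi = min(values), max(values)
    step = (hi - lo) / (points - 1)
    # x runs over the measure's own range; y is the probability density.
    return [(lo + i * step, dist.pdf(lo + i * step)) for i in range(points)]

# Hypothetical measures, for illustration only.
measures = {
    "sales": [2, 4, 4, 4, 5, 5, 7, 9],
    "profit": [1, 3, 3, 4, 6, 7],
}
curves = {name: density_curve(values) for name, values in measures.items()}
for name, curve in curves.items():
    print(name, curve)
```

Plotting each list of pairs on shared axes would give one curve per measure.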
 

Figure: Statistical analysis

 

Histogram 

A histogram shows the underlying frequency distribution of continuous or series data, which helps in understanding the data as a whole. It makes it possible to inspect where most of the data falls and to discover outliers, skewness, and so on.

A histogram visualizes the distribution of data over a continuous interval, a time period, or a range of amounts. Each bar represents the frequency of one bin or interval. This analysis gives an estimate of where the values are concentrated, what the extremes are, and whether there are any gaps or unusual values. Histograms are also useful for giving a rough view of the probability distribution.
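The binning behind a histogram can be sketched as follows. This is a minimal illustration with equal-width bins. The function name `histogram` and the sample values are hypothetical, and equal-width binning is an assumption about how the bins are formed.

```python
def histogram(values, bins=5):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical, so a single bin holds everything.
        return [lo, hi], [len(values)]
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin, not in a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

# Hypothetical measure values, for illustration only.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 10]
edges, counts = histogram(values, bins=3)
print(edges, counts)
```

In this sample most values land in the first bin and the last bin holds a single value, which is the kind of concentration and extreme the bars make visible. Changing `bins` changes how coarse or fine that view is.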

From the selected report section, open the analysis drop down and select Histogram to display the following analysis in BizStats.AI. In this case only one measure is available; when there is more than one measure, several histograms are presented. The number of bins can also be chosen.

Figure: Histogram analysis

 

Coverage Analysis

Coverage analysis is the core analysis of the selected report section. It gives a clear picture of how the data of one or more measures is placed or grouped against the attribute data of one or more dimensions.

Time dimension reports

BizStats.AI automatically transforms the time dimension into year, quarter, month, week, and day analyses. For each of these, the visualization perspectives are determined by the size of the selected data (number of rows and columns).

From the selected report section, open the analysis drop down and select Coverage to display the following analysis in BizStats.AI. This is a detailed analysis of the selected data. The perspective drop down under this analysis offers further pivot tables and visualizations.
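As a rough illustration of this transformation, a single date can be split into the five levels with Python's `datetime` module. The function name `time_parts` and the sample date are hypothetical, and the ISO 8601 week number is an assumption about how weeks are counted.

```python
from datetime import date

def time_parts(d):
    """Split a date into the year, quarter, month, week, and day levels."""
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        # ISO 8601 week number; an assumption, other conventions exist.
        "week": d.isocalendar()[1],
        "day": d.day,
    }

# Hypothetical date value, for illustration only.
parts = time_parts(date(2021, 8, 15))
print(parts)
```

Grouping the measure values by any one of these keys gives the analysis at that level of the time dimension.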

Learn more about visualization options here

Figure: Coverage analysis with list of perspective in the drop down

 

Anomaly Detection

Anomaly detection identifies unusual patterns or outliers in the given data: rare or abnormal items, events, or observations that differ significantly from the majority of the data and therefore raise suspicion or should be excluded from further analysis.

Some business use cases are:

  1. Identifying strange patterns in network traffic that may signal a hack
  2. Spotting abnormally low sales at a particular store in a given week
  3. Detecting fraud in credit card transactions

Currently, the Interquartile Range (IQR) technique is available in this analysis. We are working to add machine learning-based approaches such as density-based, clustering-based, and Support Vector Machine-based anomaly detection.

The Interquartile Range (IQR) technique detects the presence of outliers, that is, data values that fall outside the overall pattern in which most of the data falls. The IQR is used in a rule that decides whether a data point is actually an outlier.

The Interquartile Range

A given dataset can be described by the following five values:

  1. The minimum, or lowest value of the data set.
  2. Q1, the first quartile: the value one quarter of the way through the sorted data.
  3. Q2, the median: the midpoint of the sorted data.
  4. Q3, the third quartile: the value three quarters of the way through the sorted data.
  5. The maximum, or highest value of the data set.

From these values, the Interquartile Range (IQR) is derived as:

IQR = Q3 – Q1

The interquartile range shows how the data is spread about the median. It is less susceptible than the range to outliers.

Rule for Outliers

Using the interquartile range (IQR), calculate the lower and upper ranges as follows.
Lower range = Q1 value - ( IQR value x 1.5 )
Upper range = Q3 value + ( IQR value x 1.5 )

Values below the lower range or above the upper range are considered outliers of the given data set. In this analysis, All represents the whole data (shown in orange); from it, the IQR technique identifies the inliers, shown in green, and the outliers, shown in red.
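The rule can be sketched in Python with `statistics.quantiles`. The function name `iqr_outliers` and the sample `weekly_sales` values are hypothetical. Quartile conventions differ between tools, so the "inclusive" method used here is an assumption and not necessarily the one BizStats.AI applies.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Split values into inliers and outliers using the IQR rule."""
    # The "inclusive" quartile method is an assumption; other methods can
    # give slightly different Q1 and Q3 for the same data.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr   # Lower range = Q1 - (IQR x 1.5)
    upper = q3 + k * iqr   # Upper range = Q3 + (IQR x 1.5)
    inliers = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, inliers, outliers

# Hypothetical weekly sales for one store, for illustration only.
weekly_sales = [52, 55, 49, 51, 53, 50, 54, 12, 95]
lower, upper, inliers, outliers = iqr_outliers(weekly_sales)
print(lower, upper, outliers)
```

For this sample, Q1 is 50 and Q3 is 54, so the IQR is 4 and the ranges are 44 and 60. The values 12 and 95 fall outside them and are reported as outliers.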

Figure: Anomaly detection using IQR method

 

Top 10 in terms of Measure

This analysis shows the top 10 data rows in terms of the measure values for the selected combination of dimension attributes. In this example, the top values for monthly product categories are listed. All the attributes are combined, the measure column is sorted from high to low, and the top ten rows are picked. If more than one measure exists, the same is done for each measure.

This gives clear insight into questions such as:

  1. Which product category performs consistently?
  2. Is there any seasonality for a particular product category?
  3. Do all top 10 sales fall in the same month?
  4. Do all top 10 sales fall in the same product category?

The perspective drop down under this analysis offers further pivot tables and visualizations. Learn more about visualization options here
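The selection described above (sort the measure from high to low, keep the first ten rows) can be sketched as follows. The function name `top_n`, the column names, and the generated rows are hypothetical and serve only to show the mechanics.

```python
from collections import Counter

def top_n(rows, measure, n=10):
    """Return the n rows with the highest value of `measure`."""
    return sorted(rows, key=lambda row: row[measure], reverse=True)[:n]

# Hypothetical monthly sales per product category, for illustration only.
rows = [
    {"month": month, "category": category, "sales": base + 10 * month}
    for category, base in [("Toys", 100), ("Books", 50)]
    for month in range(1, 13)
]

top = top_n(rows, "sales")
# How many of the top 10 rows belong to each category and each month.
by_category = Counter(row["category"] for row in top)
by_month = Counter(row["month"] for row in top)
print(top[0], by_category, by_month)
```

Counting the top rows per category and per month is one simple way to approach the questions listed above. With more than one measure, `top_n` would be called once per measure name.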

Figure: Analysis of top 10 measures