#### Statistical Methods in Valuation Analysis: Review of Principles and Applications (Part Two of a Six-Part Series)

This second installment of the six-part Health Capital Topics series on various statistical methods utilized by valuation analysts will provide a brief overview of descriptive statistics and their utilization in various valuation techniques and methodologies. Descriptive statistics may prove imperative in healthcare valuation analyses, as evidenced by their usage in U.S. ex rel. Drakeford v. Tuomey Healthcare System, Inc., in which Kathleen McNamara, CPA, the government’s expert witness in the case, determined that the productivity of physicians employed by Tuomey fell below the 75th percentile, while their compensation exceeded the 90th percentile.1 That determination, which relied heavily on descriptive statistics, contributed to her opinion that Tuomey paid the physicians at issue in the case in excess of fair market value.2

Before understanding the applications of descriptive statistics in valuation analyses, it is necessary to review some basic information about data (singular datum). Data is a collection of quantitative information or facts from which a dataset, i.e., a specific set of data that generally contains information on multiple related variables, may be derived for statistical analysis.3 Datasets are classified either as populations, i.e., a “well-defined collection of objects” relevant to a particular study, or as samples, a subset of a population.4 Given that collecting information on every object of interest in a population is likely cost-prohibitive, most datasets are composed of samples.5 The fourth installment of this series will more closely examine datasets and the sampling methods used to create this collection of information.

Despite its inherent imperfection in comparison to a population, a sample may nevertheless be very large; such size may necessitate the use of descriptive statistics in order to succinctly describe the information contained in the sample and the dataset. Descriptive statistics present basic information about a particular dataset by using standard calculations to summarize certain characteristics of a dataset,6 including the location of the center of the dataset (e.g., median) and the dispersion, or spread, of the dataset (e.g., variance).7 Some common descriptive statistics utilized within valuation reports include measures of central tendency, namely mean and median. The mean of a dataset (also called the arithmetic average) is the sum of the values in the dataset divided by the total number of values collected.8 Valuation analysts may utilize the mean of a dataset in numerous situations, such as establishing internal benchmarks of performance based on historical averages, or forecasting revenue and expenses when utilizing income-based methods of valuation.9

Another commonly utilized measure of central tendency is the median of a dataset, which denotes the middle value of a dataset when sorted smallest to largest.10 Valuation analysts frequently rely upon the median when analyzing compensation data, as this method of statistical analysis may provide a more accurate measure of central tendency in the presence of outliers. For example, when a dataset has very high or very low values relative to the rest of the data (i.e., outliers), the mean of the dataset can be significantly affected, as it is based on the magnitude of the values collected.11 This phenomenon may lead to an unreliable presentation of the dataset, even if the outliers only represent a small portion of the whole sample.12 However, the median of a dataset is based on the order of the values collected, rather than their magnitude, thereby reducing sensitivity to significant outliers.13
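
The contrast between the mean and the median can be illustrated with a minimal sketch. The compensation figures below are hypothetical and are not drawn from any survey or from the Tuomey case; they only show how a single extreme value shifts each measure.

```python
from statistics import mean, median

# Hypothetical annual compensation figures (USD) for seven physicians.
compensation = [210_000, 225_000, 230_000, 240_000, 250_000, 260_000, 265_000]

# The same sample with one extreme (outlier) value appended.
with_outlier = compensation + [1_200_000]

# The mean depends on the magnitude of every value; the median only on order.
print("without outlier:", mean(compensation), median(compensation))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

In this illustration the mean moves from 240,000 to 360,000, while the median moves only from 240,000 to 245,000.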

While measures of central tendency are useful in describing a dataset, such methods only explain the center of the dataset and do not serve as a complete representation of the collected values. In order to create a more complete picture of the dataset, the valuation analyst should also consider measures of dispersion, or variability within a dataset, in preparing their work product. In valuation analysis, measures of dispersion can inform an analyst how closely observed values cluster around a measure of central tendency. The dispersion of a dataset can be measured utilizing several methodologies, including variance, standard deviation, and range. The variance of a dataset measures the spread of the values in the set, and the equation used to calculate this measure depends on whether the set represents the whole population or a sample.14 A sample variance is calculated by subtracting the mean from each value, squaring each of those differences, summing the squared differences, and dividing that sum by the total number of observations less one.15 A population variance is calculated the same way, except the sum of the squared differences is divided by the total number of observations.16
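
The two variance calculations described above can be sketched as follows. The five values are hypothetical, and the manual arithmetic is checked against Python's standard `statistics` module; this is an illustration of the definitions, not a prescribed workflow.

```python
from math import isclose
from statistics import mean, pvariance, variance

# Hypothetical observations of some metric.
values = [4, 8, 6, 5, 7]

m = mean(values)
# Sum of the squared differences between each value and the mean.
sum_of_squares = sum((x - m) ** 2 for x in values)

# Sample variance divides by (n - 1); population variance divides by n.
sample_var = sum_of_squares / (len(values) - 1)
population_var = sum_of_squares / len(values)

# The standard library implements the same two definitions.
assert isclose(sample_var, variance(values))
assert isclose(population_var, pvariance(values))
print(sample_var, population_var)
```

Because the sample variance divides by a smaller number, it is always somewhat larger than the population variance computed from the same values.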

The standard deviation of a dataset is, roughly, the typical distance of a datum from the mean, and offers a simpler interpretation of dispersion than that provided by the variance, since standard deviation is measured in the same units as the data, whereas variance is measured in square units of the data.17 Both the sample and population standard deviations are calculated by taking the square root of the respective variance.18
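
As a brief sketch of the relationship between the two measures, the snippet below takes the square root of each variance and confirms that the result matches the standard library's own standard deviation functions. The values are hypothetical.

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical observations; if these were dollars, the variance would be
# in dollars squared and the standard deviation in dollars.
values = [4, 8, 6, 5, 7]

sample_sd = sqrt(variance(values))       # root of the sample variance
population_sd = sqrt(pvariance(values))  # root of the population variance

# The library functions agree with the manual square-root step.
assert isclose(sample_sd, stdev(values))
assert isclose(population_sd, pstdev(values))
print(round(sample_sd, 4), round(population_sd, 4))
```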

The final measure of dispersion is range, which is the difference between the largest and smallest values in the dataset.19 Range is the simplest of the three measures of dispersion in both calculation and interpretation; however, range measures total dispersion in the dataset, in contrast to other measures of dispersion, which are measured per datum.20 This characteristic of range may limit the utility of the methodology in the valuation analysis, as ranges are not standardized measures, and therefore not easily comparable. When data contain outliers that impact measures of dispersion, the inter-quartile range may be the superior method for identifying these outliers.21 This method divides the data into quartiles and calculates the difference between the third quartile and the first quartile,22 thereby producing a statistic that indicates dispersion while remaining resistant to the presence of outliers.23
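
The range and the inter-quartile range can be compared with a short sketch. The data are hypothetical. Two assumptions that do not come from the sources cited here: quartiles are computed with the default ("exclusive") method of Python's `statistics.quantiles`, and outliers are flagged with the common convention of 1.5 times the inter-quartile range beyond the first and third quartiles. Other quartile conventions give slightly different numbers.

```python
from statistics import quantiles

# Hypothetical observations containing one extreme value.
data = [12, 15, 17, 19, 22, 24, 27, 95]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartiles via the standard library's default ("exclusive") method.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1  # inter-quartile range: third quartile minus first quartile

# Assumed convention: flag values more than 1.5 * IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("range:", data_range, "IQR:", iqr, "flagged:", outliers)
```

The single extreme value accounts for most of the range, while the inter-quartile range is unaffected by how extreme that value is.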

The next step in analyzing a dataset, building on descriptive statistics, is hypothesis testing. A statistical hypothesis is a claim or assertion about either the value of a single parameter (a characteristic of a population) or the values of several parameters.24 A null hypothesis is a claim that is initially assumed true, and the alternative hypothesis is the opposite claim to the null hypothesis, such that the two hypotheses include all possibilities yet never overlap.25 A researcher selects a significance level, i.e., the probability of rejecting a true null hypothesis that the researcher is willing to accept (equal to one minus the confidence level), by which to test the data.26 Hypothesis testing may be used in valuation analyses to assess whether an entity’s performance on a certain metric is significantly different from the industry-indicated benchmark for that metric, or whether it is reasonable to assume that the observed value could be the product of the same random process that generated the benchmark dataset.

There are many hypothesis tests that may be used by valuation analysts, including:

1. z-tests;
2. t-tests;
3. F-tests; and,
4. Chi-squared tests.

First, a z-test is a statistical test that uses a test statistic that is known to be normally distributed (i.e., has a bell-shaped curve)27 and is used when data has a normal distribution with a known standard deviation. A z-test may also be used when the sample size is sufficiently large (typically greater than 25 or 30 observations),28 because the standardized variable Z then has approximately a standard normal distribution according to the Central Limit Theorem, which states that if the size of a sample is sufficiently large, then the distribution of the sample mean will be approximately normal, even if the underlying population is not normally distributed.29
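
A one-sample, two-tailed z-test can be sketched as follows. The function name `one_sample_z_test` and all of the figures (a benchmark mean of 5,000, a known standard deviation of 900, and a sample of 36 observations averaging 5,300) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, benchmark_mean, known_sd, n):
    """Return (z, two-tailed p-value) for a one-sample z-test.

    Assumes the population standard deviation is known, or that n is
    large enough for the normal approximation to be reasonable.
    """
    standard_error = known_sd / sqrt(n)
    z = (sample_mean - benchmark_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical productivity figures for an entity versus a benchmark.
z, p_value = one_sample_z_test(
    sample_mean=5_300, benchmark_mean=5_000, known_sd=900, n=36
)
alpha = 0.05  # significance level chosen before testing
print(f"z = {z:.2f}, p = {p_value:.4f}, reject null: {p_value < alpha}")
```

Here the sample mean lies two standard errors above the benchmark, so the null hypothesis of no difference would be rejected at the 0.05 level but not at the 0.01 level.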

Next, a t-test is a statistical test that uses a specialized t-distribution, which is designed to test hypotheses when the sample size is small, the population standard deviation is unknown, and the distribution is assumed to be normal.30 Both z-tests and t-tests are used to test single hypotheses about parameters or statistical models, but sometimes it is necessary to test a joint hypothesis on more than one parameter at once. This is where the utilization of an F-test becomes necessary. An F-test uses the F-distribution to test more than one parameter or statistical model simultaneously.31 F-tests help to determine if multiple measures considered together provide statistically significant results even if individual results are insignificant. F-tests are also used heavily in regression analysis, which will be discussed in more detail in the fifth installment of this six-part series. A related test, called a Chi-squared test, whose distribution is a limiting case of the F-distribution, can be used to test normality, i.e., how closely the distribution of a dataset matches a normal distribution.32 Results from this test can be used to support or question the validity of conclusions or claims based on the results of a model that assumes normality.
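
The t-statistic for a small sample can be sketched with the standard library alone. The ten observations and the benchmark of 50 are hypothetical, as is the function name `one_sample_t_statistic`. Because the standard library has no t-distribution, the sketch compares the statistic with the tabulated two-tailed critical value for a 0.05 significance level and 9 degrees of freedom (2.262) rather than computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t_statistic(sample, benchmark_mean):
    """Return the one-sample t-statistic using the sample standard deviation."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - benchmark_mean) / standard_error


# Hypothetical small sample (n = 10) compared with a benchmark mean of 50.
sample = [52, 48, 55, 50, 47, 53, 51, 49, 54, 51]
t_stat = one_sample_t_statistic(sample, benchmark_mean=50)

# Tabulated critical value: two-tailed test, alpha = 0.05, 9 degrees of freedom.
CRITICAL_T = 2.262
reject_null = abs(t_stat) > CRITICAL_T
print(f"t = {t_stat:.3f}, reject null: {reject_null}")
```

In this illustration the statistic falls well inside the critical value, so the sample provides no significant evidence of a difference from the benchmark.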

The last statistical technique used in valuation analyses that will be discussed in this article is Analysis of Variance (ANOVA), which is used to test whether distinct data samples originate from different populations.33 In valuation analyses, ANOVA is often utilized to determine whether there are significant differences between a sample of data with a particular characteristic and a sample of data without the characteristic of interest. There are two types of ANOVA: (1) One-Way ANOVA, which looks at a single factor or single classification; and, (2) Two-Way ANOVA, which looks at two factors or classifications that vary among the samples.34 An ANOVA is conducted by comparing the means of two or more populations or treatments to determine whether the means are identical.35 This technique could be utilized to examine a scenario where a new state law (or multiple laws in different states) has been enacted that affects the value of small physician practices. The ANOVA would be used to determine if the law has any effect on the value of the practice (or if any practices across state borders are at a disadvantage because of the varying degree of said laws).
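
A One-Way ANOVA reduces to a single F statistic: the variation between group means divided by the variation within groups. The sketch below computes that statistic by hand for three hypothetical groups of practice values (for example, practices in three states, in arbitrary units). The function name `one_way_anova_f` and the data are assumptions made for illustration; in practice the statistic would be compared with a critical value from an F table or statistical software.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over lists of observations."""
    all_values = [x for group in groups for x in group]
    grand_mean = fmean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variation of each group mean around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical practice values for three groups.
practice_values = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_stat = one_way_anova_f(practice_values)
print(f"F = {f_stat:.2f}")
```

A large F statistic indicates that the group means differ by more than the variation within the groups would explain.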

Having reviewed each of these statistical concepts and techniques, it is necessary to recognize the problems and potential pitfalls that may arise when relying on these techniques. First, utilization of most statistical techniques requires a sufficiently large sample size. The Law of Large Numbers, eventually refined with the Central Limit Theorem, states that as the size of a sample increases, the sample mean approaches the population mean.36 If sample sizes are not sufficiently large, the results of a statistical analysis built from such data may not accurately indicate information about the dataset, due to the lack of data to dilute the effects of outliers.37 Another issue when performing statistical analysis is non-representative samples, or sample bias, which occurs when the data gathered does not hold the same properties as the population, thereby not accurately representing the population.38 For example, if a sufficiently large sample of physician practices without technical components is used as a benchmark for companies with significant technical component revenue streams, unreliable results and interpretations may follow, due to the financial and operational differences between the companies; the sample used in the benchmarking would not accurately portray the population, leading to inaccurate conclusions drawn from the data. Random sampling, a process in which every observation has an equal chance of being chosen, can help reduce the probability of sample bias.39
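
The effect of sample size can be illustrated with a small simulation. Everything here is hypothetical: the "population" is 100,000 simulated right-skewed values, the seed is arbitrary, and simple random samples of increasing size are drawn with the standard library. In any single run the error is not guaranteed to shrink at every step, but it tends to fall as the sample grows.

```python
import random
from statistics import fmean

rng = random.Random(2016)  # fixed, arbitrary seed so the sketch is repeatable

# Hypothetical right-skewed population of 100,000 values.
population = [rng.lognormvariate(12, 0.5) for _ in range(100_000)]
population_mean = fmean(population)


def relative_error(sample_size):
    """Relative gap between a random sample's mean and the population mean."""
    # rng.sample gives every observation an equal chance of selection.
    sample = rng.sample(population, sample_size)
    return abs(fmean(sample) - population_mean) / population_mean


errors = {n: relative_error(n) for n in (10, 100, 1_000, 50_000)}
for n, err in errors.items():
    print(f"n = {n:>6}: relative error of sample mean = {err:.4%}")
```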

Additionally, valuation analysts should note the potential for errors when drawing conclusions based on the results of hypothesis testing. Specifically, erroneous conclusions drawn from the results of hypothesis testing are classified as either type I or type II errors. A type I error consists of rejecting a null hypothesis when it is true, and a type II error consists of failing to reject a null hypothesis when it is false.40 Unfortunately, type I and type II errors are inherent to hypothesis testing. For a given sample size, the probabilities of the two types of error are inversely related; thus, attempting to decrease the likelihood of one error will increase the likelihood of making the other error. Therefore, an analyst must decide prior to hypothesis testing what is an acceptable probability for each error type based on the analysis being conducted.41
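
The trade-off between the two error types can be made concrete for an upper-tailed z-test with a known standard deviation. The function name `type_ii_error` and the inputs (a true shift of 300 above the benchmark, a standard deviation of 900, and 36 observations) are hypothetical; the sketch only shows that, with the sample size held fixed, tightening the type I error probability raises the type II error probability.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, true_shift, known_sd, n):
    """Probability of a type II error for an upper-tailed z-test.

    alpha is the type I error probability; true_shift is the assumed
    true difference between the population mean and the benchmark.
    """
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha)    # rejection cutoff
    shift_in_se = true_shift * sqrt(n) / known_sd  # shift in standard errors
    return std_normal.cdf(critical_z - shift_in_se)


# Hypothetical scenario: same data, three different significance levels.
alphas = (0.10, 0.05, 0.01)
betas = [type_ii_error(a, true_shift=300, known_sd=900, n=36) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"type I = {a:.2f} -> type II = {b:.3f}")
```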

Valuation analysts should consider the utility of the descriptive statistical methods described above during the course of an engagement. In particular, one or more methods in descriptive statistics may support the defensibility of a particular opinion depending on the needs set forth in the engagement and the data being studied, so long as common pitfalls are avoided and statistics are not misused or misrepresented in the analysis. The third installment of this six-part series will discuss the coefficient of variation, its significance in statistics and valuation analyses, and potential pitfalls or mistakes in its interpretation.

1. “U.S. ex rel. Drakeford v. Tuomey Healthcare System, Inc.” Case No. 3:05-cv-02858-MBS (U.S. District Court, District of South Carolina, 2009), Fair Market Value, Commercial Reasonable Assessment, p. 19.
2. Ibid, p. 6-8.
3. “Probability and Statistics for Engineering and the Sciences” By Jay L. Devore, Australia: Thomson Brooks/Cole, 2004, p. 3.
4. Ibid.
5. Ibid.
6. “Descriptive Statistics” By William M.K. Trochim, Web Center for Social Research Methods, October 20, 2006, http://www.socialresearchmethods.net/kb/statdesc.php (Accessed 7/11/16).
7. Devore, 2004, p. 30, 36.
8. Ibid, p. 28.
9. “Healthcare Valuation: The Financial Appraisal of Enterprises, Assets, and Services” By Robert James Cimasi, MHA, ASA, FRICS, MCBA, AVA, CM&AA, Hoboken, New Jersey: John Wiley & Sons, Inc., 2014, p. 52-54, 56.
10. Devore, 2004, p. 30.
11. Ibid.
12. Ibid.
13. Ibid, p. 31.
14. Ibid, p. 37.
15. Ibid.
16. Ibid, p. 38.
17. Ibid, p. 37.
18. Ibid.
19. Ibid, p. 36.
20. Ibid.
21. “Robust Control Charts” By David M. Rocke, Technometrics, Vol. 31, No. 2 (May, 1989), p. 183; Devore, 2004, p. 706-707.
22. Rocke, May, 1989, p. 174.
23. Ibid, p. 183.
24. Devore, 2004, p. 316.
25. Ibid.
26. Ibid, p. 323.
27. Ibid, p. 326; “Probability and Statistical Inference” By Robert V. Hogg et al., Upper Saddle River, NJ: Pearson Education Inc., 2015, p. 105.
28. Hogg et al., 2015, p. 202.
29. Ibid, p. 200; Devore, 2004, p. 331.
30. Devore, 2004, p. 333.
31. Ibid, p. 399-400.
32. Ibid, p. 635.
33. Ibid, p. 410.
34. Ibid, p. 410, 442.
35. Ibid, p. 411.
36. “The Law of Large Numbers” Stat Trek, Statistics and Probability Dictionary, http://stattrek.com/statistics/dictionary.aspx?definition=law_of_large_numbers (Accessed 8/1/2016).
37. “The Importance and Effect of Sample Size” By Sarah Marley, Select Statistical Services, https://select-statistics.co.uk/blog/importance-effect-sample-size/ (Accessed 7/21/16).
38. “Bias in Survey Sampling” Stat Trek, 2016, http://stattrek.com/survey-research/survey-bias.aspx (Accessed 7/21/16).
39. Ibid.
40. Devore, 2004, p. 319.
41. Ibid.