AuditNet®
Attribute Sampling Update
by Mike Blakely
Attribute sampling is widely used by auditors in tests of compliance. Generally, audit samples are used to test the adequacy of internal controls. An audit sample results in a sample error rate which can then be used to estimate a range of probable errors in the population being tested. This range is known as a confidence interval because the auditor has some degree of confidence that the overall error rate will be within this interval or range. Based on whether or not the range is acceptable, the auditor may decide to perform other audit tests.
However, two essential aspects of attribute sampling are not well known or understood – 1) most audit software packages do not correctly calculate the confidence interval for attribute samples, and 2) attribute sampling can supplement, or even replace, variable sampling in certain situations, resulting in greater accuracy and increased audit efficiency. This article addresses both areas.
Most of us can recall from our Stat 101 class in college that there are two types of sampling – 1) attribute, which is most often used for testing compliance, and 2) variable, which is used for testing dollar balances or other numeric quantities. We’re taught that these methods serve distinct purposes and thus there can be no “cross over”. This leads to the belief that variable sampling is never appropriate for tests of compliance and attribute sampling is never appropriate for testing dollar amounts or other numeric quantities.
But is this really true? Think back to how attribute sampling is usually taught. Remember questions like “if you draw two red cards from an ordinary deck of cards, what is the probability that…”. Or the more classic – “An urn contains 1,000 balls, some of which are white and some black. If five balls are randomly drawn…”
Well, the answer is that attribute sampling can be used for certain types of numeric estimation, especially tests for overstatement. Consider situations such as estimates of invalid or inflated insurance claims, estimates of inventory spoilage, etc. In other words, any situation where a dollar amount is based upon an attribute such as spoiled, invalid, or inflated.
Suppose that the audit objective was to estimate a dollar amount for spoiled inventory, or the dollar amount of inflated insurance claims. Using the traditional approach, a sample of items would be selected and each of those items would be examined. The recorded dollar amount of the item examined would be classified as the “examined amount,” and after review the auditor would arrive at an “audited amount.” The audited amount would generally be equal to or less than the examined amount. The difference between the examined amount and the audited amount would be the difference, or overstatement, amount. These difference amounts would then be used to extrapolate the total overstatement amount. This could be done using the traditional approach, which involves computing the average, standard deviation, etc. These amounts can then be used to develop a confidence interval for the estimated range of values for the entire population, using some specified level of confidence, e.g. 95%.
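To make the traditional (variable sampling) calculation concrete, here is a minimal Python sketch of extrapolating sampled overstatement differences to a population total with a 95% confidence interval. The difference amounts and population size are made up for illustration, and the finite population correction is omitted for simplicity.

```python
import math

# Hypothetical overstatement differences (examined amount - audited amount), in dollars.
differences = [0.00, 12.50, 0.00, 40.00, 7.25, 0.00, 19.80, 3.10]

population_size = 10_000          # assumed number of items in the population
n = len(differences)              # sample size

# Average and sample standard deviation of the differences.
mean_diff = sum(differences) / n
std_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in differences) / (n - 1))

# Point estimate of total overstatement, and a 95% CI using the normal approximation.
total_estimate = population_size * mean_diff
margin = 1.96 * population_size * std_diff / math.sqrt(n)

print(f"Estimated total overstatement: ${total_estimate:,.2f}")
print(f"95% confidence interval: ${total_estimate - margin:,.2f} to ${total_estimate + margin:,.2f}")
```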
But let’s look at using an entirely different approach – attribute sampling. We’ll start with an example of auditing insurance claims in order to determine the extent, if any, that the claim amounts are inflated, i.e. the amount of the claim submitted or paid exceeds that which should properly have been paid. There are generally three scenarios that would apply to any individual claim - 1) the claim is entirely valid (i.e. paid correctly), 2) some parts of the claim are valid and some are not, and 3) the entire claim is invalid (nothing should have been paid).
As mentioned before, the traditional sampling unit is the entire claim. But why not divide the claim into the smallest sampling units possible, which are each penny of the claim? For example, if the claim is for $100, then the claim can actually be considered to consist of 10,000 pennies. And let’s break our example claim down further for purposes of illustration. Say it was a medical insurance claim. Perhaps the first item is a doctor visit for $50. The second item is a flu shot for $20 and third item is $30 for some lab work done. The lab work was further broken down into $10 for blood sugar testing and $20 for cholesterol testing. So this insurance claim would be considered to be 5,000 pennies for a doctor visit, 2,000 pennies for a flu shot and 3,000 pennies for lab work (of which 1,000 were for blood sugar and 2,000 for cholesterol). All of this is illustrated in the table below. For purposes of illustration, two random numbers between 1 and 10,000 will be selected and used for sample testing. These random numbers (5,647 and 8,347) will be associated with a claim detail line as shown below.
Claim line | Service description    | Amount | Starting penny | Ending penny | Sampled penny
1          | Office visit           | $50.00 | 1              | 5,000        |
2          | Flu shot               | $20.00 | 5,001          | 7,000        | 5,647
3A         | Lab work – blood sugar | $10.00 | 7,001          | 8,000        |
3B         | Lab work – cholesterol | $20.00 | 8,001          | 10,000       | 8,347
The random numbers are associated with a specific claim item by developing a running sub-total of the amounts and computing a starting and ending penny for each item on the claim. The two random numbers would then correlate as follows: 5,647 falls on the flu shot (it is penny 647 of the flu shot’s 2,000 pennies), and 8,347 falls on the lab work for cholesterol (pennies 7,001 – 8,000 cover the blood sugar test and pennies 8,001 – 10,000 cover the cholesterol test).
The way the audit tests would be performed is to review the randomly selected items associated with each sampled penny. In this case the auditor could make a determination if the flu shot was appropriately claimed. If it is, the claim penny associated with random number 5,647 would be considered “justified”, otherwise it would be classified as “unjustified”. The same goes for the penny associated with the cholesterol test.
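The mapping from sampled pennies to claim lines shown in the table can be automated with a running subtotal and a simple lookup. This Python sketch reproduces the example above; the use of bisect is just one convenient way to find the line whose penny range contains a sampled penny.

```python
from bisect import bisect_left
from itertools import accumulate

# Claim lines from the example: (line, service description, amount in dollars).
claim_lines = [
    ("1",  "Office visit",           50.00),
    ("2",  "Flu shot",               20.00),
    ("3A", "Lab work - blood sugar", 10.00),
    ("3B", "Lab work - cholesterol", 20.00),
]

# Running subtotal of pennies gives each line's ending penny (5,000 / 7,000 / 8,000 / 10,000).
ending_pennies = list(accumulate(round(amount * 100) for _, _, amount in claim_lines))

def line_for_penny(penny):
    """Return the claim line whose penny range contains the sampled penny."""
    return claim_lines[bisect_left(ending_pennies, penny)]

for sampled in (5_647, 8_347):
    line, description, _ = line_for_penny(sampled)
    print(f"Penny {sampled:,} falls on claim line {line} ({description})")
# Penny 5,647 -> line 2 (Flu shot); penny 8,347 -> line 3B (Lab work - cholesterol)
```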
This same approach could also be applied to an entire population of insurance claims (or any other numeric population for which an attribute is to be tested). The process would be as follows:
1) determine the total dollar amount of the insurance claims to be tested (i.e. the population)
2) multiply the dollar amount by 100 to arrive at the number of pennies
3) generate random numbers between 1 and the total number of pennies
4) select the insurance claim details by maintaining a running subtotal of the paid amount by claim, selecting only those claim amounts whose penny value range includes the random numbers selected
5) Assess each of the pennies selected, i.e. is that line item valid or not
6) Arrive at a total for pennies considered “justified” versus “unjustified”.
Once all the sampled pennies have been assessed, the counts of “justified” and “unjustified” pennies are totaled. These counts are then entered into a statistical calculator to determine the confidence intervals at various levels of confidence.
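A sketch of steps 1) through 6) for a whole population of claim lines might look like the following; the paid amounts are placeholder data, and the penny_is_justified stub stands in for the auditor’s actual review of each selected line.

```python
import random
from bisect import bisect_left
from itertools import accumulate

# Hypothetical population: paid amount for each claim line, in dollars.
paid_amounts = [125.40, 89.99, 310.00, 45.50, 72.25, 210.10]

# Steps 1-2: total dollars, expressed as pennies, with a running subtotal per line.
ending_pennies = list(accumulate(round(a * 100) for a in paid_amounts))
total_pennies = ending_pennies[-1]

# Step 3: random penny numbers between 1 and the total number of pennies.
sample_size = 3
sampled_pennies = random.sample(range(1, total_pennies + 1), sample_size)

# Step 4: each sampled penny selects the claim line whose penny range contains it.
selected_lines = [bisect_left(ending_pennies, p) for p in sampled_pennies]

# Step 5: the auditor assesses each selected line; this stub stands in for that judgment.
def penny_is_justified(line_index):
    return True  # placeholder: replace with the result of the audit review

# Step 6: tally "justified" versus "unjustified" pennies.
justified = sum(penny_is_justified(i) for i in selected_lines)
print(f"Justified pennies: {justified}, unjustified pennies: {sample_size - justified}")
```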
The traditional calculation of the confidence interval for attribute sampling is to calculate the standard error of the proportion, multiply that amount by the z-score, and add and subtract the result from the sample proportion (note 2). As an example, suppose that a sample of 40 is tested and 12 are found to be in error. The sample error rate is then 12/40, or 30%. The sample error rate is generally represented by the letter “p” and is computed as x/n, where x is the number of sample errors and n is the sample size.
The computation of the standard error of the proportion (SE) is the square root of p * (1 - p) / n. This amount is then multiplied by the z-score for a 95% confidence level, which is 1.96. In this example the standard error is .072. Multiplying .072 by the z-score of 1.96 gives .14. This amount is then added to the sample error rate of .3 to arrive at an upper bound of the confidence interval of .44, and subtracted from the sample error rate to arrive at a lower bound of .16.
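The calculation just described takes only a few lines of Python; this sketch uses the article’s example of 12 errors in a sample of 40 at 95% confidence.

```python
import math

errors, n = 12, 40                 # sample errors and sample size
p = errors / n                     # sample error rate (0.30)

se = math.sqrt(p * (1 - p) / n)    # standard error of the proportion (~0.072)
margin = 1.96 * se                 # z-score for 95% confidence times SE (~0.14)

print(f"p = {p:.2f}, SE = {se:.3f}")
print(f"95% confidence interval: {p - margin:.2f} to {p + margin:.2f}")
# p = 0.30, SE = 0.072, 95% confidence interval: 0.16 to 0.44
```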
Although this approach is widely used, it is only an approximation. In some cases the lower bound can be less than zero or the upper bound can be greater than 100% (both of which are obviously impossible). Here are two examples from the Excel worksheet available with this article.

In 1987, Buonaccorsi (note 1) developed a more accurate method of estimating the confidence interval, based upon the hypergeometric distribution. The only audit software packages known to this author which calculate the attribute confidence interval using this method are RAT-STATS, the federal government’s free statistical sampling package (note 3), and the R software package, also free (note 4).
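The details of Buonaccorsi’s procedure are in the cited paper, but the basic idea of a hypergeometric (finite-population) interval can be sketched by test inversion: for a population of N items containing an unknown number K of errors, keep every K for which the observed number of sample errors is not in either extreme tail. The sketch below uses scipy’s hypergeometric distribution; it illustrates the idea and is not a reproduction of the RAT-STATS or R routines.

```python
from scipy.stats import hypergeom

def hypergeometric_ci(N, n, x, confidence=0.95):
    """Confidence set for K, the number of errors in a population of N items,
    given x errors observed in a sample of n, by inverting the hypergeometric
    distribution (a test-inversion sketch of the finite-population approach)."""
    alpha = 1 - confidence
    plausible = [
        K for K in range(x, N - (n - x) + 1)
        # Keep K unless the observed x falls in either extreme tail.
        if hypergeom.cdf(x, N, K, n) > alpha / 2        # P(X <= x | K)
        and hypergeom.sf(x - 1, N, K, n) > alpha / 2    # P(X >= x | K)
    ]
    return min(plausible), max(plausible)

# Example: 12 errors observed in a sample of 40 drawn from a population of 1,000 items.
low, high = hypergeometric_ci(N=1_000, n=40, x=12)
print(f"Plausible number of errors in the population: {low} to {high}")
print(f"Error rate interval: {low / 1_000:.3f} to {high / 1_000:.3f}")
```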
Classical sampling issues
As an illustration, consider the sample included in the attached Excel workbook. This sample was drawn from a population of 10,000 claims which had a submitted value of $451,288. Almost all (39 of 42) of the sampled claims were inflated. The audit objective was to determine how much Medicare had overpaid, in order to recover the over-billed amounts.
Using traditional approaches based on the Central Limit Theorem, the range of overpayment amounts at a 90% confidence level would be from $289,321 to $456,341. In this case the provider would be asked to return $289,321 of misspent taxpayer dollars. Had attribute sampling been used, however, the range would be the more accurate $372,590 to $442,374, and the amount to be returned would be $372,590, which is $83,269 higher than the estimate using the traditional approach (taxpayers recover more misspent funds).
The cases above are certainly extreme. But the reason for the errors is that classical sampling and extrapolation results based on the central limit theorem (CLT) are approximate and not robust; that is, they will not produce reliable results in every instance. They cannot guarantee the prescribed 90% confidence level / under-recoupment rate. CLT-based extrapolations fail in this respect when an attribute rate is either very high or very low.
This conclusion has been explained in articles submitted for publication by Don Edwards, Chairman of the Statistics Department at the University of South Carolina (note 5), who has pointed out that these methods are especially useful in testing claims for Medicare and Medicaid, where there have been clear cases of fraud, waste and abuse. It is also borne out by the simulation tests described more fully below. The federal agency overseeing these programs is the Centers for Medicare and Medicaid Services (CMS).
Advantages of Penny Sampling
There are two primary advantages – 1) increased accuracy in some types of cases (high provider billing error rates) and 2) greater sampling efficiency. This translates into recoupment estimates which are more defensible in court and require less effort (money) in sample testing because smaller sample sizes can be used.
When provider billing error rates are high, penny sampling is more likely than traditional methods to produce a recoupment estimate which meets the spirit of the CMS guidelines for conservatism and not over-recouping.
Simulation testing performed
In order to compare the classical and penny sampling approaches, a simulated claims population was developed which consisted of 5,000 claim payments. For each claim payment, there was both a stipulated paid amount and audited amount. Thus, unlike an actual investigation situation, it was known in advance what the total amount of recoupment would be if 100% of the population were investigated. The purpose of the test was to determine how the two sampling methods compared as to their accuracy and how often they computed a recoupment amount in excess of the known total recoupment amount.
The simulation tests were done using Excel on a desktop PC, using the following procedures:
Generate a hypothetical population of paid claims having a predetermined error rate. For example, if the predetermined error rate were 80%, then approximately 80% of the paid claim amounts would have an audited amount less than the paid amount. For purposes of this simulation, the paid claims at an 80% error rate could be classified as follows:
1) 48% of claims completely denied, i.e. nothing should have been paid
2) 32% of claims partially denied, i.e. something, but not all should have been paid
3) 20% of claims completely allowable, i.e. the paid amount was correct.
The number of samples drawn was usually 100. A 90% confidence level was specified for the tests (the 90% level comes from CMS Program Integrity Transmittal 114, Change Request 3734). Using this information, a number of random samples were drawn and the lower bound calculation was performed using both methods (classic CLT and penny sampling). Each instance in which the extrapolated amount exceeded the actual total recoupment amount that should have been sought was noted. In addition, the average recoupment amount using each method was calculated.
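The simulation in the article was built in Excel; the following Python sketch mirrors its structure for the classical (CLT) side only. The claim-amount distribution, the way partially denied claims are generated, and the sample size of 100 are assumptions for illustration; the penny-sampling bound would replace classical_lower_bound with a hypergeometric calculation like the one sketched earlier.

```python
import math
import random

random.seed(1)
POPULATION_SIZE = 5_000
REPLICATIONS = 100
Z_90_ONE_SIDED = 1.282      # z-score for a one-sided 90% lower confidence bound

# Hypothetical population at an ~80% error rate:
# 48% fully denied, 32% partially denied, 20% paid correctly.
population = []
for _ in range(POPULATION_SIZE):
    paid = round(random.uniform(50, 500), 2)
    r = random.random()
    if r < 0.48:
        audited = 0.0                                        # completely denied
    elif r < 0.80:
        audited = round(paid * random.uniform(0.2, 0.9), 2)  # partially denied
    else:
        audited = paid                                       # completely allowable
    population.append((paid, audited))

true_recoupment = sum(p - a for p, a in population)          # known, since we built the data

def classical_lower_bound(sample):
    """CLT-based 90% lower bound on total overpayment, extrapolated from a sample."""
    diffs = [p - a for p, a in sample]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return POPULATION_SIZE * (mean - Z_90_ONE_SIDED * sd / math.sqrt(n))

# Count replications where the extrapolated bound exceeds the known total recoupment.
overshoots = sum(
    classical_lower_bound(random.sample(population, 100)) > true_recoupment
    for _ in range(REPLICATIONS)
)
print(f"True total recoupment: ${true_recoupment:,.2f}")
print(f"Overshoots: {overshoots} of {REPLICATIONS} replications")
```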
Summary findings
The instances where the extrapolation resulted in estimates exceeding the total actual recoupment amount were consistently fewer using penny sampling than using the classical sampling method.
Disadvantages
The primary disadvantage of penny sampling is that apparently no other government agency is using this method. If they are, they are not openly publicizing it. The other disadvantage is that although penny sampling works well in situations having high error rates, it may fare less well in situations with normal or low error rates.
Sampling Efficiency
There is an accepted audit practice called discovery sampling. The purpose of such samples is to obtain some preliminary information to use as a basis for later actions, if any. For example, if a discovery sample is taken and the results indicate a potential control issue, the audit team may decide to perform further investigation.
Penny Sampling lends itself quite well to discovery sampling in situations where a high error rate exists. The reason is that even small samples (e.g. 20 – 30) will generally yield fairly accurate estimates in situations having high error rates. A further advantage of Penny Sampling is that a sequential (“stop or go”) sampling technique can be used, providing increased flexibility and efficiency. Thus, there is no need to take a large initial sample, which saves time and money.
Footnotes
1) The procedure used to derive this confidence interval can be found in John P. Buonaccorsi (1987), “A Note on Confidence Intervals for Proportions in Finite Populations,” The American Statistician, Vol. 41, No. 3, pp. 215-218.
2) Training presentation at the State of Washington - http://www.seattle.gov/audit/training_files/audit_sampling.ppt#270,14,Confidence Interval for Proportion
3) RAT-STATS sampling software. http://oig.hhs.gov/compliance/rat-stats/index.asp
4) R – software - http://www.r-project.org/
5) Professor Don Edwards home page - http://www.stat.sc.edu/~edwards/
The opinions, beliefs and viewpoints expressed by the various authors and forum participants on this web site do not necessarily reflect the opinions, beliefs and viewpoints of AuditNet®.


