Combine Targeted and Random Samples to Achieve High Confidence that a High Percentage of the Decision Area is Acceptable

The objective of this design is to demonstrate, with high probability, that a high percentage of the decision area (or population) is acceptable, given that none of the samples are unacceptable. To achieve this objective, it is often desirable to combine the information obtained from judgmental samples with the information acquired from samples taken at randomly selected items or locations. The following discussion is presented in terms of sampling conducted within a decision area (such as a room or a collection of rooms within a building). However, the methodology is equally applicable to sampling any finite population of items, in which case the decision area is analogous to the population of items and the grid cell sampling locations are analogous to the individual items that will be sampled.

Combined Judgmental and Random (CJR) sampling (Sego et al., 2007, 2010) is a statistical method that can be used to determine the number of randomly located samples that should be taken in addition to a predetermined number of judgmental samples to establish with \(C\) x 100% confidence that at least \(\lambda\) x 100% of the decision area is acceptable, provided that none of the samples are found to be unacceptable. The statistical model of the CJR methodology is based on binary outcomes, such as 1) the presence or absence of a particular quality, 2) a sample result being acceptable or unacceptable as defined by an action level threshold, 3) contamination being detected or not detected, etc.

Sometimes judgmental samples are referred to as "targeted" samples. We use the two terms interchangeably. We define a judgmental sample to be any sample whose location is determined by informed prior belief and expert knowledge and not in a random fashion. Randomly located samples are sometimes called "probabilistic" samples. This terminology implies that probabilistic statements can only be made based upon the results of randomly placed samples. However, the CJR method utilizes a Bayesian model that gives rise to a formal probabilistic statement regarding the acceptability of the decision area based upon the information obtained from both judgmental and random samples.

The CJR method requires that all surfaces in the decision area be divided into non-overlapping, equal-size grid cells of specified size that correspond to the sampling methodology, e.g., 10cm x 10cm. While the CJR method was initially designed for use inside buildings, it may be used outdoors if the decision area can be divided into grid cells. In the case of item sampling, a grid cell would be analogous to a specific item.

The size of the grid cell should correspond to the footprint of the sampling methodology (i.e., the area sampled by the swab, wipe, or vacuum). If more than one sampling methodology is to be employed in a decision area, the size of the grid cell should be chosen to match the sampling methodology with the smallest footprint. The location of samples that will be taken using methodologies with larger footprints should be assigned in a consistent fashion, e.g., the sample is centered on the smaller grid cell that was assigned by VSP, or the upper-left corner of the larger sample is aligned with the upper-left corner of the grid cell assigned by VSP. While this approach to multiple sampling methodologies is conservative, it ensures that the desired confidence level is preserved.

The CJR design is especially suited for use in decision areas where unacceptable outcomes are deemed unlikely a priori. If at any time during the sampling process, one of the samples is found to be unacceptable, the decision area is declared to be unacceptable and no further samples for the CJR design need be taken. If this occurs, it may be desirable to implement a hot spot or geospatial sampling plan to characterize the extent of the unacceptable locations or items.

We presume that judgmental samples are taken in areas that are more likely to be unacceptable than areas available for random sampling. Consequently, if none of the judgmental sample results are unacceptable, that information can be used to reduce the number of random samples required to achieve the desired confidence level.

Summary of notation used in the CJR model:

| Symbol | Description | Range |
|---|---|---|
| \(\theta\) | Probability that a high risk cell (which will be sampled judgmentally) is unacceptable | \(0 \lt \theta \lt 1\) |
| \(\beta\) | Shape parameter of the Beta distribution | \(\gt 0\) |
| \(p(\theta) = \beta(1-\theta)^{\beta-1}\) | Beta probability density function of \(\theta\) | \(\ge 0\) |
| \(P_J = 1-E(\theta) = \frac{\beta}{1+\beta}\) | The expected prior probability that a single judgmental sample location will be acceptable - equivalently, the expected fraction of judgmental sample locations that will be acceptable | \(0 \lt P_J \lt 1\) |
| \(r\) | A high risk judgmental sample location is, on average, \(r\) times more likely to be unacceptable than a low risk random sample location | \(\ge 1\) |
| \(n_1\) | Number of judgmental samples, which is equivalent to the number of high risk cells | non-negative integer |
| \(n_2\) | The required number of random samples | non-negative integer |
| \(N\) | Total number of grid cells in the decision area | positive integer |
| \(X\) | Number of judgmental samples (from high risk grid cells) that are unacceptable | \(0,1,...,n_1\) |
| \(Y\) | Number of random samples (from low risk grid cells) that are unacceptable | \(0,1,...,n_2\) |
| \(Z\) | Number of unsampled, low risk grid cells that are unacceptable | \(0,1,...,N-n_1-n_2\) |
| \(\lambda_p\) | Expected proportion of the decision area that is acceptable prior to sampling | \(0 \lt \lambda_p \le 1\) |
| \(\lambda\) | Desired proportion of decision area that is acceptable (after sampling) | \(0 \lt \lambda \le 1\) |
| \(\lambda_v\) | The viable \(\lambda\), i.e., the smallest value of \(\lambda\) that will ensure that larger decision areas will require more samples than smaller decision areas | \(0 \lt \lambda_v \le 1\) |
| \(C\) | Desired probability (confidence) that \(\lambda\) x 100% of the decision area is acceptable | \(0 \lt C \lt 1\) |

Assumptions

1. The total number of grid cells, \(N\), in the decision area is known and each grid cell is the same size.

2. The size of the grid cell is appropriate for the chosen sampling methodology. If more than one sampling methodology is employed in a decision area, the size of the grid cell is chosen to match the sampling methodology with the smallest footprint.

3. The outcome from each sample will be binary, such as the presence/absence of contamination (as determined by the limit of detection) or acceptable/unacceptable levels of contamination (as determined by the action level).

4. The measurement (inspection) method correctly classifies each sample as being acceptable or unacceptable, i.e., an acceptable grid cell is not classified as being unacceptable (a false positive), and an unacceptable grid cell is not classified as being acceptable (a false negative).

5. All grid cells are independent. If spatial correlation is present, the CJR method is conservative, i.e., it requires more samples than would otherwise be needed.

6. In the decision area, there are \(n_1\) high risk grid cells which are more likely to be unacceptable than the remaining \(N-n_1\) grid cells which are low risk cells.

7. All \(n_1\) high risk grid cells are sampled with judgmental samples.

8. A high risk cell is, on average, \(r\) times more likely to be unacceptable than a low risk cell.

9. Acceptable outcomes from judgmental samples increase our confidence that low risk cells are also acceptable.

10. Before sampling takes place, we expect the probability of a judgmental sample being acceptable to be \(P_J\).

11. The probability that a high risk cell is unacceptable, \(\theta\), has a Beta prior distribution with the first shape parameter equal to 1 and the second shape parameter equal to \(\beta = P_J/(1-P_J)\).

12. A random sample of size \(n_2\) is taken from the low risk grid cells. The sample locations may be selected using simple random, systematic random, or adaptive fill sampling.

Method used in VSP to compute \(n_2\), the number of random samples

We provide below the principal formula used to calculate \(n_2\), the required number of random samples. Additional details of the methodology are discussed by Sego et al. (2010). The Bayesian posterior confidence, \(C\), that at least \(\lambda\) x 100% of the decision area is acceptable, given that all judgmental and random samples were acceptable, is given by

\begin{equation} C = P(Z \le (1-\lambda) N|X = 0, Y = 0) = 1-\frac{\Gamma(N-n_1-n_2+1)\,\Gamma({\lambda}N+n_1(r-1)+r(\beta+1)-1)}{\Gamma({\lambda}N-n_1-n_2)\,\Gamma(N+n_1(r-1)+r(\beta+1))} \end{equation}

where the gamma function is given by \(\Gamma(z) = \displaystyle\int_{0}^{\infty} t^{z-1}e^{-t} \,dt\), and all other symbols are as defined above. Using standard numerical root-finding algorithms, VSP solves equation (1) for the number of random samples, \(n_2\), which is rounded up to the nearest whole number of samples.
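As a rough illustration of how equation (1) might be solved for \(n_2\), the following Python sketch evaluates the confidence on the log scale and searches for the smallest whole number of random samples that reaches the target. It is not VSP's implementation: the function names `cjr_confidence` and `required_random_samples` and the example inputs are hypothetical, a bisection search over whole numbers stands in for whatever root-finding routine VSP uses, and the second factor in the denominator of equation (1) is read as \(\Gamma(N+n_1(r-1)+r(\beta+1))\).

```python
import math


def cjr_confidence(n2, N, n1, r, beta, lam):
    """Confidence C from equation (1) when all n1 + n2 samples are acceptable."""
    a = lam * N - n1 - n2          # argument of the first Gamma in the denominator
    if a <= 0:
        # The acceptable samples alone already cover lam * N grid cells.
        return 1.0
    b = n1 * (r - 1) + r * (beta + 1)
    # Work with log-Gamma so that a large N does not overflow.
    log_ratio = (math.lgamma(N - n1 - n2 + 1) + math.lgamma(lam * N + b - 1)
                 - math.lgamma(a) - math.lgamma(N + b))
    return 1.0 - math.exp(log_ratio)


def required_random_samples(N, n1, r, p_j, lam, conf):
    """Smallest whole n2 whose confidence is at least conf (hypothetical helper)."""
    beta = p_j / (1.0 - p_j)       # assumption 11: beta = P_J / (1 - P_J)
    lo, hi = 0, N - n1             # confidence equals 1 once every cell is sampled
    while lo < hi:                 # bisection; C does not decrease as n2 grows
        mid = (lo + hi) // 2
        if cjr_confidence(mid, N, n1, r, beta, lam) >= conf:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Illustrative inputs only: 1000 grid cells, 10 judgmental samples, r = 2,
# P_J = 0.5, and a target of 95% confidence that 99% of the area is acceptable.
print(required_random_samples(N=1000, n1=10, r=2.0, p_j=0.5, lam=0.99, conf=0.95))
```

Searching over whole numbers gives the same answer as rounding a real-valued root up, because the confidence does not decrease as \(n_2\) increases.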

Recommendations for parameter selection

To initialize the Bayesian CJR model, there are two parameters that must be specified by the user before sampling begins. The first is \(P_J\), the expected probability that a high risk judgmental sample location is acceptable. The second is \(r\), the factor that indicates the ratio of the expected probability of an unacceptable judgmental sample to the expected probability of an unacceptable random sample.

We recommend using a value of \(P_J\) between 0.50 and 0.999. If the user prefers to make a neutral (noninformative) assertion regarding the prior belief that a judgmental sample will be acceptable, setting \(P_J\) = 0.50 results in a uniform prior distribution for \(\theta\). However, the uniform prior is very conservative (and even pessimistic) because it results in a prior belief where "the chance of a high risk cell being acceptable is 1%" is just as likely as "the chance of a high risk cell being acceptable is 99%".
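To make the uniform-prior remark concrete, the short calculation below uses only the definitions \(\beta = P_J/(1-P_J)\) and \(p(\theta) = \beta(1-\theta)^{\beta-1}\) from the notation summary. The value \(P_J = 0.99\) in the second line is an illustrative choice, not a recommendation taken from the source.

\[ P_J = 0.50:\quad \beta = \frac{0.50}{1-0.50} = 1, \qquad p(\theta) = 1\cdot(1-\theta)^{0} = 1 \quad \text{for } 0 \lt \theta \lt 1 \]

\[ P_J = 0.99:\quad \beta = \frac{0.99}{1-0.99} = 99, \qquad p(\theta) = 99\,(1-\theta)^{98} \]

In the first case the density is flat, so every value of \(\theta\) is equally plausible. In the second case the density is concentrated near \(\theta = 0\), which expresses a strong prior belief that high risk cells are acceptable.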

We recommend that \(r\) be chosen conservatively, that is, that investigators err on the side of underestimating \(r\). However, overly conservative estimates can result in large numbers of random samples being required to obtain the desired confidence level. In general, we recommend choosing values of \(r\) between 1.5 and 5. Choosing a value of \(r\) = 1 makes the contribution of the judgmental samples equivalent to that of the random samples.
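The effect of \(r = 1\) can be seen by substituting it into equation (1), reading the second factor of the denominator as a gamma function. The following is a sketch of the algebra rather than a statement taken from the cited reports:

\[ C = 1-\frac{\Gamma(N-n_1-n_2+1)\,\Gamma(\lambda N+\beta)}{\Gamma(\lambda N-n_1-n_2)\,\Gamma(N+\beta+1)} \]

Here \(n_1\) and \(n_2\) enter only through their sum \(n_1+n_2\), so an acceptable judgmental sample adds exactly as much confidence as an acceptable random sample.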

While trying to construct a feasible sampling plan, investigators should examine the impact of slightly reducing the desired acceptable percentage of the sampling area (\(\lambda\) x 100%). This is because, in most instances, \(\lambda\) is the most influential parameter in determining the required sample size. We also recommend, in general, that \(\lambda\) be greater than \(C\), because it results in much stronger confidence statements. For example, being "95% confident that 99% of the decision area is acceptable" is a much stronger statement than being "99% confident that 95% of the decision area is acceptable".

Before any samples are collected, the values of \(N\), \(P_J\), \(n_1\), and \(r\) induce a level of confidence that a fraction of the decision area is acceptable. Consequently, we can calculate the expected percentage of acceptable grid cells before any sampling takes place. This is given by

\begin{equation} \lambda_p = E\Big(1-\frac{X+Y+Z}{N}\Big) = 1-\frac{N+n_1(r-1)}{Nr(\beta+1)} \end{equation}
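The expression for \(\lambda_p\) is simple enough to evaluate directly. The Python sketch below does so under the document's definition \(\beta = P_J/(1-P_J)\); the function name `prior_acceptable_fraction` and the example inputs are hypothetical and chosen only for illustration.

```python
def prior_acceptable_fraction(N, n1, r, p_j):
    """Expected acceptable fraction of the decision area before any sampling."""
    beta = p_j / (1.0 - p_j)   # second shape parameter of the Beta prior
    return 1.0 - (N + n1 * (r - 1)) / (N * r * (beta + 1))


# Illustrative inputs only: 1000 grid cells, 10 judgmental samples, r = 2, P_J = 0.5.
print(prior_acceptable_fraction(N=1000, n1=10, r=2.0, p_j=0.5))
```

When \(r = 1\) the expression reduces to \(\lambda_p = P_J\), since high risk and low risk cells then share the same prior chance of being acceptable.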

When \(\lambda_p\) is too large relative to \(\lambda\), the prior belief in the acceptability of the decision area may already satisfy the intent to demonstrate (through sampling) that \(\lambda\) x 100% of the decision area is acceptable. This can result in the confidence function (1) not being a decreasing function of \(N\), which, in turn, can potentially result in fewer random samples being required in larger decision areas than in smaller decision areas. This phenomenon results from a characteristic of the Binomial cumulative distribution function that is used in the model, and most often occurs when \(P_J\) is close to 1 and/or \(r\) is large. To avoid this phenomenon, VSP identifies and recommends the viable value of the fraction of acceptable grid cells, \(\lambda_v\), that is just large enough to ensure that (1) is a non-increasing function of \(N\), given the other model inputs. The value of \(\lambda_v\) is calculated as follows:

Let \(k = (r(n_1+\beta+1)-2n_1)/2\). If \(k \le 1\), then any value of \(\lambda\) will be viable. However, for \(k \gt 1\), \(\lambda_v = 1-1/k\). Investigators can choose to use the recommended \(\lambda_v\), or they may prefer to reduce the values of \(P_J\) and/or \(r\), which will decrease the value of \(\lambda_v\). Smaller values of \(\lambda\) will result in smaller required sample sizes, albeit at the expense of weaker confidence.
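The rule for \(\lambda_v\) can be written as a few lines of Python. This is a minimal sketch of the calculation described above, not VSP's code; the function name `viable_lambda` and the convention of returning `None` when every \(\lambda\) is viable are assumptions made for the example.

```python
def viable_lambda(n1, r, p_j):
    """Return lambda_v, or None when k <= 1 and every lambda is viable."""
    beta = p_j / (1.0 - p_j)                    # beta = P_J / (1 - P_J)
    k = (r * (n1 + beta + 1) - 2 * n1) / 2.0
    if k <= 1:
        return None
    return 1.0 - 1.0 / k


# Illustrative inputs only.
print(viable_lambda(n1=10, r=2.0, p_j=0.5))   # here k = 2
print(viable_lambda(n1=10, r=1.0, p_j=0.5))   # here k = -4, so every lambda is viable
```

Comparing the returned value with the desired \(\lambda\) before computing \(n_2\) indicates whether the chosen \(P_J\) and \(r\) should be revisited.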

Matching CJR symbols to parameter inputs in the VSP module

The following figure illustrates how the symbols of the CJR model correspond to the inputs in the VSP Presence/Absence sampling module:

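[Figure: correspondence between the CJR model symbols and the inputs of the VSP Presence/Absence sampling module (image\CJR.gif)]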

References:

Gelman A, Carlin JB, Stern HS, Rubin DB. 2004. Bayesian Data Analysis, Second Edition. Chapman & Hall/CRC, pp. 8 and 40.

Sego LH, KK Anderson, BD Matzke, WK Sieber, S Shulman, J Bennett, M Gillen, JE Wilson, and BA Pulsipher. 2007. An environmental sampling model for combining judgmental and randomly placed samples. PNNL-16636, Pacific Northwest National Laboratory, Richland, WA. http://www.pnl.gov/main/publications/external/technical_reports/PNNL-16636.pdf

Sego LH, S Shulman, KK Anderson, JE Wilson, BA Pulsipher, WK Sieber. 2010. Acceptance sampling using judgmental and randomly selected samples. PNNL-19315, Pacific Northwest National Laboratory, Richland, WA. http://www.pnl.gov/main/publications/external/technical_reports/PNNL-19315.pdf

The Calculate Required Number of Random Samples page contains the following controls:

Total Number of Grids Cells in Decision Area

Number of judgmental samples

Pick Judgment Grid Cells / Stop Picking Judgment Grid Cells

A-Priori Probability of Judgmental Sample Being Acceptable Drop-List

A-Priori Probability of Judgmental Sample Being Acceptable

Times more likely

Confidence

Percentage of Decision Area that is Acceptable