Employ Prior Belief to Achieve High Confidence that at most a Fixed Number of Grid Cells in the Decision Area are Unacceptable

The objective of this design is to account for prior belief in the acceptability of grid cells in the decision area in order to demonstrate, with high probability, that a high percentage of the decision area (or population) is acceptable, given that none of the samples are unacceptable. This design assumes that the decision area can be stratified according to the risk of samples being unacceptable. It also assumes that all samples are observed to be acceptable. The following discussion is presented in terms of sampling conducted within a decision area (such as a room or a collection of rooms within a building). However, this methodology is equally applicable to the sampling of any finite population of items, in which case the decision area is analogous to the population of items and the grid cell sampling locations are analogous to the individual items that will be sampled.

Stratified Compliance Sampling (SCS) is a generalization of the Combined Judgmental and Random (CJR) sampling design (Sego et al. 2007, 2010). It exchanges the concept of judgmental and random grid cells for a stratified structure in which at least one stratum can be identified as most likely to contain an unacceptable grid cell. All other strata are less likely to contain unacceptable grid cells. With this structure, a sample taken from the highest-risk stratum counts for more in terms of accumulating confidence than a sample taken from any other stratum. The model assumes that data are observed as binary outcomes, such as 1) the presence or absence of a particular quality, 2) a sample result being acceptable or unacceptable as defined by an action level threshold, 3) contamination being detected or not detected, etc.

The SCS method requires that all surfaces in the decision area be divided into non-overlapping, equal-size grid cells whose size corresponds to the sampling methodology, e.g., 10 cm x 10 cm. While the SCS method was initially designed for use inside buildings, it may be used outdoors if the decision area can be divided into grid cells. In the case of item sampling, a grid cell would be analogous to a specific item.

The size of the grid cell should correspond to the footprint of the sampling methodology (i.e., the area sampled by the swab, wipe, or vacuum). If more than one sampling methodology is to be employed in a decision area, the size of the grid cell should be chosen to match the sampling methodology with the smallest footprint. The location of samples that will be taken using methodologies with larger footprints should be assigned in a consistent fashion, e.g., the sample is centered on the smaller grid cell that was assigned by VSP, or the upper-left corner of the larger sample is aligned with the upper-left corner of the grid cell assigned by VSP. While this approach to multiple sampling methodologies is conservative, it ensures that the desired confidence level is preserved.

Summary of notation used in the SCS model:

Symbol	Description	Range
\(N_i\)	The number of grid cells in the ith stratum	Positive integer
\(N\)	Total number of grid cells in the decision area	\(N=\sum N_i\)
\(\rho_i\)	Likelihood that a grid cell in stratum \(i\) is unacceptable, relative to a grid cell in the highest-risk stratum. The highest-risk stratum is assumed to have \(\rho_i=1\)	\( 0 \lt \rho_i \le 1\)
\(P_J\)	A priori expected proportion of grid cells that are unacceptable in the highest-risk stratum	\( 0 \lt P_J \lt 1\)
\(\lambda\)	Minimum desired proportion of decision area that is acceptable	\( 0 \lt \lambda \lt 1\)
\(t\)	Maximum number of grid cells in the decision area that may be unacceptable	\( t=(1-\lambda)N\)
\(C\)	Desired probability (confidence) that at most \( t \) unsampled grid cells are unacceptable	\( 0 \lt C \lt 1\)
\(n_i\)	Number of random samples in stratum \(i\) required to achieve the confidence criteria	Non-negative integer
\(Y_i\)	Number of unsampled, unacceptable grid cells in stratum \(i\). This is used to calculate the confidence	Non-negative integer
\(k\)	Number of strata	Positive integer

Assumptions

1. The total number of grid cells, \(N\), in the decision area is known and each grid cell is the same size.

2. The size of the grid cell is appropriate for the chosen sampling methodology. If more than one sampling methodology is employed in a decision area, the size of the grid cell is chosen to match the sampling methodology with the smallest footprint.

3. The outcome from each sample will be binary, such as the presence/absence of contamination (as determined by the limit of detection) or acceptable/unacceptable levels of contamination (as determined by the action level).

4. The measurement (inspection) method correctly classifies each sample as being acceptable or unacceptable, i.e., an acceptable grid cell is not classified as being unacceptable (a false positive), and an unacceptable grid cell is not classified as being acceptable (a false negative).

5. All grid cells are independent. If spatial correlation is present, the SCS method is conservative, i.e., more samples are required than would otherwise be needed.

6. Before sampling takes place, we expect the proportion of unacceptable grid cells in the highest-risk stratum to be \(P_J\).

7. The probability that a grid cell in the highest-risk stratum is unacceptable, \(\theta\), has a Beta prior distribution with the first shape parameter equal to 1 and the second shape parameter equal to \(\beta = (1-P_J)/P_J\).

8. Random samples of size \(n_i\) are taken from each stratum \(i\). The sample locations may be selected using simple random, systematic random, or adaptive fill sampling.
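
As a quick check that assumptions 6 and 7 agree (a worked step added for illustration), the mean of a Beta distribution with shape parameters 1 and \(\beta\) is \(1/(1+\beta)\), so the stated choice of \(\beta\) gives a prior mean equal to \(P_J\):

\[ E[\theta] = \frac{1}{1+\beta} = \frac{1}{1 + (1-P_J)/P_J} = \frac{P_J}{P_J + (1-P_J)} = P_J \]

For example, \(P_J = 0.01\) corresponds to \(\beta = 99\). The prior density is proportional to \((1-\theta)^{\beta-1}\), which is the factor that appears in both integrals of equation (1).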

Method used in VSP to compute the number of random samples

We provide below the principal formula used to calculate \(n_i\), the required number of samples for each stratum. Additional details of the methodology are discussed by Venzin and Sego (2014). The confidence, \(C\), that at least \(\lambda\) x 100% of the decision area is acceptable, given that all samples are acceptable, is given by

\begin{equation} C(\xi) =\frac{ \int_{0}^{1}P(\sum_{i=1}^{k}Y_i \le t|\theta)\prod_{i=1}^{k}(1-\rho_i \theta)^{n_i(\xi)}(1-\theta)^{\beta-1}d\theta}{\int_{0}^{1}\prod_{i=1}^{k}(1-\rho_i \theta)^{n_i(\xi)}(1-\theta)^{\beta-1}d\theta} \end{equation}

We do not explicitly write out the probability of the sum of \(Y_i\) as it is a convolution and not easily expressed in closed form. The integrals in (1) are numerically approximated using an adaptive version of Simpson's rule. Using standard numerical root finding algorithms, VSP solves equation (1) by finding the value of \(\xi\) that satisfies the desired level of confidence. The value of \(\xi\) provides the required number of samples \(n_i\), as follows:

\begin{equation} n_i(\xi) = \xi \left( \alpha \frac{N_i}{N} +(1-\alpha)\rho_i \right) \end{equation}
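
Returning to equation (1), the following is a minimal Python sketch of how the confidence might be evaluated for a given set of sample sizes \(n_i\). It is not the VSP implementation and rests on several assumptions that are not stated in this text: each \(Y_i\), conditional on \(\theta\), is taken to be Binomial with \(N_i - n_i\) trials and success probability \(\rho_i \theta\); the convolution is done directly (truncated at \(t\)) rather than by FFT; a fixed-step composite Simpson's rule stands in for the adaptive version; and the sketch is restricted to \(P_J \le 0.5\) so that the prior factor \((1-\theta)^{\beta-1}\) stays bounded at \(\theta = 1\). All function names are hypothetical.

```python
import math

def binom_pmf(j, m, p):
    """P(Y = j) for Y ~ Binomial(m, p), evaluated in log space."""
    if j < 0 or j > m:
        return 0.0
    if p <= 0.0:
        return 1.0 if j == 0 else 0.0
    if p >= 1.0:
        return 1.0 if j == m else 0.0
    log_pmf = (math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
               + j * math.log(p) + (m - j) * math.log1p(-p))
    return math.exp(log_pmf)

def prob_sum_at_most_t(theta, unsampled, rho, t):
    """P(sum_i Y_i <= t | theta), ASSUMING Y_i ~ Binomial(N_i - n_i, rho_i * theta)."""
    dist = [1.0] + [0.0] * t          # pmf of the running sum, truncated at t
    for m, r in zip(unsampled, rho):
        pmf = [binom_pmf(j, m, r * theta) for j in range(t + 1)]
        new = [0.0] * (t + 1)
        for a, pa in enumerate(dist):  # direct (non-FFT) truncated convolution
            if pa == 0.0:
                continue
            for b in range(t + 1 - a):
                new[a + b] += pa * pmf[b]
        dist = new
    return min(1.0, sum(dist))

def simpson(f, intervals):
    """Composite (fixed-step) Simpson's rule on [0, 1]; intervals must be even."""
    h = 1.0 / intervals
    total = f(0.0) + f(1.0)
    for i in range(1, intervals):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

def confidence(n, sizes, rho, p_j, t, intervals=400):
    """Ratio of integrals in equation (1) for integer sample sizes n_i <= N_i."""
    if not 0.0 < p_j <= 0.5:
        raise ValueError("this sketch only handles 0 < P_J <= 0.5 (beta >= 1)")
    if intervals % 2:
        raise ValueError("intervals must be even")
    beta = (1.0 - p_j) / p_j
    unsampled = [size - n_i for size, n_i in zip(sizes, n)]

    def weight(theta):
        # Likelihood of all-acceptable samples times the Beta(1, beta) prior kernel.
        w = (1.0 - theta) ** (beta - 1.0)
        for n_i, r in zip(n, rho):
            w *= (1.0 - r * theta) ** n_i
        return w

    numerator = simpson(
        lambda th: prob_sum_at_most_t(th, unsampled, rho, t) * weight(th), intervals)
    denominator = simpson(weight, intervals)
    return numerator / denominator

# Small illustrative case: two strata of 50 cells, 20 and 10 samples, t = 2.
print(confidence((20, 10), (50, 50), (1.0, 0.5), 0.1, 2))
```

The fixed step count is a simplification. When many samples are taken, the integrand concentrates near \(\theta = 0\) and a larger `intervals` value (or an adaptive rule, as VSP uses) would be needed for an accurate result.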

Equation (2) governs how samples are allocated among the strata. The parameter \(\alpha\) governs the tradeoff between allocating samples based on the relative risk, \(\rho_i\), and the relative size of the strata, \(N_i/N\). When \(\alpha = 0\), samples are allocated by relative risk, i.e., more samples are assigned to higher risk strata than lower risk strata. When \(\alpha = 1\), samples are allocated by stratum size, i.e., samples are allocated proportionally among strata, with larger strata receiving more samples. Values of \(0 < \alpha < 1\) give a mixture of the two sampling strategies. The user can choose to have VSP allocate samples based only on relative risk, \( \alpha = 0 \), based on strata size, \( \alpha = 1 \), or have VSP examine a sequence of \( \alpha \) values and choose the \(\alpha\) that minimizes the total sample size.
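
The allocation rule in equation (2) can be sketched in a few lines of Python. The function names are hypothetical, the value \(\xi = 100\) is an arbitrary illustration rather than a solution of equation (1), and rounding each \(n_i\) up to a whole number is an assumption, since the text does not state how VSP converts \(n_i(\xi)\) to integers.

```python
import math

def allocate(xi, alpha, stratum_sizes, rho):
    """Equation (2): n_i(xi) = xi * (alpha * N_i / N + (1 - alpha) * rho_i)."""
    total = sum(stratum_sizes)
    return [xi * (alpha * size / total + (1.0 - alpha) * r)
            for size, r in zip(stratum_sizes, rho)]

def round_up(samples):
    """ASSUMPTION: round each n_i up to a whole sample (rule not given in the text)."""
    return [math.ceil(s) for s in samples]

sizes = (3300, 3300, 3400)   # stratum sizes N_i
rho = (1.0, 0.50, 0.25)      # relative risks rho_i
for alpha in (0.0, 0.5, 1.0):
    print(alpha, allocate(100.0, alpha, sizes, rho))
```

With \(\alpha = 0\) the allocation is proportional to \(\rho_i\) alone, and with \(\alpha = 1\) it is proportional to \(N_i/N\) alone, matching the two limiting cases described above.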

Fixing the Number of Unacceptable Items

While creating sampling plans, it is common practice to examine various plans that are generated using different input parameters. With the SCS model, these comparisons must be done with care. Suppose we have two SCS designs, A and B, which have similar parameters except design A has \( N = 10,000 \) while design B has \( N = 30,000 \). If a value of \(\lambda = 0.99 \) were chosen, it is possible that fewer samples would be required for design B than design A, especially if \(P_J\) is very small. To avoid conundrums like these, the same value of \(t\) (not \(\lambda\)) would need to be used for both designs. While it is natural to want to characterize an SCS design in terms of \(\lambda\), the desired proportion of the area that is acceptable, one must always keep in mind that, in the mathematics of the SCS model, \(\lambda\) is always converted to \(t\), the maximum (and discrete) number of grid cells that are allowed to be unacceptable. In short, \(t\) is the driving parameter of the confidence and the required sample size.
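
To make the comparison concrete (a worked step using the relation \(t=(1-\lambda)N\) from the notation table, with \(t_A\) and \(t_B\) introduced here as labels for the two designs), \(\lambda = 0.99\) gives

\[ t_A = (1 - 0.99) \times 10{,}000 = 100, \qquad t_B = (1 - 0.99) \times 30{,}000 = 300 \]

so design B tolerates three times as many unacceptable grid cells as design A, even though both designs share the same \(\lambda\). Holding \(t = 100\) for both designs would instead correspond to \(\lambda = 0.99\) for design A and \(\lambda = 1 - 100/30{,}000 \approx 0.9967\) for design B.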

Effect of Sample Allocation on Sample Size

It is instructive to consider how the choice of \(\alpha \) affects the sampling plan in the stratified model. Recall that \(\alpha = 0\) implies a sampling plan based entirely on relative risk and \(\alpha = 1\) implies a sampling plan based entirely on the size of each stratum. Consider the following figures:

Figure 1 - In the top two panels, \(\rho\) = (1, 0.50, 0.25), while \(\rho\) = (1, 0.85, 0.80) in the bottom panels. In both sets of panels, the stratum sizes are \(N_i\) = (3300, 3300, 3400), the prior probability is \(P_J\) = 0.99, the target confidence is \(C\) = 0.95, and the maximum number of unacceptable items is \(t\) = 100. The black curve is the sample size from stratum 1, the red curve is the sample size from stratum 2, and the blue curve is the sample size from stratum 3.

Figure 1 illustrates the effect of \(\alpha\) in allocating samples among the strata. Note that the sample sizes among the strata are virtually equivalent when \(\alpha = 1\), which we would expect because the strata sizes are almost identical. In both cases (top and bottom), sampling entirely by size of stratum leads to more samples. Sample size can be minimized by choosing \(\alpha=0\), but note that doing so requires the assumption that the relative risk parameters, \(\rho_i\), have been chosen correctly.

Matching SCS symbols to parameter inputs in the VSP module

The following figure illustrates how the symbols of the SCS model correspond to the inputs in the VSP stratified sampling module:

Kiss FFT Licensing Information

This design uses the Kiss FFT library in order to improve performance by using Fast Fourier Transform (FFT) convolution to calculate the distribution function of \(\sum_{i=1}^k Y_i \). Please see the complete Kiss FFT copyright notice and disclaimer for more information.

References:

Gelman A, Carlin JB, Stern HS, Rubin DB. 2004. Bayesian Data Analysis, Second Edition. Chapman & Hall/CRC, pp. 8 and 40.

Kiss FFT. Version 1.3.0. http://sourceforge.net/projects/kissfft/

Sego LH, KK Anderson, BD Matzke, WK Sieber, S Shulman, J Bennett, M Gillen, JE Wilson, and BA Pulsipher. 2007. An environmental sampling model for combining judgmental and randomly placed samples. PNNL-16636, Pacific Northwest National Laboratory, Richland, WA. http://www.pnl.gov/main/publications/external/technical_reports/PNNL-16636.pdf

Sego LH, S Shulman, KK Anderson, JE Wilson, BA Pulsipher, WK Sieber. 2010. Acceptance sampling using judgmental and randomly selected samples. PNNL-19315, Pacific Northwest National Laboratory, Richland, WA. http://www.pnl.gov/main/publications/external/technical_reports/PNNL-19315.pdf

Venzin AM, LH Sego. 2014. A Stratified Approach for Clearance Sampling With Prior Information To Incorporate Variable Risk.