### Data Collection Protocols

"*The unity of science consists alone in its method, not in its material.*"

Karl Pearson, 1892

"Statistics is the branch of scientific method which deals with the data obtained by counting or measuring the properties of populations of natural phenomena. In this definition 'natural phenomena' includes all the happenings of the external world, whether human or not."

M.G. Kendall, 1943

This page seeks to illustrate what we consider to be the common structure of statistics, what we propose to call statistical method.

**Statistical Method**

*Statistical method* can be usefully represented as a series of five stages: Problem, Plan, Data, Analysis and Conclusion. We use the acronym PPDAC to refer to this series. Each stage of statistical method comes with its own issues to be understood and addressed (summarized in the table below).

**Figure:** The statistical method.

Each stage leads to the next and depends on the stages before it. Looking back, this means that each stage is carried out and legitimized (or not) in the context of the stages which precede it (e.g. there is little value in a Plan that does not address the Problem; in such a case, one of the two stages must be modified). Looking ahead, choices made at any stage can simplify actions taken in a later stage (e.g. a well-designed Plan can simplify the Analysis).

A structure for statistical method is useful in two ways: first, as a template for actively carrying out an empirical investigation and second, as a means of critically reviewing completed studies. The structure of all empirical studies, either implicitly or explicitly, can be represented by the five-stage model.

In this section, we expand on the key concepts and tasks of each stage, introducing new terminology as needed. As pointed out in the first section, in many ways this investigation is not typical of a statistical one, and we urge readers to test the proposed structure and language on other applications.

**The Problem**

Understanding what is to be learned from an investigation is very important.

Excellent advice is to get a clear understanding of the physical background to the situation under study, to clarify the objectives and to formulate the problem in statistical terms.

The purpose of the problem stage in statistical method is to provide a clear statement of what is to be learned. A well defined structure and clear terminology will help translate the contextual problem into a form that can guide the design and implementation of the subsequent stages.

**Units and Target Population**

The *target population* is the collective of units about which we would like to draw conclusions. Care needs to be taken in specifying both.

The Environmental Heroes is keen to determine the abundance and diversity of amphibians and reptiles at each of the defined wetland sites in the Canal Way Park, and the factors affecting them. A unit, then, is one observation of such subjects. The target population is all such subjects, before, during and after the date of the observation.

For some investigations it may be easier to define the units or the collective in terms of a process which generates them. An example is a manufacturing process producing units under specified conditions. In such cases it might be more convenient to refer to the *target process* rather than the *target population*.

**Variates**

*Variates* are characteristics of each unit in the population and can take numerical or categorical values. The values of variates typically differ from unit to unit.

The primary variate of interest, which we call the *response variate*, is the number of subjects observed at each of the sites. Many other variates, which we call *explanatory variates*, are attached to each unit. Examples are the size and weight of the subjects and environmental factors such as weather and water quality measurements.
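As a minimal sketch of how a unit and its variates might be recorded, the Python snippet below stores one observation per record. The field names (`site`, `count`, `water_ph`, `weather`) and all values are hypothetical, chosen only for illustration; they do not come from the actual study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One observation at one wetland site (hypothetical field names)."""
    site: str        # identifies the wetland site of the observation
    count: int       # response variate: number of subjects observed
    water_ph: float  # explanatory variate (assumed water quality measure)
    weather: str     # explanatory variate, categorical


# Made-up values, for illustration only.
units = [
    Unit(site="A", count=4, water_ph=7.1, weather="sunny"),
    Unit(site="B", count=1, water_ph=6.2, weather="rain"),
]

# The response variate typically differs from unit to unit.
response = [u.count for u in units]
```

Keeping numerical and categorical variates together on each record mirrors the definition above: variates are characteristics attached to the individual unit.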

**Population Attributes**

Population *attributes* are summaries describing characteristics of the population. Formally an attribute is a function applied to the entire population and determined through the variate values on individual units.
The attribute of interest here is the average of the response variate (the number of subjects observed) over all units in the target population.

Attributes can be numerical or graphical. For example, a scatterplot constructed using all units in the target population is an attribute. The coefficients of the least squares line fitted to this scatterplot and the residual variation around the line are numerical attributes.
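The idea that an attribute is a function of the whole population can be sketched in Python. In the snippet below, each unit is reduced to a pair `(x, y)` of one explanatory variate and the response variate. The function names, the made-up data, and the use of the residual sum of squares as the measure of residual variation are all assumptions made for illustration.

```python
from statistics import mean


def average_attribute(population):
    """Numerical attribute: the average response over all units."""
    return mean([y for _, y in population])


def least_squares_attributes(population):
    """Numerical attributes: intercept, slope and residual sum of squares
    of the least squares line fitted to all (x, y) pairs."""
    xs = [x for x, _ in population]
    ys = [y for _, y in population]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in population)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in population)
    return intercept, slope, residual_ss


# Hypothetical population: (explanatory value, response value) per unit.
population = [(6.0, 1), (6.5, 2), (7.0, 3), (7.5, 4)]

avg = average_attribute(population)
intercept, slope, residual_ss = least_squares_attributes(population)
```

Each function takes the entire population as its argument and uses only the variate values on individual units, which is what makes its result an attribute in the sense defined above.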

A clear specification of the attributes of interest can resolve many issues.

**Problem Aspect**

The aspect defines the basic nature of the problem and is *causative*, *predictive* or *descriptive*.

A problem with a causative aspect is one where interest lies in investigating the nature of a causative relationship between an explanatory variate and a response variate. The preceding language allows us to be more precise about what is meant by a 'causative relationship'. By this we mean that a change in the value of the explanatory variate (while holding all other explanatory variates fixed) for all units in the population results in a change in the value of an attribute of interest.

A problem has a predictive aspect if the object is to predict the values of variates on one or more units in the target population. A problem has a descriptive aspect if the object is to estimate or describe one or more attributes of the population.

The problem aspect here is descriptive; the aim is to estimate a population attribute, the average number of amphibians and reptiles per site. If we later attempt to show that abundance and diversity can be changed by, for example, poor water quality, then the problem will have a causative aspect. In time we may also be able to predict the abundance and diversity of subjects.

It is important to decide the aspect at the problem stage because of the special requirements it can impose on the plan.
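The definition of a causative relationship given above can be written more compactly. The notation here is our own assumption and does not come from the source: $a(\cdot)$ is the attribute of interest, $x$ is the explanatory variate, and $\mathcal{P}(x = x_0)$ denotes the population in which $x$ is set to $x_0$ on every unit while all other explanatory variates are held fixed.

```latex
% A causative relationship between x and the attribute a exists if
% changing x on every unit changes the value of the attribute:
\[
  a\bigl(\mathcal{P}(x = x_0)\bigr) \neq a\bigl(\mathcal{P}(x = x_1)\bigr)
  \quad \text{for some } x_0 \neq x_1 .
\]
```

In the example, $a$ would be the average number of subjects per site and $x$ might be a water quality measurement.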

**The Plan**

The purpose of this stage is to develop a plan for the collection and analysis of the data. We propose to break the planning into several sub-stages, some of which inevitably overlap. In an active use of PPDAC, some iteration may be required within the stage and between stages before a satisfactory plan is developed.

- Specifying the study units and study population
- Selection of the response variates to be measured
- Dealing with explanatory variates
- The measuring processes
- The sampling protocol
- The data collection protocol

Note: Material on this page was heavily borrowed from: http://www.stats.uwaterloo.ca/~rwoldfor/papers/sci-method/paperrev/node36.html