Surrogate data

From Wikipedia, the free encyclopedia

Surrogate data, sometimes known as analogous data,[1] usually refers to time series data generated from well-defined (linear) models, such as ARMA processes, that reproduce selected statistical properties of a measured data set, such as its autocorrelation structure.[2] The resulting surrogate data can then be used, for example, to test for non-linear structure in the empirical data; this is called surrogate data testing.
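As an illustration of the idea above, the following minimal sketch fits a first-order autoregressive model, AR(1), the simplest special case of an ARMA process, to a series and draws a surrogate from it. The function names (autocorr, ar1_surrogate), the Yule-Walker style estimate of the coefficient, and the simulated "measured" series are assumptions made for this example and are not taken from the cited sources.

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation of sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

def ar1_surrogate(x, rng):
    """Generate a surrogate of len(x) from an AR(1) model fitted to x."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    phi = autocorr(x, 1)                         # estimate of the AR coefficient
    noise_sd = (var * (1.0 - phi ** 2)) ** 0.5   # innovation s.d. matching the variance
    value = rng.gauss(0.0, var ** 0.5)           # start from the stationary distribution
    out = []
    for _ in range(n):
        out.append(mean + value)
        value = phi * value + rng.gauss(0.0, noise_sd)
    return out

rng = random.Random(0)
# Stand-in for measured data: a simulated AR(1) series with coefficient 0.7
measured = [0.0]
for _ in range(1999):
    measured.append(0.7 * measured[-1] + rng.gauss(0.0, 1.0))

surrogate = ar1_surrogate(measured, rng)
# The two lag-1 autocorrelations should be close to each other
print(round(autocorr(measured, 1), 2), round(autocorr(surrogate, 1), 2))
```

The surrogate shares the mean, variance and lag-1 autocorrelation of the input only approximately, because it is a fresh random draw from the fitted model.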

Surrogate or analogous data also refers to data used to supplement available data from which a mathematical model is built. Under this definition, it may be generated (i.e., synthetic data) or transformed from another source.[1]

Surrogate data is used in environmental and laboratory settings, where study data from one source is used to estimate characteristics of another source.[3] For example, it has been used to model population trends in animal species.[4] It can also be used to model biodiversity, since gathering actual data on every species in a given area would be difficult.[5]

Surrogate data may be used in forecasting. Data from similar series may be pooled to improve forecast accuracy.[6] Use of surrogate data may enable a model to account for patterns not seen in historical data.[7]
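One simple way to picture pooling is a trend estimate that blends a short target series with longer, similar series. The sketch below is only an illustration: the function names, the linear trend model, the weight parameter and all numbers are hypothetical and do not come from the cited sources.

```python
def mean_change(series):
    """Average period-over-period change of a series."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return sum(diffs) / len(diffs)

def pooled_forecast(target, analogs, horizon, weight=0.5):
    """Forecast `horizon` steps ahead with a trend pooled across series.

    `weight` is the (hypothetical) share given to the target's own trend;
    the remainder goes to the average trend of the analogous series.
    """
    own = mean_change(target)
    pooled = sum(mean_change(s) for s in analogs) / len(analogs)
    trend = weight * own + (1.0 - weight) * pooled
    return [target[-1] + trend * h for h in range(1, horizon + 1)]

# Short, noisy target history and longer histories of similar series (made-up numbers)
target = [100.0, 103.0, 101.0]
analogs = [
    [50.0, 52.0, 54.0, 56.0, 58.0],
    [80.0, 83.0, 86.0, 89.0, 92.0],
]
print(pooled_forecast(target, analogs, horizon=3))  # [102.5, 104.0, 105.5]
```

With only two observed changes, the target's own trend (0.5 per period) is unreliable; blending it with the analog trend (2.5 per period) gives 1.5 per period.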

Another use of surrogate data is to test models for non-linearity. The term surrogate data testing refers to algorithms used to analyze models in this way.[8] These tests typically involve generating data, whereas surrogate data in general can be produced or gathered in many ways.[1]
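A surrogate data test of this kind can be sketched as follows. The example makes several assumptions that are not taken from the cited sources: surrogates are produced by phase randomization (keeping the amplitude spectrum, and hence the autocorrelation, of the data), the discriminating statistic is a time-reversal asymmetry measure, 19 surrogates are used, and the test data come from the chaotic logistic map. A naive discrete Fourier transform keeps the sketch free of external libraries.

```python
import math
import random

def time_reversal_asymmetry(x):
    """Normalized third moment of successive differences (near 0 for reversible series)."""
    diffs = [b - a for a, b in zip(x, x[1:])]
    m2 = sum(d ** 2 for d in diffs) / len(diffs)
    m3 = sum(d ** 3 for d in diffs) / len(diffs)
    return m3 / m2 ** 1.5

def phase_randomized_surrogates(x, count, rng):
    """Surrogates with the amplitude spectrum of x and random phases (odd length only)."""
    n = len(x)
    if n % 2 == 0:
        raise ValueError("this sketch handles odd-length series only")
    mean = sum(x) / n
    centered = [v - mean for v in x]
    freqs = range(1, (n - 1) // 2 + 1)
    # Naive O(n^2) discrete Fourier transform; only the amplitudes are kept
    amps = []
    for k in freqs:
        re = sum(c * math.cos(2 * math.pi * k * t / n) for t, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * k * t / n) for t, c in enumerate(centered))
        amps.append(math.hypot(re, im))
    surrogates = []
    for _ in range(count):
        phases = [rng.uniform(0.0, 2 * math.pi) for _ in freqs]
        # Inverse transform written as a sum of cosines with the new phases
        surrogates.append([
            mean + (2.0 / n) * sum(
                a * math.cos(2 * math.pi * k * t / n + p)
                for k, a, p in zip(freqs, amps, phases)
            )
            for t in range(n)
        ])
    return surrogates

rng = random.Random(1)
# Test data: chaotic logistic map (non-linear, deterministic), transient discarded
value = 0.3
for _ in range(100):
    value = 4.0 * value * (1.0 - value)
data = []
for _ in range(255):
    value = 4.0 * value * (1.0 - value)
    data.append(value)

observed = abs(time_reversal_asymmetry(data))
surrogates = phase_randomized_surrogates(data, 19, rng)
surrogate_stats = [abs(time_reversal_asymmetry(s)) for s in surrogates]
# Rank-based decision: reject the linear null hypothesis if the data are
# more extreme than every one of the 19 surrogates (about a 5% level)
reject_null = observed > max(surrogate_stats)
print(observed, max(surrogate_stats), reject_null)
```

The design choice here is that the surrogates share the linear properties of the data, so a statistic that differs clearly between data and surrogates points to structure that a linear model does not explain.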

Methods

One method of obtaining surrogate data is to find a source with similar conditions or parameters and use its data in modeling.[4] Another method is to focus on patterns of the underlying system and to search for a similar pattern in related data sources (for example, patterns in other related species or environmental areas).[5]

Rather than using existing data from a separate source, surrogate data may be generated through statistical processes,[2] which may involve random data generation[1] using constraints of the model or system.[8]
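A very simple instance of constrained random generation is a shuffle surrogate: the values of a series are randomly permuted, so the set of values (and therefore the mean, variance and histogram) is preserved exactly while the temporal order is destroyed. The sketch below is illustrative only; the function name and the made-up input series are assumptions, not part of the cited sources.

```python
import random

def shuffle_surrogate(x, rng):
    """Random permutation of x: same values, temporal order destroyed."""
    out = list(x)
    rng.shuffle(out)
    return out

rng = random.Random(42)
# Made-up series: linear trend plus an alternating component
series = [0.1 * t + (1.0 if t % 2 else -1.0) for t in range(20)]
surrogate = shuffle_surrogate(series, rng)

# The constraint holds exactly: both contain the same multiset of values
print(sorted(surrogate) == sorted(series))  # True
```

Stricter constraints, such as preserving the autocorrelation as well, call for other generators, for example draws from a fitted linear model.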

References

  1. [citation not recoverable from source]
  2. [citation not recoverable from source]
  3. [citation not recoverable from source]
  4. [citation not recoverable from source]
  5. [citation not recoverable from source]
  6. [citation not recoverable from source]
  7. [citation not recoverable from source]
  8. [citation not recoverable from source]
