Simulation

Peter Alping

May 24, 2018

  • Introduction
  • Basics
  • Linear Regression
  • Logistic Regression
  • Confounding
  • Selection Bias
  • Generalized Linear Models

Introduction

“Simulation is the imitation of the operation of a real-world process or system” – Wikipedia

Our typical situation

We want to know more about a biological process for which we have measured data

  • A biological process generates data
  • We measure and collect this data
  • We invent a model for how we think the data was generated
  • We fit the model to our data
  • This gives us estimates of the model parameters
  • We make inferences about the data-generating process
[Figure: Typical situation]

The simulation situation

We want to know more about how something behaves, given data of a specific type

  • Specify a model for the simulated data
  • Including model parameters
  • Generate data using this model
  • Test something given our data
  • Compare the result to our “true” model
[Figure: Simulation situation]
[Figures: Typical situation / Simulation situation]
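
To make this loop concrete, here is a minimal sketch in Python (assuming numpy is available; the “true” probability and sample size are made-up illustrative values):

    import numpy as np

    rng = np.random.default_rng(seed=1)

    # 1. Specify a model and its parameters: coin flips with a "true" probability
    true_p = 0.3

    # 2. Generate data using this model
    y = rng.binomial(n=1, p=true_p, size=1000)

    # 3. Test something given our data: here, simply estimate p
    p_hat = y.mean()

    # 4. Compare the result to the "true" model
    print(f"true p = {true_p}, estimated p = {p_hat:.3f}")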

Basics

Linear Regression

When do we use it?

  • Continuous dependent variable (y)
  • Cont./discrete independent variables (x_1, ..., x_p)
  • Error term normally distributed
  • y = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_px_p + \epsilon
  • \epsilon \sim normal(0, \sigma)
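
A minimal sketch of simulating from this model in Python (assuming numpy and statsmodels are available; the beta and sigma values are made up for illustration), followed by an ordinary least squares fit to recover the parameters:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(seed=2)
    n = 1000

    # "True" parameters (illustrative values)
    beta_0, beta_1, beta_2, sigma = 1.0, 0.5, -0.3, 2.0

    # Independent variables: one continuous, one binary
    x1 = rng.normal(0, 1, n)
    x2 = rng.binomial(1, 0.4, n)

    # y = beta_0 + beta_1*x1 + beta_2*x2 + epsilon, epsilon ~ normal(0, sigma)
    epsilon = rng.normal(0, sigma, n)
    y = beta_0 + beta_1 * x1 + beta_2 * x2 + epsilon

    # Fit the linear regression and compare estimates to the true betas
    X = sm.add_constant(np.column_stack([x1, x2]))
    fit = sm.OLS(y, X).fit()
    print(fit.params)  # should be close to (1.0, 0.5, -0.3)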

In GLM notation

  • Dependent variable from normal distribution (y)
  • Cont./discrete independent variables (x_1, ..., x_p)
  • E(y) = \mu = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_px_p
  • y \sim normal(\mu, \sigma)
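
Continuing the sketch above, the GLM form generates the same kind of data by computing \mu first and then drawing y from normal(\mu, \sigma):

    # Same model in GLM form: compute mu first, then draw y ~ normal(mu, sigma)
    mu = beta_0 + beta_1 * x1 + beta_2 * x2
    y_glm = rng.normal(mu, sigma)  # one draw per observation

    fit_glm = sm.OLS(y_glm, X).fit()
    print(fit_glm.params)  # again close to (1.0, 0.5, -0.3)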

Logistic Regression

When do we use it?

  • Binary dependent variable (y)
  • Cont./discrete independent variables (x_1, ..., x_p)
  • No common error distribution independent of predictor values
  • logit(E(y)) = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_px_p
  • logit(p) = log(\frac{p}{1 - p})
  • No separate error term \epsilon with its own distribution
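
As a small sketch (assuming numpy), the logit and its inverse as Python helpers:

    import numpy as np

    def logit(p):
        """Log-odds: maps probabilities in (0, 1) to the real line."""
        return np.log(p / (1 - p))

    def inv_logit(x):
        """Inverse logit (expit): maps the real line back to (0, 1)."""
        return np.exp(x) / (1 + np.exp(x))

    p = np.array([0.1, 0.5, 0.9])
    print(inv_logit(logit(p)))  # round trip recovers [0.1, 0.5, 0.9]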

In GLM notation

  • Dependent variable from a binomial distribution with 1 trial (y)
  • Cont./discrete independent variables (x_1, ..., x_p)
  • E(y) = \mu = logit^{-1}(\beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_px_p)
  • logit^{-1}(x) = \frac{e^x}{1 + e^x}
  • y \sim binomial(1, \mu)
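
A minimal sketch of simulating and fitting this model in Python (assuming numpy and statsmodels are available; the beta values are made up for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(seed=3)
    n = 5000

    # "True" parameters (illustrative values)
    beta_0, beta_1, beta_2 = -1.0, 0.8, -0.5

    x1 = rng.normal(0, 1, n)
    x2 = rng.binomial(1, 0.4, n)

    # mu = inverse logit of the linear predictor, then y ~ binomial(1, mu)
    eta = beta_0 + beta_1 * x1 + beta_2 * x2
    mu = np.exp(eta) / (1 + np.exp(eta))
    y = rng.binomial(1, mu)

    # Fit the logistic regression and compare estimates to the true betas
    X = sm.add_constant(np.column_stack([x1, x2]))
    fit = sm.Logit(y, X).fit()
    print(fit.params)  # should be close to (-1.0, 0.8, -0.5)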

Confounding

[Figure: Confounding]
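
As a hedged sketch of how confounding can be explored by simulation (assuming numpy and statsmodels; the structure, with a confounder c affecting both the exposure x and the outcome y and a true x effect of 0.3, is my own illustrative choice), compare the crude and adjusted estimates:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(seed=4)
    n = 10_000

    # Confounder c affects both exposure x and outcome y;
    # the "true" effect of x on y is 0.3 (illustrative values)
    c = rng.normal(0, 1, n)
    x = 0.8 * c + rng.normal(0, 1, n)
    y = 0.3 * x + 0.6 * c + rng.normal(0, 1, n)

    # Crude model (x only): confounded estimate of the x effect
    crude = sm.OLS(y, sm.add_constant(x)).fit()
    # Adjusted model (x and c): recovers the true effect
    adjusted = sm.OLS(y, sm.add_constant(np.column_stack([x, c]))).fit()

    print(crude.params[1])     # noticeably larger than 0.3 (confounded)
    print(adjusted.params[1])  # close to 0.3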

Collider Stratification Bias

[Figure: Collider stratification bias]
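
A similar hedged sketch (assuming numpy and statsmodels; the structure, with independent x and y both affecting a collider s, is my own illustrative choice): adjusting for the collider induces a spurious association where none exists:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(seed=5)
    n = 10_000

    # x and y are independent (the true effect of x on y is 0),
    # but both affect a collider s (illustrative values)
    x = rng.normal(0, 1, n)
    y = rng.normal(0, 1, n)
    s = 1.0 * x + 1.0 * y + rng.normal(0, 1, n)

    # Unadjusted model: estimate of the x effect is close to 0 (the truth)
    crude = sm.OLS(y, sm.add_constant(x)).fit()
    # Adjusting for (stratifying on) the collider s induces a spurious association
    adjusted = sm.OLS(y, sm.add_constant(np.column_stack([x, s]))).fit()

    print(crude.params[1])     # close to 0
    print(adjusted.params[1])  # clearly negative: collider stratification bias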

Generalized Linear Models

Generalized Linear Models

  • E(y) = \mu = g^{-1}(\beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_px_p)
  • y \sim distribution(\mu)

And in vector notation:

  • E(y) = \mu = g^{-1}(\bm{X}\bm{\beta})
  • y \sim distribution(\mu)
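
A minimal sketch in Python of simulating and fitting a GLM in vector notation (assuming numpy and statsmodels; the Poisson family with log link and the beta values are my own illustrative choices):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(seed=6)
    n = 5000

    # Design matrix X (intercept + two covariates) and "true" beta (illustrative)
    X = sm.add_constant(np.column_stack([rng.normal(0, 1, n),
                                         rng.binomial(1, 0.4, n)]))
    beta = np.array([0.5, 0.3, -0.2])

    # Poisson example: g is the log link, so g^{-1} is exp
    mu = np.exp(X @ beta)   # E(y) = g^{-1}(X beta)
    y = rng.poisson(mu)     # y ~ distribution(mu)

    # Fit the corresponding GLM and compare to the true beta
    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
    print(fit.params)  # should be close to (0.5, 0.3, -0.2)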