14 Jan, 2016

Decision Theory


Notation:

  1. $a \in A$: $a$ is an “action” and $A$ is the set of all possible actions
  2. $L(\theta,a)$: loss function for $\theta \in \Theta$ and $a\in A$, $L(\theta,a) \ge -K \gt -\infty$
    • useful loss functions:
    • Quadratic loss: $(\theta-a)^2$
    • Absolute loss: $|\theta-a|$
    • 0-1 loss: $I(\theta \ne a)$
  3. $\delta(x)$: decision rule, function from $X$ to $A$, $X$ is the sample space
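
For concreteness, here is a minimal sketch of these objects in Python (the function names and the sample-proportion rule are illustrative choices, not part of the notes):

```python
import numpy as np

# Common loss functions L(theta, a)
def quadratic_loss(theta, a):
    return (theta - a) ** 2

def absolute_loss(theta, a):
    return np.abs(theta - a)

def zero_one_loss(theta, a):
    return float(theta != a)

# A decision rule delta: X -> A, e.g. the sample proportion for binomial data
def delta(x, n):
    return x / n
```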

Example:

  • Drug company needs to market a new pain killer:
    • $\theta$: proportion of the market that the drug will capture
    • $\theta \in \Theta = [0,1]$ and $A = [0,1]$
    • $L(\theta,a) = \theta-a \text{ if } \theta\ge a$
    • $L(\theta,a) = 2(a-\theta) \text{ if } \theta\lt a$
  • In a survey, $x$ people out of $n$ responded “yes”
    • possible model: $x|\theta \sim Bin(n,\theta)$
    • possible decision rule: $\delta(x) = x/n$
  • Classical Decision Theory
    • Def: The risk function of a decision rule $\delta(x)$
    • $R(\theta,\delta) = \int_\mathcal{X} L(\theta,\delta(x))p(x|\theta)dx$ (a numerical sketch of this risk is given after this list)
    • we say that $\delta_1(x)$ is “better” than $\delta_2(x)$ if $R(\theta,\delta_1) \le R(\theta,\delta_2)$ for all $\theta$.
    • How do we pick an estimator $\hat\theta$ for $\theta$?
    • Choose $\hat\theta = \delta(x)$ that minimizes $R(\theta,\delta)$.
      • Typically, we need to constrain the class of decision rules (e.g., unbiased or linear estimators) to get an optimum.
      • Otherwise, no single rule minimizes the risk for every $\theta$.
  • Bayesian Decision Theory
    • $\theta$ unknown random variable
    • $x$ are observed data
    • Def: Let $\pi^*(\theta)$ be a pdf for $\theta$ at the time of decision making. The Bayesian expected loss for an action $a$ is \(\rho(\pi^*,a) = \int_\Theta L(\theta,a) \pi^*(\theta) d\theta = E_{\pi^*(\theta)}[L(\theta,a)]\)
    • $E_{x|\theta}[L(\theta,\delta(x))] = R(\theta,\delta)$: the frequentist risk
    • $E_{\theta}[L(\theta,a)] = \rho(\pi,a)$, where $\pi$ is the prior
    • $E_{\theta|x}[L(\theta,a)] = \rho(p,a)$, where $p = p(\theta|x)$ is the posterior
    • Bayesian decision principle: choose $a \in A$ that minimizes $\rho(\pi^*,a)$. This action is called a Bayes action.
    • Example (Drug Company):
    • Assume:
      • $\pi(\theta) = 10$ if $.1 \lt \theta \lt .2$, 0 o.w. (the uniform density on $(.1,.2)$)
      • if no data: $\rho(\pi,a) = \int_0^1 L(\theta,a) \pi(\theta) d\theta$ (a numerical check of this minimization appears after this list)
        • $ = \int_0^a~2(a-\theta)\pi(\theta)~ d\theta + \int_a^1~(\theta-a)\pi(\theta)~d\theta$
      • case 1: $a \le .1 \Rightarrow \rho(\pi,a) = .15 - a$, minimum at $a = .1$
        • $\rho(\pi,.1) = .05$
      • case 2: $.1 \lt a \lt .2 \Rightarrow a=2/15$ is optimal and $\rho(\pi,2/15) = 1/30 \approx .033$
      • case 3: $a \ge .2 \Rightarrow \rho(\pi,a) = 2a-.3$, optimal at $a=.2$. So $\rho(\pi,.2)=.1$
  • The posterior expected loss of an action $a\in A$ is \(\rho(p(\theta|x),a) = \int_\Theta L(\theta,a) p(\theta|x) d\theta\)
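
As referenced in the classical decision theory bullets above, here is a rough numerical sketch of the risk function $R(\theta,\delta) = E_{x|\theta}[L(\theta,\delta(x))]$ for the rule $\delta(x)=x/n$ under the drug-company loss, summing over the binomial sample space (the choice $n=20$ is arbitrary and only for illustration):

```python
import numpy as np
from scipy.stats import binom

def loss(theta, a):
    # L(theta, a) = theta - a if theta >= a, else 2(a - theta)
    return theta - a if theta >= a else 2 * (a - theta)

def risk(theta, n):
    # R(theta, delta) = E_{x|theta}[L(theta, delta(x))] with delta(x) = x / n
    xs = np.arange(n + 1)
    return sum(binom.pmf(x, n, theta) * loss(theta, x / n) for x in xs)

for theta in (0.05, 0.10, 0.15, 0.20):
    print(f"theta = {theta:.2f}, R(theta, x/n) = {risk(theta, 20):.4f}")
```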
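
And the numerical check of the no-data Bayes action referenced in the drug-company example: it minimizes $\rho(\pi,a)$ over a grid of actions under the Uniform(.1,.2) prior and should recover $a \approx 2/15$ with $\rho(\pi,a) \approx .033$ (the grid sizes are arbitrary):

```python
import numpy as np

def loss(theta, a):
    # vectorized drug-company loss
    return np.where(theta >= a, theta - a, 2 * (a - theta))

# grid approximation to the Uniform(.1, .2) prior
thetas = np.linspace(0.1, 0.2, 10001)

def rho(a):
    # rho(pi, a) = E_pi[L(theta, a)], approximated by an average over the grid
    return loss(thetas, a).mean()

actions = np.linspace(0.0, 1.0, 10001)
rhos = np.array([rho(a) for a in actions])
a_star = actions[np.argmin(rhos)]
print(f"Bayes action ~ {a_star:.4f} (2/15 = {2/15:.4f}), rho ~ {rhos.min():.4f}")
```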

Minimizing \(\rho(p(\theta|x),a) = \int_\Theta L(\theta,a) p(\theta|x) d\theta\)

  • Say you obtained a Bayes action by minimizing $\rho(p(\theta|x),a)$, say $\delta^{p(\theta|x)}$
  • We can compute the “Bayes Risk” = $E_x[\rho( p(\theta|x),\hat\delta )] = \int_\mathcal{X} \int_\Theta L(\theta,\hat\delta(x))\,p(x|\theta)\,\pi(\theta)\,d\theta\,dx$
  • $= \int_\Theta R(\theta,\hat\delta)\pi(\theta) d\theta = E_\theta[R(\theta,\hat\delta)]$ (a small simulation illustrating this equivalence follows)
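
A small simulation can illustrate this equivalence. The setup below is not from the notes: it assumes a Beta(2,2) prior, a Binomial(n=20) likelihood, and quadratic loss, so the Bayes rule is the posterior mean and its posterior expected loss is the posterior variance; the Bayes risk is then estimated both as $E_\theta[R(\theta,\delta)]$ and as $E_x[\rho(p(\theta|x),\delta)]$:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, n = 2.0, 2.0, 20      # assumed Beta(2,2) prior, Binomial(n=20) likelihood
n_sim = 200_000

def bayes_rule(x):
    # posterior mean under the conjugate Beta-Binomial model
    return (alpha + x) / (alpha + beta + n)

# draws from the joint distribution of (theta, x)
theta = rng.beta(alpha, beta, n_sim)
x = rng.binomial(n, theta)

# Way 1: E_theta[R(theta, delta)] -- average the frequentist risk over the prior
bayes_risk_1 = np.mean((theta - bayes_rule(x)) ** 2)

# Way 2: E_x[rho(p(theta|x), delta(x))] -- average the posterior expected loss
# over the marginal of x; under quadratic loss this is the posterior variance
a_post, b_post = alpha + x, beta + (n - x)
post_var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
bayes_risk_2 = np.mean(post_var)

print(bayes_risk_1, bayes_risk_2)  # both estimate r(pi, delta); they should agree
```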

Recap:

\(\begin{array}{lll} & \text{Frequentist} & \text{Bayesian} \\\\ \text{Estimator} & \delta(x) & \delta^\pi(x) \\\\ \text{Likelihood} & p(x|\theta) & p(x|\theta) \\\\ \text{Prior} & \text{NA} & \pi(\theta) \\\\ \text{Risk} & R(\theta,\delta) = E_{x|\theta}[L(\theta,\delta)] & \rho(p(\theta|x),\delta^\pi(x)) = E_{\theta|x}[L(\theta,\delta^\pi(x))] \\\\ \text{Bayes Risk} & E_\theta[R(\theta,\delta)] & E_x[\rho(p(\theta|x),\delta^\pi)]\\\\ \end{array}\) The two Bayes Risks are equivalent and equal $r(\pi,\delta) = E_{\theta,x}[L(\theta,\delta)] = \int_\mathcal{X}\int_\Theta~L(\theta,\delta(x))~p(x|\theta)~\pi(\theta)~d\theta~dx$.