14 Jan, 2016
Decision Theory
Notation:
- $a \in A$: $a$ is an “action” and $A$ is the set of all possible actions
- $L(\theta,a)$: loss function for $\theta \in \Theta$ and $a\in A$, $L(\theta,a) \ge -K \gt -\infty$
- Quadratic loss: $(\theta-a)^2$
- Absolute loss: $|\theta-a|$
- 0-1 loss: $I(\theta \ne a)$
- $\delta(x)$: decision rule, function from $X$ to $A$, $X$ is the sample space
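The three standard losses above are easy to write down directly; a minimal sketch (function names are ours, not from the notes):

```python
import numpy as np

# Standard loss functions L(theta, a) from the notes, vectorized over theta.
def quadratic_loss(theta, a):
    # (theta - a)^2
    return (np.asarray(theta) - a) ** 2

def absolute_loss(theta, a):
    # |theta - a|
    return np.abs(np.asarray(theta) - a)

def zero_one_loss(theta, a):
    # I(theta != a)
    return (np.asarray(theta) != a).astype(float)
```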
Example:
- Drug company needs to market a new pain killer:
- $\theta$: proportion of the market that the drug will capture
- $\theta \in \Theta = [0,1]$, and $A = [0,1]$
- $L(\theta,a) = \theta-a \text{ if } \theta\ge a$
- $L(\theta,a) = 2(a-\theta) \text{ if } \theta\lt a$
- In a survey of $n$ people, $x$ responded “yes”
- possible model: $x|\theta \sim Bin(n,\theta)$
- possible decision rule: $\delta(x) = x/n$
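A quick Monte Carlo sketch of this setup (the values of $n$ and $\theta$ below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def company_loss(theta, a):
    # Asymmetric loss from the example: underestimating costs theta - a,
    # overestimating costs twice as much, 2*(a - theta).
    return np.where(theta >= a, theta - a, 2 * (a - theta))

n, theta = 100, 0.15
x = rng.binomial(n, theta, size=50_000)   # x | theta ~ Bin(n, theta)
a = x / n                                 # the candidate rule delta(x) = x/n
risk_est = company_loss(theta, a).mean()  # Monte Carlo estimate of R(theta, delta)
print(risk_est)
```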
- Classical Decision Theory
- Def: The risk function of a decision rule $\delta(x)$
- $R(\theta,\delta) = \int_{\mathcal{X}} L(\theta,\delta(x))\,p(x|\theta)\,dx$
- we say that $\delta_1(x)$ is “better” than $\delta_2(x)$ if $R(\theta,\delta_1) \le R(\theta,\delta_2)~\forall \theta$ (with strict inequality for some $\theta$).
- How do we pick an estimator $\hat\theta$ for $\theta$?
- Choose the rule $\hat\theta = \delta(x)$ that minimizes $R(\theta,\delta)$.
- Typically, we need to constrain the class of decision rules (e.g., to unbiased or linear estimators) to get an optimum.
- Otherwise, no single rule is uniformly best: risk functions cross, and different rules win for different values of $\theta$.
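The crossing-risk phenomenon can be seen with two concrete rules for the binomial model under quadratic loss; the shrinkage rule $(x+1)/(n+2)$ below is our choice for illustration, and both risk formulas follow from bias-variance calculations:

```python
import numpy as np

# Risk under quadratic loss for x ~ Bin(n, theta) and two rules:
# delta1(x) = x/n (MLE) and delta2(x) = (x+1)/(n+2) (a shrinkage rule).
n = 20
theta = np.linspace(0.01, 0.99, 99)
R1 = theta * (1 - theta) / n                                   # variance of x/n
R2 = (n * theta * (1 - theta) + (1 - 2 * theta) ** 2) / (n + 2) ** 2  # var + bias^2

# Neither rule dominates: delta2 wins near theta = 1/2, delta1 near the endpoints.
print(R1[49] > R2[49], R1[0] < R2[0])  # theta = 0.5 and theta = 0.01
```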
- Bayesian Decision Theory
- $\theta$ unknown random variable
- $x$ are observed data
- Def: Let $\pi^*(\theta)$ be a pdf on $\Theta$ at the time of decision making. The Bayesian expected loss for an action $a$ is \(\rho(\pi^*,a) = \int_\Theta L(\theta,a) \pi^*(\theta) d\theta = E_{\pi^*(\theta)}[L(\theta,a)]\)
- $E_{x|\theta}[L(\theta,a)] = R(\theta,a)$
- $E_{\theta}[L(\theta,a)] = \rho(\pi,a)$, $\pi$ is prior
- $E_{\theta|x}[L(\theta,a)] = \rho(p,a)$, $p$ is the posterior $p(\theta|x)$
- Bayesian decision principle: choose $a \in A$ that minimizes $\rho(\pi^*,a)$. This action is called a Bayes action.
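When $\rho(\pi^*,a)$ has no closed form, the Bayes action can be found numerically. A sketch with an assumed $\pi^*(\theta) = \text{Beta}(2,5)$: under quadratic loss the Bayes action is the mean of $\pi^*$, and under absolute loss it is the median, which the grid search below recovers:

```python
import numpy as np

rng = np.random.default_rng(1)
draws = rng.beta(2, 5, size=20_000)   # draws from an assumed pi*(theta)
grid = np.linspace(0, 1, 401)         # candidate actions a

# rho(pi*, a) approximated by Monte Carlo averages over the theta draws.
rho_quad = ((draws[:, None] - grid) ** 2).mean(axis=0)  # quadratic loss
rho_abs = np.abs(draws[:, None] - grid).mean(axis=0)    # absolute loss

a_quad = grid[rho_quad.argmin()]   # should approximate the mean of pi*
a_abs = grid[rho_abs.argmin()]     # should approximate the median of pi*
print(a_quad, a_abs)
```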
- Example (Drug Company):
- Assume:
- $\pi(\theta) = 10$ if $.1 \lt \theta \lt .2$, 0 o.w. (uniform on $(.1,.2)$)
- if no data: $\rho(\pi,a) = \int_\Theta L(\theta,a) \pi(\theta) d\theta$
- $ = \int_0^a~2(a-\theta)\pi(\theta)~ d\theta + \int_a^1~(\theta-a)\pi(\theta)~d\theta$
- case 1: $a \le .1 \Rightarrow \rho(\pi,a) = .15 - a$, minimum at $a = .1$
- case 2: $.1 \lt a \lt .2 \Rightarrow a=2/15$ is optimal and $\rho(\pi,2/15) = 1/30 \approx .033$
- case 3: $a \ge .2 \Rightarrow \rho(\pi,a) = 2a-.3$, optimal at $a=.2$. So $\rho(\pi,.2)=.1$
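The three cases can be checked numerically, taking the prior to be uniform on $(.1,.2)$:

```python
import numpy as np

theta = np.linspace(0.1, 0.2, 100_001)  # grid over the prior's support

def rho(a):
    # Asymmetric loss: theta - a if theta >= a, else 2*(a - theta).
    loss = np.where(theta >= a, theta - a, 2 * (a - theta))
    return loss.mean()  # = E_pi[L(theta, a)], since the grid is uniform on (.1,.2)

a_grid = np.linspace(0.05, 0.25, 2001)
a_star = a_grid[np.argmin([rho(a) for a in a_grid])]
print(a_star, rho(2 / 15))  # optimum near 2/15 ~ 0.1333, with rho ~ 1/30
```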
- The posterior expected loss of an action $a\in A$ is \(\rho(p(\theta|x),a) = \int_\Theta L(\theta,a) p(\theta|x) d\theta\)
Minimizing \(\rho(p(\theta|x),a) = \int_\Theta L(\theta,a) p(\theta|x) d\theta\)
- Say you obtained a Bayes action by minimizing $\rho(p(\theta|x),a)$, say $\delta^{p(\theta|x)}$
- We can compute the “Bayes Risk” = $E_x[\rho( p(\theta|x),\hat\delta )] = \int_X \int_\Theta L(\theta,\hat\delta(x))\,p(x|\theta)\,\pi(\theta)\,d\theta\, dx$
- $= \int_\Theta R(\theta,\hat\delta)\pi(\theta) d\theta = E_\theta[R(\theta,\hat\delta)]$
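This equality of expectations is easy to verify by Monte Carlo. A sketch for quadratic loss with $\delta(x)=x/n$ and an assumed Uniform(0,1) prior, where $R(\theta,x/n)=\theta(1-\theta)/n$ is known exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
theta = rng.uniform(0, 1, size=200_000)   # theta ~ pi (assumed Uniform(0,1))
x = rng.binomial(n, theta)                # x | theta ~ Bin(n, theta)

# Joint expectation E_{theta,x}[L(theta, delta(x))] under quadratic loss.
r_joint = ((theta - x / n) ** 2).mean()
# Same quantity via E_theta[R(theta, delta)], using R(theta, x/n) = theta(1-theta)/n.
r_iterated = (theta * (1 - theta) / n).mean()
print(r_joint, r_iterated)  # both approach E[theta(1-theta)]/n = 1/(6n)
```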
Recap:
\(\begin{array}{lll}
& \text{Frequentist} & \text{Bayesian} \\\\
\text{Estimator} & \delta(x) & \delta^\pi(x) \\\\
\text{likelihood} & p(x|\theta)& p(x|\theta) \\\\
\text{Prior} & \text{NA} & \pi(\theta) \\\\
\text{Risk} & R(\theta,\delta) = E_{x|\theta}[L(\theta,\delta)] & \rho(p(\theta|x),\delta^\pi(x)) = E_{\theta|x}[L(\theta,\delta^\pi(x))] \\\\
\text{Bayes Risk} & E_\theta[R(\theta,\delta)] & E_x[\rho(\pi,\delta)]\\\\
\end{array}\)
The two Bayes risks are equivalent and are equal to $r(\pi,\delta) = E_{\theta,x}[L(\theta,\delta(x))]$ =
$\int_\mathcal{X}\int_\Theta~L(\theta,\delta(x))~p(x|\theta)~\pi(\theta)~d\theta~dx $.