14 Jan, 2016
Decision Theory
Notation:
- $a \in A$: $a$ is an “action” and $A$ is the set of all possible actions
- $L(\theta,a)$: loss function for $\theta \in \Theta$ and $a\in A$, $L(\theta,a) \ge -K \gt -\infty$
- Quadratic loss: $(\theta-a)^2$
- Absolute loss: $|\theta-a|$
- 0-1 loss: $I(\theta \ne a)$
- $\delta(x)$: decision rule, function from $X$ to $A$, $X$ is the sample space
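The three standard losses above are easy to write down directly; a minimal sketch (function names are ours, not from the notes):

```python
import numpy as np

# Standard loss functions L(theta, a) from the notes, vectorized over theta.
def quadratic_loss(theta, a):
    # (theta - a)^2
    return (np.asarray(theta) - a) ** 2

def absolute_loss(theta, a):
    # |theta - a|
    return np.abs(np.asarray(theta) - a)

def zero_one_loss(theta, a):
    # I(theta != a)
    return (np.asarray(theta) != a).astype(float)
```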
Example:
- Drug company needs to market a new pain killer:
- $\theta$: proportion of the market that the drug will capture
- $\theta \in \Theta = [0,1]$, and $A = [0,1]$
- $L(\theta,a) = \theta-a \text{ if } \theta\ge a$
- $L(\theta,a) = 2(a-\theta) \text{ if } \theta\lt a$
- In a survey of $n$ people, $x$ responded “yes”
- possible model: $x|\theta \sim Bin(n,\theta)$
- possible decision rule: $\delta(x) = x/n$
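A quick Monte Carlo sketch of this setup (the values of $n$ and $\theta$ below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def company_loss(theta, a):
    # Asymmetric loss from the example: underestimating costs theta - a,
    # overestimating costs twice as much, 2*(a - theta).
    return np.where(theta >= a, theta - a, 2 * (a - theta))

n, theta = 100, 0.15
x = rng.binomial(n, theta, size=50_000)   # x | theta ~ Bin(n, theta)
a = x / n                                 # the candidate rule delta(x) = x/n
risk_est = company_loss(theta, a).mean()  # Monte Carlo estimate of R(theta, delta)
print(risk_est)
```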
- Classical Decision Theory
- Def: The risk function of a decision rule $\delta(x)$
- $R(\theta,\delta) = \int_{\mathcal{X}} L(\theta,\delta(x))\,p(x|\theta)\,dx$
- we say that $\delta_1(x)$ is “better” than $\delta_2(x)$ if $R(\theta,\delta_1) \le R(\theta,\delta_2)~\forall \theta$ (with strict inequality for some $\theta$).
- How do we pick an estimator $\hat\theta$ for $\theta$?
- Choose the rule $\hat\theta = \delta(x)$ that minimizes $R(\theta,\delta)$.
- Typically, we need to constrain the class of decision rules (e.g., to unbiased or linear estimators) to get an optimum.
- Otherwise, no single rule is uniformly best: risk functions cross, and different rules win for different values of $\theta$.
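The crossing-risk phenomenon can be seen with two concrete rules for the binomial model under quadratic loss; the shrinkage rule $(x+1)/(n+2)$ below is our choice for illustration, and both risk formulas follow from bias-variance calculations:

```python
import numpy as np

# Risk under quadratic loss for x ~ Bin(n, theta) and two rules:
# delta1(x) = x/n (MLE) and delta2(x) = (x+1)/(n+2) (a shrinkage rule).
n = 20
theta = np.linspace(0.01, 0.99, 99)
R1 = theta * (1 - theta) / n                                   # variance of x/n
R2 = (n * theta * (1 - theta) + (1 - 2 * theta) ** 2) / (n + 2) ** 2  # var + bias^2

# Neither rule dominates: delta2 wins near theta = 1/2, delta1 near the endpoints.
print(R1[49] > R2[49], R1[0] < R2[0])  # theta = 0.5 and theta = 0.01
```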
- Bayesian Decision Theory
- $\theta$ unknown random variable
- $x$ are observed data
- Def: Let $\pi^*(\theta)$ be a pdf on $\Theta$ at the time of decision making. The Bayesian expected loss for an action $a$ is \(\rho(\pi^*,a) = \int_\Theta L(\theta,a) \pi^*(\theta) d\theta = E_{\pi^*(\theta)}[L(\theta,a)]\)
- $E_{x|\theta}[L(\theta,a)] = R(\theta,a)$
- $E_{\theta}[L(\theta,a)] = \rho(\pi,a)$, $\pi$ is prior
- $E_{\theta|x}[L(\theta,a)] = \rho(p,a)$, $p$ is the posterior $p(\theta|x)$
- Bayesian decision principle: choose $a \in A$ that minimizes $\rho(\pi^*,a)$. This action is called a Bayes action.
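When $\rho(\pi^*,a)$ has no closed form, the Bayes action can be found numerically. A sketch with an assumed $\pi^*(\theta) = \text{Beta}(2,5)$: under quadratic loss the Bayes action is the mean of $\pi^*$, and under absolute loss it is the median, which the grid search below recovers:

```python
import numpy as np

rng = np.random.default_rng(1)
draws = rng.beta(2, 5, size=20_000)   # draws from an assumed pi*(theta)
grid = np.linspace(0, 1, 401)         # candidate actions a

# rho(pi*, a) approximated by Monte Carlo averages over the theta draws.
rho_quad = ((draws[:, None] - grid) ** 2).mean(axis=0)  # quadratic loss
rho_abs = np.abs(draws[:, None] - grid).mean(axis=0)    # absolute loss

a_quad = grid[rho_quad.argmin()]   # should approximate the mean of pi*
a_abs = grid[rho_abs.argmin()]     # should approximate the median of pi*
print(a_quad, a_abs)
```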
- Example (Drug Company):
- Assume:
- $\pi(\theta) = 10$ if $.1 \lt \theta \lt .2$, 0 o.w. (uniform on $(.1,.2)$)
- if no data: $\rho(\pi,a) = \int_\Theta L(\theta,a) \pi(\theta) d\theta$
- $ = \int_0^a~2(a-\theta)\pi(\theta)~ d\theta + \int_a^1~(\theta-a)\pi(\theta)~d\theta$
- case 1: $a \le .1 \Rightarrow \rho(\pi,a) = .15 - a$, minimum at $a = .1$
- case 2: $.1 \lt a \lt .2 \Rightarrow a=2/15$ is optimal and $\rho(\pi,2/15) = 1/30 \approx .033$
- case 3: $a \ge .2 \Rightarrow \rho(\pi,a) = 2a-.3$, optimal at $a=.2$. So $\rho(\pi,.2)=.1$
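The three cases can be checked numerically, taking the prior to be uniform on $(.1,.2)$:

```python
import numpy as np

theta = np.linspace(0.1, 0.2, 100_001)  # grid over the prior's support

def rho(a):
    # Asymmetric loss: theta - a if theta >= a, else 2*(a - theta).
    loss = np.where(theta >= a, theta - a, 2 * (a - theta))
    return loss.mean()  # = E_pi[L(theta, a)], since the grid is uniform on (.1,.2)

a_grid = np.linspace(0.05, 0.25, 2001)
a_star = a_grid[np.argmin([rho(a) for a in a_grid])]
print(a_star, rho(2 / 15))  # optimum near 2/15 ~ 0.1333, with rho ~ 1/30
```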
- The posterior expected loss of an action $a\in A$ is \(\rho(p(\theta|x),a) = \int_\Theta L(\theta,a) p(\theta|x) d\theta\)
Minimizing \(\rho(p(\theta|x),a) = \int_\Theta L(\theta,a) p(\theta|x) d\theta\)
- Say you obtained a Bayes action by minimizing $\rho(p(\theta|x),a)$, say $\delta^{p(\theta|x)}$
- We can compute the “Bayes Risk” = $E_x[\rho( p(\theta|x),\hat\delta )] = \int_X \int_\Theta L(\theta,\hat\delta(x))\,p(x|\theta)\,\pi(\theta)\,d\theta\, dx$
- $= \int_\Theta R(\theta,\hat\delta)\pi(\theta) d\theta = E_\theta[R(\theta,\hat\delta)]$
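This equality of expectations is easy to verify by Monte Carlo. A sketch for quadratic loss with $\delta(x)=x/n$ and an assumed Uniform(0,1) prior, where $R(\theta,x/n)=\theta(1-\theta)/n$ is known exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
theta = rng.uniform(0, 1, size=200_000)   # theta ~ pi (assumed Uniform(0,1))
x = rng.binomial(n, theta)                # x | theta ~ Bin(n, theta)

# Joint expectation E_{theta,x}[L(theta, delta(x))] under quadratic loss.
r_joint = ((theta - x / n) ** 2).mean()
# Same quantity via E_theta[R(theta, delta)], using R(theta, x/n) = theta(1-theta)/n.
r_iterated = (theta * (1 - theta) / n).mean()
print(r_joint, r_iterated)  # both approach E[theta(1-theta)]/n = 1/(6n)
```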
Recap:
\(\begin{array}{lll}
& \text{Frequentist} & \text{Bayesian} \\\\
\text{Estimator} & \delta(x) & \delta^\pi(x) \\\\
\text{likelihood} & p(x|\theta)& p(x|\theta) \\\\
\text{Prior} & \text{NA} & \pi(\theta) \\\\
\text{Risk} & R(\theta,\delta) = E_{x|\theta}[L(\theta,\delta)] & \rho(p(\theta|x),\delta^\pi(x)) = E_{\theta|x}[L(\theta,\delta^\pi(x))] \\\\
\text{Bayes Risk} & E_\theta[R(\theta,\delta)] & E_x[\rho(\pi,\delta)]\\\\
\end{array}\)
The two Bayes risks are equivalent and are equal to $r(\pi,\delta) = E_{\theta,x}[L(\theta,\delta(x))]$ =
$\int_\mathcal{X}\int_\Theta~L(\theta,\delta(x))~p(x|\theta)~\pi(\theta)~d\theta~dx $.