[MS2] ch.1-2. 결정이론

2 minute read

서울대학교 통계학과 대학원 2021년도 1학년 1학기 통계이론1 수업의 정리내용입니다.

reference: Lecture note (Prof.Jaeyong Lee),
Mathematical Statistics : Basic Ideas and Selected Topics, Volume I, Second Edition By Peter J. Bickel, Kjell A. Doksum

2. 결정이론(decision Theory)

2.1 결정이론의 요소들

모형

$ ( \mathscr{X},\mathcal{F},(P_\theta)_{\theta \in \Theta}) $
- 각각 표본공간, 표본공간위에 정의된 시그마필드, 시그마필드 상의 확률측도
$\mathscr{P} = \lbrace P_{\theta} : \theta \in \Theta \rbrace $
- 표본공간, 시그마 필드가 명확하면 줄여서 이렇게 쓰기도 함.

관측치

$X \sim P_\theta$

행동공간

$\mathscr{A}$ : 취할 수 있는 가능한 행동 혹은 결정의 집합

손실함수

$l: \mathscr{P} \times \mathscr{A} \rightarrow \mathbb{R}^+$
$l: \Theta \times \mathscr{A} \rightarrow \mathbb{R}^+$
- $l(\theta,a)$: $\theta$가 참일 떄, 행동 $a$를 취하면 발생하는 손실

결정규칙(decision rule)

$\delta: \mathscr{X} \rightarrow \mathscr{A}$ $ \ \ \ \ \ \ x \ \mapsto \ \delta(x)$
- (결정규칙은 표본공간에서 행동공간으로 가는 함수)

랜덤화된 결정규칙(randomized decision rule)

$\mathscr{A}^*$: 행동공간 상의 확률들의 집합
$\delta: \mathscr{X} \rightarrow \mathscr{A}^*$: 랜덤화된 결정규칙

$\delta : \mathscr{X} \times (\sigma-field ~~ of ~~\mathscr{A}) \rightarrow [0,1]$

$l(\theta, \delta(x, \cdot )) = \int_\mathscr{A}l(\theta, \delta(x,da))$

위험함수(risk function)

$R(\theta, \delta) = E_\theta l(\theta,\delta(x)) = \int l(\theta, \delta(x))P_\theta(dx)$

검정

1.검정함수(test, test function)
- $\delta: \mathscr{X} \rightarrow [0,1]$
- $\delta(x) : H_0$를 기학할 확률 혹은 $H_1$을 선택할 확률. 랜덤화된 결정규칙
2.$\delta(x) = I(x \in C)$일 때, $C$를 기각역(critical region)이라 한다.
3.오차

Action/ 참값	$H_0$	$H_1$
$H_0$	ok	type 2 error
$H_1$	type 1 error	ok

4.0-1(zero-one) 손실함수의 위험함수
- $\delta(x) = I(x \in C)$
- $\theta \in \Theta_0$일 때, $R(\theta, \delta) = E_\theta \delta(x) = P_\theta (X \in C)$
- $\theta \in \Theta_1$일 때, $R(\theta, \delta) = E_\theta [1-\delta(x)] = 1 - P_\theta (X \in C) = P_\theta (X \in C^c)$

신뢰구간

$\delta(x) : \Theta$의 부분집합
$l(\theta, \delta(x)) = 1 ,\theta \notin \delta(x)$ $ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =0, \theta \in \delta(x)$
$R(\theta, \delta(x)) = E_\theta l(\theta, \delta(x)) = E_\theta I(\theta \notin \delta(x)) = 1 - P_\theta (\theta \in \delta(x)) $ $ \ \ \ \ \ \ \ \ \ \ =1 ~- $ 포함확률(converage probability)

2.2 결정규칙의 비교

$\textbf{Definition}$

$\delta, \delta’$: 결정규칙, $\delta$는 $\delta’$보다 좋다(improves, are better) $\Leftrightarrow R(\theta, \delta) \leqslant R(\theta, \delta’), \forall \theta$, and for some $\theta \in H, R(\theta, \delta) < R(\theta, \delta’)$

$\textbf{Definition[admissible, inadmissible]}$

$\delta$: 허용불가능(inadmissible) $\Leftrightarrow$ $\delta$보다 좋은 $\delta’$가 존재
$\delta$: not 허용불가능(inadmissible) $\Leftrightarrow$ $\delta$: 허용가능(admissible)

* 결정규칙을 선택하는 두 가지 정보법

모든 결정 규칙을 다 고려하지 않고, 특정한 기준(불편성, 대칭성, 유의수준)을 만족하는 결정규칙 중에서 제일 좋은 것을 선택
전역 기준을 사용해서 모든 결정규칙을 고려 - 베이즈 방법, 최소최대기준

$\textbf{Definition[베이즈 규칙]}$

(베이즈 위험[Bayes risk])
$r(\pi, \delta) = \int_\Theta R(\theta,\delta)\pi(d\theta)$
$\pi$는 $\Theta$상에 정의된 확률분포, 사전분포라 부른다.
(사전분포 $\pi$에 대한 베이즈 규칙) $\delta^B = argmin_\delta r(\pi, \delta)$

$\textbf{Definition[최소최대규칙(minimax rule)]}$

$\delta^M = argmin ~sup_\theta R(\theta, \delta)$

Remark

$\Theta$는 거리공간. 통계모형 $ ( \mathscr{X},\mathcal{F},(P_\theta)_{\theta \in \Theta}) $가 다음을 만족한다고 하자.

(확률의 정규성) $P:\mathcal{F} \times \Theta \rightarrow [0,1]$이 확률적 커널(stochastic kernel)이 된다. 즉, 1) 모든 $A \in \mathcal{F}$에 대해, $\theta \rightarrow P_\theta(A) = P(A \vert \theta)$는 보렐측도가능 2) $\forall \theta, ~ P_\theta(\cdot)$는 $\mathcal{F}$상의 확률측도다.
(모양의 연속성) $\theta \mapsto P_\theta$은 $L_1-norm$에 대해 연속

$\mathscr{A}^*:$랜덤화된 행동공간

$\textbf{Theorem[완비모임정리(complete class theorem), Liese and Miescke, 2008, 정리3.82]}$

$\Theta$:분리가능한 거리공간(Separable metric space)
$\mathscr{A}:$긴밀한(compact) 거리공간 $l(\theta, a)$: 유계, $(\theta,a)$에 관해 연속
모든 $\delta \in \mathscr{A}^*$에 대해

$\delta_K \leadsto \delta_0$
$R(\theta, \delta_0) \leqslant R(\theta, \delta), \forall \theta$

를 만족하는 $\delta_0 \in \mathscr{A}^*$와 사전분포의 열 $\pi$ 가 존재. $\delta_K$는 사전분포 $\pi_K$ 에 대한 베이즈 규칙

<참고>

임의의 결정규칙 $\delta$에 대해, $\delta$보다 더 좋거나 같은 베이즈 규칙 혹은 베이즈 규칙의 극한을 구할 수 있다.

Taejun Park