Expectation and Variance of the Hypergeometric Distribution

키성열 2025. 4. 20. 15:33

🔷 Expectation and Variance of the Hypergeometric Distribution

✅ Random Variable X and Its Range

The hypergeometric distribution is a discrete probability distribution for the number of items with a given characteristic (e.g., a category or label) in a sample of fixed size drawn without replacement from a finite population.

Throughout, \( N \) is the population size, \( K \) is the number of items in the population that belong to the target category, and \( n \) is the sample size. The random variable \( X \) denotes the number of items in the sample that belong to the target category. Its range is given by:

\[ \max(0, n - (N - K)) \leq X \leq \min(n, K) \]

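The upper bound holds because the sample cannot contain more target items than it has slots (\( n \)) or than exist in the population (\( K \)). The lower bound holds because the population has only \( N - K \) non-target items, so once \( n \) exceeds \( N - K \), at least \( n - (N - K) \) of the sampled items must be target items.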

For example, if \( N = 20 \), \( K = 5 \), and \( n = 4 \), then:

Upper bound: \( \min(4, 5) = 4 \)
Lower bound: \( \max(0, 4 - (20 - 5)) = \max(0, -11) = 0 \)

Thus, \( X \in \{ 0, 1, 2, 3, 4 \} \).
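
The range formula above can be checked with a few lines of Python. This is a minimal sketch; the function name `hypergeom_support` is made up for illustration and does not come from any library.

```python
def hypergeom_support(N: int, K: int, n: int) -> range:
    """Feasible values of X for population size N, K target items,
    and sample size n (drawn without replacement)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    lower = max(0, n - (N - K))  # forced target picks once non-targets run out
    upper = min(n, K)            # cannot exceed sample size or target count
    return range(lower, upper + 1)


print(list(hypergeom_support(20, 5, 4)))   # the example from the text: 0..4
print(list(hypergeom_support(20, 18, 4)))  # only 2 non-targets, so X >= 2
```

The second call shows a case where the lower bound is not zero: with only \( N - K = 2 \) non-target items, a sample of 4 must contain at least 2 target items.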

✅ X as a Sum of Indicator Variables

In the hypergeometric distribution, the variable \( X \) can be expressed as a sum of indicator variables as follows:

\[ X = I_1 + I_2 + \cdots + I_n \]

Each \( I_j \) is an indicator variable that equals 1 if the \( j \)th selected item is in the target category, and 0 otherwise.

✅ \(I_j\) Does Not Indicate Time Order

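The index \( j \) in \( I_j \) labels a position in the sample. It does not have to be read as the time order of the draws: the \( n \) items could be taken all at once and then numbered, and each \( I_j \) would still have the same distribution.

For instance, suppose the population consists of 2 strawberry-flavored candies (denoted as S) and 3 lemon-flavored candies (L), i.e., \( (S, S, L, L, L) \), and we sample 3 items without replacement. Take strawberry as the target category, so \( N = 5 \), \( K = 2 \), and \( n = 3 \). If we obtain the sample \( (L, S, L) \), then:

\( I_1 = 0 \): The item in position 1 is lemon
\( I_2 = 1 \): The item in position 2 is strawberry
\( I_3 = 0 \): The item in position 3 is lemon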

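Note that \( I_1, I_2, I_3 \) are not independent, since sampling is done without replacement. For example, if position 1 holds a strawberry candy, only 1 of the remaining 4 candies is strawberry, so \( P(I_2 = 1 \mid I_1 = 1) = \frac{1}{4} \), whereas the unconditional probability is \( P(I_2 = 1) = \frac{2}{5} \).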
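
The dependence can be seen numerically by enumerating every ordered sample of size 3 from the candy population. The sketch below assumes the five candies are distinct objects, so that all ordered samples are equally likely; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

population = ["S", "S", "L", "L", "L"]  # strawberry (S) is the target category

# All ordered samples of size 3, identified by candy index (0..4).
samples = list(permutations(range(len(population)), 3))


def is_target(idx: int) -> int:
    """Indicator: 1 if the candy at this population index is strawberry."""
    return 1 if population[idx] == "S" else 0


# Marginal probability that position 2 holds a strawberry candy.
p_i2 = Fraction(sum(is_target(s[1]) for s in samples), len(samples))

# Same probability, restricted to samples whose position 1 is strawberry.
first_is_s = [s for s in samples if is_target(s[0])]
p_i2_given_i1 = Fraction(sum(is_target(s[1]) for s in first_is_s), len(first_is_s))

print(p_i2)           # P(I_2 = 1)
print(p_i2_given_i1)  # P(I_2 = 1 | I_1 = 1)
```

The two printed fractions differ (2/5 versus 1/4), which is exactly what it means for \( I_1 \) and \( I_2 \) to be dependent.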

✅ Why \(\mathbb{E}[I_j] = \frac{K}{N}\)

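Because \( I_j \) takes only the values 0 and 1, its expectation equals the probability that position \( j \) holds an element from the target category: \( \mathbb{E}[I_j] = P(I_j = 1) \). By symmetry, every element of the population is equally likely to occupy position \( j \), so this probability equals the proportion of target elements in the population.

Thus, the expectation of each indicator variable is the same: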

\[ \mathbb{E}[I_j] = \frac{K}{N} \quad \text{for all } j \in \{1, \dots, n\} \]
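
One way to make the symmetry argument explicit, sketched under the assumption that the sample is the first \( n \) positions of a uniformly random ordering of the \( N \) population items: of the \( N! \) orderings, those that put a target item in position \( j \) number \( K \cdot (N-1)! \) (choose which target item sits there, then arrange the rest freely). Hence:

\[
P(I_j = 1) = \frac{K \cdot (N-1)!}{N!} = \frac{K}{N},
\qquad
\mathbb{E}[I_j] = 0 \cdot P(I_j = 0) + 1 \cdot P(I_j = 1) = \frac{K}{N}
\]

The count does not depend on \( j \), which is why every position has the same expectation.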

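✅ Verifying \(\mathbb{E}[I_j]\) by Enumerating the Arrangements of (1,1,0,0,0)

※ Note: This table does not show the distribution of \( X \). It lists the 10 equally likely arrangements of a population with \( N = 5 \) and \( K = 2 \) (1 = target, 0 = non-target) and reads off the first three positions as \( I_1, I_2, I_3 \). Each position holds a 1 in 4 of the 10 arrangements, so every position has the same expected value \( 4/10 = 0.4 = K/N \).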

Permutation #	Permutation	\(I_1\)	\(I_2\)	\(I_3\)
1	(1, 1, 0, 0, 0)	1	1	0
2	(1, 0, 1, 0, 0)	1	0	1
3	(1, 0, 0, 1, 0)	1	0	0
4	(1, 0, 0, 0, 1)	1	0	0
5	(0, 1, 1, 0, 0)	0	1	1
6	(0, 1, 0, 1, 0)	0	1	0
7	(0, 1, 0, 0, 1)	0	1	0
8	(0, 0, 1, 1, 0)	0	0	1
9	(0, 0, 1, 0, 1)	0	0	1
10	(0, 0, 0, 1, 1)	0	0	0
Expected value		0.4	0.4	0.4
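
The enumeration in the table can be reproduced programmatically. This is a small sketch using only the standard library; it relies on the fact that each distinct arrangement of (1,1,0,0,0) is equally likely.

```python
from fractions import Fraction
from itertools import permutations

labels = (1, 1, 0, 0, 0)  # 1 = target item, 0 = non-target item

# set() collapses the 5! orderings into the distinct arrangements.
arrangements = sorted(set(permutations(labels)), reverse=True)

# Average each position over the equally likely arrangements.
position_means = [
    Fraction(sum(a[j] for a in arrangements), len(arrangements))
    for j in range(len(labels))
]

print(len(arrangements))                 # number of distinct arrangements
print([str(m) for m in position_means])  # mean of each of the 5 positions
```

All five positions, not just the first three shown in the table, come out to 2/5 = 0.4.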

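✅ Sample-Based Example: \(X = I_1 + I_2\)

Let’s verify the expectation of \( X \) for a sample of size \( n = 2 \) drawn from the same population (\( N = 5 \), \( K = 2 \)), using the values of \( I_1 \) and \( I_2 \) and the probability of each ordered sample: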

Sample	\(I_1\)	\(I_2\)	\(X = I_1 + I_2\)	\(P(\text{sample})\)
(1,1)	1	1	2	0.1
(1,0)	1	0	1	0.3
(0,1)	0	1	1	0.3
(0,0)	0	0	0	0.3
Expected Value	0.4	0.4	0.8
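
The sample probabilities in the table can be derived by brute force rather than taken on faith. The sketch below assumes the five items are distinct, so every ordered draw of two items is equally likely; variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

labels = (1, 1, 0, 0, 0)  # N = 5 items, K = 2 targets
n = 2                     # sample size

# Every ordered draw of n distinct items is equally likely.
draws = list(permutations(range(len(labels)), n))

# Accumulate the probability of each 0/1 pattern (I_1, I_2).
prob = defaultdict(Fraction)
for d in draws:
    pattern = tuple(labels[i] for i in d)
    prob[pattern] += Fraction(1, len(draws))

# E[X] = sum over patterns of (I_1 + I_2) * P(pattern).
expected_x = sum(sum(pattern) * p for pattern, p in prob.items())

for pattern in sorted(prob, reverse=True):
    print(pattern, prob[pattern])
print(expected_x)
```

The pattern probabilities match the table (1/10, 3/10, 3/10, 3/10) and the expectation is 4/5 = 0.8.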

✅ Generalized Expectation Formula

The example above used \( X = I_1 + I_2 \) with \( n = 2 \), but the same logic applies for any \( n \).

When \( X = I_1 + I_2 + \cdots + I_n \), we compute the expectation as follows:

\[
\mathbb{E}[X] = \mathbb{E}[I_1 + I_2 + \cdots + I_n] = \mathbb{E}[I_1] + \mathbb{E}[I_2] + \cdots + \mathbb{E}[I_n]
\]

This result uses the linearity of expectation, which holds even if the variables are dependent.

Since each \( \mathbb{E}[I_j] = \frac{K}{N} \), we get:

\[
\mathbb{E}[X] = n \cdot \frac{K}{N} = np \quad \text{where } p = \frac{K}{N}
\]
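
As a sanity check on the formula, the expectation can also be computed directly from the probability mass function. The sketch below uses the standard hypergeometric PMF, \( P(X = k) = \binom{K}{k}\binom{N-K}{n-k} / \binom{N}{n} \), which is not derived in this post; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def hypergeom_mean_from_pmf(N: int, K: int, n: int) -> Fraction:
    """Compute E[X] by summing k * P(X = k) over the support of X."""
    lower, upper = max(0, n - (N - K)), min(n, K)
    denom = comb(N, n)  # number of equally likely unordered samples
    return sum(
        (Fraction(k * comb(K, k) * comb(N - K, n - k), denom)
         for k in range(lower, upper + 1)),
        Fraction(0),
    )


# Compare the PMF-based mean with n * K / N for a few parameter choices.
for N, K, n in [(5, 2, 2), (20, 5, 4), (20, 18, 4)]:
    assert hypergeom_mean_from_pmf(N, K, n) == Fraction(n * K, N)

print(hypergeom_mean_from_pmf(5, 2, 2))  # the N = 5, K = 2, n = 2 example
```

Exact fractions are used instead of floats so that the comparison with \( nK/N \) is an equality check rather than an approximate one.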

✅ Comparing Three Expectation Methods

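For the example with \( N = 5 \), \( K = 2 \), and \( n = 2 \), the expectation can be computed in three ways:

Indicator Variables: \( \mathbb{E}[I_1] + \mathbb{E}[I_2] = 0.4 + 0.4 = 0.8 \)
Probability Distribution: \( \sum_k k \cdot P(X = k) = 0 \cdot 0.3 + 1 \cdot 0.6 + 2 \cdot 0.1 = 0.8 \)
General Formula: \( n \cdot \frac{K}{N} = 2 \cdot \frac{2}{5} = 0.8 \)

All three methods yield exactly the same result, so the indicator decomposition, the direct calculation from the distribution of \( X \), and the closed-form formula agree.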
