🔷 Expectation and Variance of the Hypergeometric Distribution

✅ Random Variable X and Its Range

The hypergeometric distribution is a discrete probability distribution. It describes how many items with a specific characteristic (e.g., a category or label) appear when a fixed number of items is drawn without replacement from a finite population. Throughout, \( N \) denotes the population size, \( K \) the number of items in the population that belong to the target category, and \( n \) the sample size.
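For reference, the probability mass function is the standard counting formula \( P(X = k) = \binom{K}{k}\binom{N-K}{n-k} / \binom{N}{n} \): choose \( k \) of the \( K \) target items and \( n - k \) of the \( N - K \) other items, out of all equally likely unordered samples. The Python sketch below evaluates it with exact fractions; the function name `hypergeom_pmf` is an illustrative choice, not a library API.

```python
from math import comb
from fractions import Fraction

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> Fraction:
    """P(X = k) for a hypergeometric X, as an exact fraction (illustrative helper).

    N: population size, K: target-category items in the population,
    n: number of items drawn without replacement.
    """
    # Outside the feasible range the probability is zero.
    if k < 0 or k > n or k > K or n - k > N - K:
        return Fraction(0)
    # k targets from K, n - k non-targets from N - K, over all C(N, n) samples.
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

# Candy example used later in the post: N = 5, K = 2 strawberry, n = 2 drawn.
for k in range(3):
    print(k, hypergeom_pmf(k, 5, 2, 2))   # 0 3/10, then 1 3/5, then 2 1/10
```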


The random variable \( X \) denotes the number of elements in the sample that fall under the target category. Its range is given by:

\[ \max(0, n - (N - K)) \leq X \leq \min(n, K) \]

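The upper bound holds because the sample cannot contain more target items than it has draws (\( n \)) or than the population holds (\( K \)). The lower bound holds because only \( N - K \) items lie outside the target category: if \( n > N - K \), at least \( n - (N - K) \) of the draws must be target items.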

For example, if \( N = 20 \), \( K = 5 \), and \( n = 4 \), then:

  • Upper bound: \( \min(4, 5) = 4 \)
  • Lower bound: \( \max(0, 4 - (20 - 5)) = \max(0, -11) = 0 \)

Thus, \( X \in \{ 0, 1, 2, 3, 4 \} \).
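The range formula can be checked mechanically. This is a small Python sketch, with a hypothetical helper name `support`, taking \( N \) (population size), \( K \) (target items) and \( n \) (sample size):

```python
def support(N: int, K: int, n: int) -> range:
    """Feasible values of X for n draws from N items, K of them targets (illustrative)."""
    # At least n - (N - K) targets must appear once the N - K non-targets run out.
    low = max(0, n - (N - K))
    # No more than n draws, and no more than K targets exist.
    high = min(n, K)
    return range(low, high + 1)

print(list(support(20, 5, 4)))   # [0, 1, 2, 3, 4]
print(list(support(10, 8, 5)))   # [3, 4, 5]
```

The second call shows a case where the lower bound is active: with only 2 non-target items, a sample of 5 must contain at least 3 targets.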




✅ X as a Sum of Indicator Variables

In the hypergeometric distribution, the variable \( X \) can be expressed as a sum of indicator variables as follows:

\[ X = I_1 + I_2 + \cdots + I_n \]

Each \( I_j \) is an indicator variable that equals 1 if the \( j \)-th item in the sample belongs to the target category, and 0 otherwise.
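Turning a concrete sample into indicator values is a one-liner. The sketch below uses the strawberry (S) / lemon (L) labels from the candy example below; the helper name `indicators` is hypothetical.

```python
def indicators(sample, target):
    """Return [I_1, ..., I_n]: 1 where the j-th sampled item is the target, else 0."""
    return [1 if item == target else 0 for item in sample]

sample = ("L", "S", "L")      # an ordered sample of 3 candies
I = indicators(sample, "S")   # [0, 1, 0]
X = sum(I)                    # X = I_1 + I_2 + I_3 = 1
print(I, X)
```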




✅ \(I_j\) Does Not Indicate Time Order

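The index \( j \) in \( I_j \) simply labels a position in the sample (first item, second item, and so on). It is not a time index along which the probabilities change: considered before any item is observed, every position has the same distribution.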

For instance, suppose the population consists of 2 strawberry-flavored candies (denoted as S) and 3 lemon-flavored candies (L), i.e., \( (S, S, L, L, L) \), and we sample 3 items without replacement. If we obtain the sample \( (L, S, L) \), then:

  • \( I_1 = 0 \): The first item is lemon
  • \( I_2 = 1 \): The second item is strawberry
  • \( I_3 = 0 \): The third item is lemon

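Note that \( I_1, I_2, I_3 \) are not independent, because sampling is done without replacement: each draw changes the composition of what remains. For example, \( P(I_2 = 1) = \frac{2}{5} \), but \( P(I_2 = 1 \mid I_1 = 1) = \frac{1}{4} \), since only one strawberry candy is left among the remaining four.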
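The dependence can be confirmed by exact enumeration. This sketch assumes, as the post does, that every ordered sample of 3 distinct candies is equally likely; `itertools.permutations` treats the two strawberry candies as distinct items even though they share the value 1.

```python
from itertools import permutations
from fractions import Fraction

population = [1, 1, 0, 0, 0]          # 1 = strawberry (S), 0 = lemon (L)
# All 5 * 4 * 3 = 60 ordered samples of 3 distinct items, equally likely.
samples = list(permutations(population, 3))

def prob(event):
    """Exact probability of an event over all equally likely ordered samples."""
    return Fraction(sum(1 for s in samples if event(s)), len(samples))

p_I1 = prob(lambda s: s[0] == 1)                         # P(I_1 = 1)
p_I2 = prob(lambda s: s[1] == 1)                         # P(I_2 = 1)
p_I1_and_I2 = prob(lambda s: s[0] == 1 and s[1] == 1)    # P(I_1 = 1, I_2 = 1)

print(p_I2)                       # 2/5
print(p_I1_and_I2 / p_I1)         # P(I_2 = 1 | I_1 = 1) = 1/4
print(p_I1 * p_I2, p_I1_and_I2)   # 4/25 vs 1/10, so not independent
```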




✅ Why \(\mathbb{E}[I_j] = \frac{K}{N}\)

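Because \( I_j \) takes only the values 0 and 1, its expectation is the probability that the \( j \)-th position of the sample holds a target-category item: \( \mathbb{E}[I_j] = P(I_j = 1) \). By symmetry, each of the \( N \) population items is equally likely to occupy any given position, so this probability equals the proportion of target items in the population.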

Thus, the expectation of each indicator variable is always:

\[ \mathbb{E}[I_j] = \frac{K}{N} \quad \text{for all } j \in \{1, \dots, n\} \]
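The symmetry argument can also be written as a direct count. This derivation treats the \( N \) items as distinguishable and assumes all ordered samples of size \( n \) are equally likely:

```latex
\begin{aligned}
\text{all ordered samples} &: \; N(N-1)\cdots(N-n+1) = \frac{N!}{(N-n)!} \\
\text{target item in position } j &: \; K \cdot (N-1)(N-2)\cdots(N-n+1) = K \cdot \frac{(N-1)!}{(N-n)!} \\
\mathbb{E}[I_j] = P(I_j = 1) &= \frac{K\,(N-1)!/(N-n)!}{N!/(N-n)!} = \frac{K\,(N-1)!}{N!} = \frac{K}{N}
\end{aligned}
```

The second line picks which of the \( K \) target items sits in position \( j \), then fills the other \( n - 1 \) positions in order from the remaining \( N - 1 \) items. The count does not depend on \( j \), which is exactly why every position has the same expectation.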




✅ Verifying \(\mathbb{E}[I_j]\) Using Full Enumeration of (1,1,0,0,0)

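※ Note: The table lists all \( \binom{5}{2} = 10 \) distinct arrangements of the population (1, 1, 0, 0, 0), where 1 marks a target item (\( N = 5 \), \( K = 2 \)). Each arrangement is equally likely, and \( I_1, I_2, I_3 \) are read off its first three positions. The table is not the distribution of \( X \); it shows that every position has the same expected value, \( \frac{4}{10} = 0.4 = \frac{K}{N} \).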


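| Permutation # | Permutation | \(I_1\) | \(I_2\) | \(I_3\) |
|---|---|---|---|---|
| 1 | (1, 1, 0, 0, 0) | 1 | 1 | 0 |
| 2 | (1, 0, 1, 0, 0) | 1 | 0 | 1 |
| 3 | (1, 0, 0, 1, 0) | 1 | 0 | 0 |
| 4 | (1, 0, 0, 0, 1) | 1 | 0 | 0 |
| 5 | (0, 1, 1, 0, 0) | 0 | 1 | 1 |
| 6 | (0, 1, 0, 1, 0) | 0 | 1 | 0 |
| 7 | (0, 1, 0, 0, 1) | 0 | 1 | 0 |
| 8 | (0, 0, 1, 1, 0) | 0 | 0 | 1 |
| 9 | (0, 0, 1, 0, 1) | 0 | 0 | 1 |
| 10 | (0, 0, 0, 1, 1) | 0 | 0 | 0 |
| Expected value (column mean) | | 0.4 | 0.4 | 0.4 |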
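The table can be regenerated by enumerating the distinct arrangements of (1, 1, 0, 0, 0). This sketch relies on the fact that, under a uniformly random ordering of the 5 distinct items, each distinct 0/1 pattern arises from the same number (2! · 3! = 12) of labeled orderings, so the 10 patterns are equally likely.

```python
from itertools import permutations
from fractions import Fraction

# All distinct arrangements of (1,1,0,0,0), in the same order as the table.
arrangements = sorted(set(permutations((1, 1, 0, 0, 0))), reverse=True)
print(len(arrangements))   # 10

# Column means over the first three positions give E[I_1], E[I_2], E[I_3].
for j in range(3):
    mean = Fraction(sum(a[j] for a in arrangements), len(arrangements))
    print(f"E[I_{j + 1}] =", float(mean))   # 0.4 for each position
```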

✅ Sample-Based Example: X = I₁ + I₂

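Let's verify the expectation of \( X \) for a sample of \( n = 2 \) drawn from the same population (\( N = 5 \), \( K = 2 \)). The table lists every possible ordered outcome with its probability; for instance, \( P((1,1)) = \frac{2}{5} \cdot \frac{1}{4} = 0.1 \) and \( P((1,0)) = \frac{2}{5} \cdot \frac{3}{4} = 0.3 \):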


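| Sample | \(I_1\) | \(I_2\) | \(X = I_1 + I_2\) | \(P(\text{sample})\) |
|---|---|---|---|---|
| (1, 1) | 1 | 1 | 2 | 0.1 |
| (1, 0) | 1 | 0 | 1 | 0.3 |
| (0, 1) | 0 | 1 | 1 | 0.3 |
| (0, 0) | 0 | 0 | 0 | 0.3 |
| Expected value | 0.4 | 0.4 | 0.8 | |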
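The probabilities and expectations in this table can be reproduced exactly. The sketch below assumes each ordered pair of distinct items is equally likely (probability \( 1/20 \)) and groups the pairs by their 0/1 pattern; variable names are illustrative.

```python
from itertools import permutations
from fractions import Fraction
from collections import Counter

population = [1, 1, 0, 0, 0]   # N = 5 items, K = 2 in the target category
n = 2

# Each of the 5 * 4 = 20 ordered pairs of distinct items has probability 1/20.
pairs = list(permutations(population, n))
counts = Counter(pairs)
probs = {pattern: Fraction(c, len(pairs)) for pattern, c in counts.items()}

# Print sample, X and P(sample) for each row of the table.
for pattern in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(pattern, sum(pattern), probs[pattern])

E_I1 = sum(p * s[0] for s, p in probs.items())
E_I2 = sum(p * s[1] for s, p in probs.items())
E_X = sum(p * sum(s) for s, p in probs.items())
print(float(E_I1), float(E_I2), float(E_X))   # 0.4 0.4 0.8
```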



✅ Generalized Expectation Formula

We verified the structure \( X = I_1 + I_2 \) for \( n = 2 \), but the same argument applies for any \( n \).

When \( X = I_1 + I_2 + \cdots + I_n \), we compute the expectation as follows:

\[ \mathbb{E}[X] = \mathbb{E}[I_1 + I_2 + \cdots + I_n] = \mathbb{E}[I_1] + \mathbb{E}[I_2] + \cdots + \mathbb{E}[I_n] \]

This step uses the linearity of expectation, which holds even when the variables are dependent, as the \( I_j \) are here.

Since each \( \mathbb{E}[I_j] = \frac{K}{N} \), we get:

\[ \mathbb{E}[X] = n \cdot \frac{K}{N} = np \quad \text{where } p = \frac{K}{N} \]
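As an informal sanity check (not a proof), a Monte Carlo simulation should land close to \( nK/N \). This sketch uses Python's `random.Random.sample`, which draws distinct items, i.e. without replacement; the function name, seed and trial count are arbitrary choices.

```python
import random

def simulate_mean(N: int, K: int, n: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[X] for a hypergeometric X (illustrative helper)."""
    rng = random.Random(seed)
    population = [1] * K + [0] * (N - K)   # 1 marks a target-category item
    total = 0
    for _ in range(trials):
        # rng.sample picks n distinct positions: sampling without replacement.
        total += sum(rng.sample(population, n))
    return total / trials

for N, K, n in [(5, 2, 2), (20, 5, 4), (50, 10, 12)]:
    # Simulated mean next to the formula n * K / N.
    print((N, K, n), round(simulate_mean(N, K, n), 3), n * K / N)
```

The simulated values differ from the formula only by sampling noise, which shrinks as `trials` grows.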



✅ Comparing Three Expectation Methods

We now summarize the three methods of computing the expectation:

  • Indicator variables: \( \mathbb{E}[I_1] + \mathbb{E}[I_2] = 0.4 + 0.4 = 0.8 \)
  • Probability distribution: \( \sum_x x \, P(X = x) = 0 \cdot 0.3 + 1 \cdot 0.6 + 2 \cdot 0.1 = 0.8 \)
  • General formula: \( n \cdot \frac{K}{N} = 2 \cdot \frac{2}{5} = 0.8 \)

All three methods give exactly the same result: the indicator decomposition, the distribution of \( X \), and the closed-form formula \( nK/N \) agree.
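The three computations can be run side by side. This sketch hard-codes the candy example (\( N = 5 \), \( K = 2 \), \( n = 2 \)) and uses exact fractions so equality can be checked without rounding; the variable and function names are illustrative.

```python
from fractions import Fraction
from math import comb

N, K, n = 5, 2, 2   # 2 strawberry candies among 5, draw 2

# 1) Indicator variables: E[X] = E[I_1] + ... + E[I_n], each E[I_j] = K/N.
by_indicators = sum(Fraction(K, N) for _ in range(n))

# 2) Probability distribution: sum of k * P(X = k) with the hypergeometric PMF.
def pmf(k: int) -> Fraction:
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

support = range(max(0, n - (N - K)), min(n, K) + 1)
by_distribution = sum(k * pmf(k) for k in support)

# 3) General formula: E[X] = n * K / N.
by_formula = Fraction(n * K, N)

print(by_indicators, by_distribution, by_formula)   # 4/5 4/5 4/5
```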



