Probability

Conditional Probability

Master conditional probability, independence, Bayes' theorem, and their applications in AI/ML and everyday reasoning.


Why It Matters

ℹ️ The Foundation of Modern AI

Conditional probability is arguably the most important concept in applied probability. Every time a spam filter decides whether an email is junk, a doctor interprets a test result, or a self-driving car predicts a pedestrian's next move, conditional probability is at work. Bayes' theorem, built entirely on conditional probability, is the backbone of Bayesian AI, medical diagnostics, and scientific reasoning. Understanding this concept is essential for any engineer, data scientist, or researcher.

In real life, we rarely reason about events in isolation. We constantly update our beliefs based on new information. What is the probability that it will rain, given that the sky is cloudy? Conditional probability formalizes this intuition. It transforms vague guesses into rigorous, calculable statements.


Conditional Probability

Definition: Conditional Probability

Given two events A and B in a probability space, with P(B) > 0, the conditional probability of A given B is:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

Intuitively, conditioning on B means we restrict our sample space to outcomes where B occurred, then measure what fraction of those also satisfy A.

Conditional Probability

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

Here,

  • P(A|B) = Probability of A given that B has occurred
  • P(A ∩ B) = Joint probability that both A and B occur
  • P(B) = Marginal probability of B (must be > 0)
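The definition can be checked by brute-force enumeration on a small sample space. The sketch below assumes two fair dice and reuses the events from this lesson's Monte Carlo example (A = sum > 7, B = first die > 3); the helper names `prob` and `cond_prob` are illustrative, not part of any library.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform measure on omega; event is a predicate."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(a, b):
    """P(A|B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda w: a(w) and b(w)) / p_b

def sum_gt_7(w):
    return w[0] + w[1] > 7      # A: the sum exceeds 7

def first_gt_3(w):
    return w[0] > 3             # B: the first die shows 4, 5, or 6

print(prob(sum_gt_7))                    # 5/12
print(cond_prob(sum_gt_7, first_gt_3))   # 2/3
```

Conditioning on B raises the probability of A from 5/12 to 2/3, because restricting to a high first die removes many low-sum outcomes.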

Properties of Conditional Probability

Conditional probability is itself a probability measure — it satisfies all axioms of probability:

  1. Non-negativity: P(A|B) ≥ 0 for all events A
  2. Normalization: P(Ω|B) = 1 (the entire sample space given B has probability 1)
  3. Additivity: For disjoint events A₁, A₂: P(A₁ ∪ A₂ | B) = P(A₁|B) + P(A₂|B)

This means every theorem and rule of probability applies within the conditioned space.

Intuitive Interpretation

💡 Visualizing Conditional Probability

Imagine a Venn diagram where the square represents the sample space Ω. Event B carves out a region. Within that region, A ∩ B is a sub-region. P(A|B) is the ratio of the sub-region to the entire B region. Conditioning "shrinks" the world to B, then asks what fraction of that world satisfies A.


Multiplication Rule

Rearranging the definition of conditional probability yields the multiplication rule — a powerful tool for computing joint probabilities.

Multiplication Rule

$$P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)$$

Here,

  • P(A ∩ B) = Joint probability of A and B
  • P(A|B) = Conditional probability of A given B
  • P(B|A) = Conditional probability of B given A
  • P(A), P(B) = Marginal probabilities

Extended Multiplication Rule (Chain Rule)

For multiple events A₁, A₂, ..., Aₙ:

Chain Rule of Probability

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) \cdot P(A_2|A_1) \cdot P(A_3|A_1 \cap A_2) \cdots P(A_n|A_1 \cap \cdots \cap A_{n-1})$$

Here,

  • P(A₁) = Marginal probability of the first event
  • P(Aᵢ | A₁ ∩ ⋯ ∩ Aᵢ₋₁) = Conditional probability of the i-th event given all previous events

📝Applying the Chain Rule

Three cards are drawn without replacement from a deck of 52. What is the probability that all three are aces?

P(A₁ ∩ A₂ ∩ A₃) = P(A₁) · P(A₂|A₁) · P(A₃|A₁ ∩ A₂) = (4/52) · (3/51) · (2/50) = 24/132600 ≈ 0.000181
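The same chain-rule product can be computed exactly with rational arithmetic. This is a minimal sketch; the function name `prob_all_target` and its parameters are made up for illustration.

```python
from fractions import Fraction

def prob_all_target(n_target, n_total, n_draws):
    """Chain rule for draws without replacement.

    Each factor is P(i-th draw is a target | all earlier draws were targets)
    = (n_target - i) / (n_total - i).
    """
    p = Fraction(1)
    for i in range(n_draws):
        p *= Fraction(n_target - i, n_total - i)
    return p

p_three_aces = prob_all_target(n_target=4, n_total=52, n_draws=3)
print(p_three_aces)          # 1/5525
print(float(p_three_aces))   # about 0.000181
```

Using `Fraction` avoids floating-point rounding, so the reduced form 1/5525 can be compared directly with the hand calculation 24/132600.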


Independence

Definition: Statistical Independence

Two events A and B are independent if and only if:

$$P(A \cap B) = P(A) \cdot P(B)$$

Equivalently, when P(A) > 0 and P(B) > 0 (so that both conditional probabilities are defined), A and B are independent if and only if either of the following holds:

  • P(A|B) = P(A)
  • P(B|A) = P(B)

Knowing that one event occurred does not change the probability of the other.

Definition: Mutually Exclusive Events

Two events A and B are mutually exclusive (or disjoint) if they cannot occur simultaneously:

$$P(A \cap B) = 0$$

If A occurs, then B cannot, and vice versa.

⚠️ Independence ≠ Mutually Exclusive

This is one of the most common and dangerous misconceptions in probability.

| Property | Independent Events | Mutually Exclusive Events |
| --- | --- | --- |
| Joint probability | P(A ∩ B) = P(A)P(B) | P(A ∩ B) = 0 |
| Overlap in Venn diagram | Yes (usually) | None |
| Relationship | Knowing one tells nothing about the other | Knowing one tells you the other did NOT occur |
| Example | Two coin flips | Heads and tails on a single coin flip |

If P(A) > 0 and P(B) > 0, independent events must overlap (P(A∩B) > 0), while mutually exclusive events cannot overlap. They are almost opposite concepts.
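A quick way to see the contrast is to test both properties on the same sample space. The sketch below assumes one roll of a fair six-sided die; the events and helper names are chosen purely for illustration.

```python
from fractions import Fraction

def p(event, n_faces=6):
    """Probability of a set of faces on one roll of a fair die."""
    return Fraction(len(event), n_faces)

def is_independent(a, b):
    """Product rule: P(A ∩ B) == P(A) * P(B)."""
    return p(a & b) == p(a) * p(b)

def is_mutually_exclusive(a, b):
    """Disjoint events share no outcomes."""
    return len(a & b) == 0

even = {2, 4, 6}
at_most_4 = {1, 2, 3, 4}
odd = {1, 3, 5}

# even vs at_most_4: overlap {2, 4} has probability 1/3 = (1/2)(2/3)
print(is_independent(even, at_most_4), is_mutually_exclusive(even, at_most_4))
# even vs odd: no overlap, but P(even) * P(odd) = 1/4, not 0
print(is_independent(even, odd), is_mutually_exclusive(even, odd))
```

The first pair is independent but not disjoint; the second pair is disjoint but not independent, since learning the roll is even rules out odd entirely.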

Pairwise vs. Mutual Independence

ℹ️ A Subtle Distinction

Events A, B, C can be pairwise independent (every pair is independent) without being mutually independent (the product rule holds for every sub-collection, including P(A∩B∩C) = P(A)P(B)P(C)).

Example: Toss a fair coin twice. Let A = {HH, HT}, B = {HH, TH}, C = {HH, TT}. Then P(A) = P(B) = P(C) = 1/2, and P(A∩B) = P(A∩C) = P(B∩C) = 1/4 = P(A)P(B), so they are pairwise independent. But P(A∩B∩C) = 1/4 ≠ P(A)P(B)P(C) = 1/8, so they are NOT mutually independent.
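The coin example above can be verified mechanically. This sketch enumerates the four outcomes and checks the product rule for every pair and then for the triple; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

omega = {"HH", "HT", "TH", "TT"}   # two tosses of a fair coin
A = {"HH", "HT"}
B = {"HH", "TH"}
C = {"HH", "TT"}

def p(event):
    """Probability under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

# Product rule for each of the three pairs
pairwise = all(p(x & y) == p(x) * p(y) for x, y in combinations([A, B, C], 2))

# Product rule for the triple: 1/4 on the left, 1/8 on the right
triple = p(A & B & C) == p(A) * p(B) * p(C)

print(pairwise)  # True
print(triple)    # False
```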


Bayes' Theorem

Theorem: Bayes' Theorem

For events A and B with P(B) > 0:

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

More generally, if B₁, B₂, ..., Bₙ form a partition of the sample space:

$$P(B_k|A) = \frac{P(A|B_k) \cdot P(B_k)}{\sum_{i=1}^{n} P(A|B_i) \cdot P(B_i)}$$

Bayes' Theorem

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

Here,

  • P(A|B) = Posterior probability: what we want to find
  • P(B|A) = Likelihood: probability of the evidence given the hypothesis
  • P(A) = Prior probability: initial belief before seeing evidence
  • P(B) = Marginal likelihood: total probability of the evidence

Bayesian Reasoning Framework

💡 The Language of Bayes

Bayes' theorem provides a formal framework for updating beliefs with evidence:

  • Prior P(A): Your belief about A before seeing any data
  • Likelihood P(B|A): How likely is the evidence if A is true?
  • Posterior P(A|B): Your updated belief about A after seeing evidence B
  • Marginal Likelihood P(B): How likely is the evidence overall?

The posterior is proportional to the likelihood times the prior, so it balances the two: strong evidence can overturn a weak prior, while a strong prior tempers weak evidence.
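The prior-likelihood-posterior vocabulary maps directly onto a few lines of code. The sketch below is one possible helper over a finite set of hypotheses; the coin scenario and all of its numbers are invented for illustration.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a finite partition of hypotheses.

    priors:      dict hypothesis -> P(H)
    likelihoods: dict hypothesis -> P(evidence | H)
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())          # marginal likelihood P(evidence)
    if evidence == 0:
        raise ValueError("evidence has probability zero under every hypothesis")
    return {h: j / evidence for h, j in joint.items()}

# Hypothetical setup: a coin is either fair or biased toward heads (P(H) = 0.8),
# with equal prior belief in each. We observe a single head.
priors = {"fair": 0.5, "biased": 0.5}
likelihood_of_heads = {"fair": 0.5, "biased": 0.8}

post = posterior(priors, likelihood_of_heads)
print(post)   # belief shifts toward "biased": 8/13, about 0.615
```

Feeding `post` back in as the prior for the next observation gives the usual Bayesian update loop.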


Law of Total Probability

Theorem: Law of Total Probability

If B₁, B₂, ..., Bₙ form a partition of the sample space (mutually exclusive and collectively exhaustive), then for any event A:

$$P(A) = \sum_{i=1}^{n} P(A|B_i) \cdot P(B_i)$$

This law allows us to decompose complex probability calculations into manageable pieces by considering all possible scenarios that could lead to event A.
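As a rough sketch, the sum can be written as a small function that also checks the partition condition. The weather scenario and its probabilities below are hypothetical, chosen only to give the function something to compute.

```python
import math

def total_probability(p_partition, p_a_given):
    """P(A) = sum_i P(A | B_i) * P(B_i) for a partition B_1, ..., B_n."""
    if len(p_partition) != len(p_a_given):
        raise ValueError("need one conditional probability per partition cell")
    if not math.isclose(sum(p_partition), 1.0):
        raise ValueError("partition probabilities must sum to 1")
    return sum(pa * pb for pa, pb in zip(p_a_given, p_partition))

# Hypothetical numbers: the sky is clear, cloudy, or overcast,
# and rain has a different chance under each condition.
p_sky = [0.5, 0.3, 0.2]
p_rain_given_sky = [0.05, 0.30, 0.70]

p_rain = total_probability(p_sky, p_rain_given_sky)
print(f"P(rain) = {p_rain:.3f}")   # 0.025 + 0.090 + 0.140 = 0.255
```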

Continuous Version

For continuous random variables X and Y, with marginal density f_X(x) and conditional density f_{Y|X}(y|x):

Continuous Law of Total Probability

$$f_Y(y) = \int_{-\infty}^{\infty} f_{Y|X}(y|x) \cdot f_X(x) \, dx$$

Here,

  • f_Y(y) = Marginal density of Y
  • f_{Y|X}(y|x) = Conditional density of Y given X = x
  • f_X(x) = Marginal density of X
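The integral can be sanity-checked numerically on a case with a known answer. The sketch below assumes X ~ N(0, 1) and Y | X = x ~ N(x, 1), for which the marginal of Y is N(0, 2); the grid limits and step are arbitrary choices, and the integral is approximated with a plain Riemann sum.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def marginal_density_y(y, lo=-10.0, hi=10.0, n_points=20001):
    """Approximate f_Y(y) = ∫ f_{Y|X}(y|x) f_X(x) dx with a Riemann sum."""
    grid = np.linspace(lo, hi, n_points)
    dx = grid[1] - grid[0]
    # f_{Y|X}(y|x) is N(x, 1) evaluated at y; f_X(x) is N(0, 1)
    integrand = normal_pdf(y, grid, 1.0) * normal_pdf(grid, 0.0, 1.0)
    return float(np.sum(integrand) * dx)

y = 1.0
numeric = marginal_density_y(y)
exact = float(normal_pdf(y, 0.0, np.sqrt(2.0)))   # Y ~ N(0, variance 2)

print(f"numeric f_Y({y}) = {numeric:.6f}")
print(f"exact   f_Y({y}) = {exact:.6f}")
```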

Medical Testing Example

This is the classic "base rate fallacy" example that demonstrates why conditional probability matters.

Setup

  • Disease prevalence: P(Disease) = 0.01 (1% of population)
  • Test sensitivity: P(Positive|Disease) = 0.95 (95% true positive rate)
  • Test specificity: P(Negative|Healthy) = 0.95 (95% true negative rate)
  • False positive rate: P(Positive|Healthy) = 0.05

What is P(Disease | Positive)?

📝Medical Test Calculation

Given:

  • P(Disease) = 0.01
  • P(Positive | Disease) = 0.95
  • P(Positive | Healthy) = 0.05

Step 1: Find P(Positive) using the Law of Total Probability:

P(Positive) = P(Positive|Disease)·P(Disease) + P(Positive|Healthy)·P(Healthy)
P(Positive) = 0.95 × 0.01 + 0.05 × 0.99 = 0.0095 + 0.0495 = 0.059

Step 2: Apply Bayes' Theorem:

P(Disease | Positive) = P(Positive|Disease) · P(Disease) / P(Positive)
P(Disease | Positive) = (0.95 × 0.01) / 0.059 ≈ 0.161

Result: Only ~16.1% of people who test positive actually have the disease!

⚠️ The Base Rate Fallacy

This result shocks most people. With 95% sensitivity and 95% specificity, a positive result is correct only 16.1% of the time when the disease prevalence is 1%. The key insight: the 5% false-positive rate, applied to the large healthy population (99%), generates far more false positives (4.95% of everyone tested) than the 95% sensitivity generates true positives from the small diseased population (0.95% of everyone tested). Base rates matter enormously.

What Happens with Higher Prevalence?

| Prevalence | P(Disease \| Positive) | Interpretation |
| --- | --- | --- |
| 0.001 (0.1%) | 1.87% | Very few positives are true |
| 0.01 (1%) | 16.1% | Surprisingly low |
| 0.10 (10%) | 67.9% | More useful |
| 0.50 (50%) | 95.0% | Matches the test's 95% sensitivity and specificity |
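The prevalence table can be reproduced with a short loop. This is a minimal sketch using the lesson's 95% sensitivity and 95% specificity; the function name `positive_predictive_value` is an illustrative choice.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(Disease | Positive) via Bayes' theorem and the law of total probability."""
    true_pos = sensitivity * prevalence                    # P(Pos and Disease)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(Pos and Healthy)
    return true_pos / (true_pos + false_pos)

for prevalence in (0.001, 0.01, 0.10, 0.50):
    ppv = positive_predictive_value(prevalence, sensitivity=0.95, specificity=0.95)
    print(f"prevalence = {prevalence:<5}  P(Disease | Positive) = {ppv:.2%}")
```

Holding the test fixed and varying only the prevalence makes the base-rate effect visible: the test itself never changes, yet the meaning of a positive result does.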

Repeated Testing

If someone tests positive twice, and the two results are conditionally independent given the person's true disease status, the posterior updates:

P(Disease | 2 positives) = P(2 pos|Disease)·P(Disease) / P(2 pos) = (0.95² × 0.01) / (0.95² × 0.01 + 0.05² × 0.99) = 0.009025 / 0.0115 ≈ 0.785 (78.5%)

ℹ️ Why Repeated Testing Helps

Each independent test provides additional evidence. After two positive tests, the probability jumps from 16.1% to 78.5%. After three: approximately 98.6%. This is why medical protocols often require confirmatory testing.
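Repeated testing can be sketched as a loop in which each posterior becomes the prior for the next test. The code assumes the test results are conditionally independent given disease status, and the name `update` is illustrative.

```python
def update(prior, sensitivity, false_positive_rate):
    """One Bayesian update of P(Disease) after a single positive result."""
    numerator = sensitivity * prior
    return numerator / (numerator + false_positive_rate * (1.0 - prior))

belief = 0.01          # prior: disease prevalence from the lesson's setup
history = []
for n_positive in range(1, 4):
    belief = update(belief, sensitivity=0.95, false_positive_rate=0.05)
    history.append(belief)
    print(f"after {n_positive} positive test(s): P(Disease) = {belief:.4f}")
```

Updating one test at a time gives the same answer as plugging 0.95ⁿ and 0.05ⁿ into Bayes' theorem in a single step, which is a useful cross-check on hand calculations.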


Python Implementation

import numpy as np

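def conditional_probability(joint_count, condition_count):
    """Estimate P(A|B) as count(A and B) / count(B)."""
    if condition_count == 0:
        # P(A|B) is undefined when B never occurs, so do not report 0.
        raise ValueError("P(A|B) is undefined when the condition count is zero")
    return joint_count / condition_count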

# --- Medical Testing Example ---
def medical_test():
    p_disease = 0.01
    p_pos_given_disease = 0.95
    p_pos_given_healthy = 0.05

    # Law of Total Probability
    p_positive = (p_pos_given_disease * p_disease +
                  p_pos_given_healthy * (1 - p_disease))

    # Bayes' Theorem
    p_disease_given_pos = (p_pos_given_disease * p_disease) / p_positive

    print(f"P(Positive) = {p_positive:.4f}")
    print(f"P(Disease | Positive) = {p_disease_given_pos:.4f}")
    return p_disease_given_pos

# --- Chain Rule: Drawing Without Replacement ---
def chain_rule_example():
    """Probability of drawing 3 aces in a row."""
    p = (4/52) * (3/51) * (2/50)
    print(f"P(3 aces) = {p:.6f}")
    return p

# --- Bayesian Spam Filter ---
def spam_filter():
    """Simple Bayesian spam classifier."""
    # Priors
    p_spam = 0.3
    p_ham = 0.7

    # Word likelihoods: P(word | spam), P(word | ham)
    word_likelihoods = {
        'free':  {'spam': 0.8, 'ham': 0.1},
        'money': {'spam': 0.6, 'ham': 0.05},
        'hello': {'spam': 0.1, 'ham': 0.5},
    }

    # Classify message with words ['free', 'money']
    words = ['free', 'money']
    p_email_spam = p_spam
    p_email_ham = p_ham

    for word in words:
        p_email_spam *= word_likelihoods[word]['spam']
        p_email_ham *= word_likelihoods[word]['ham']

    # Normalize
    total = p_email_spam + p_email_ham
    p_spam_given_words = p_email_spam / total

    print(f"P(Spam | 'free', 'money') = {p_spam_given_words:.4f}")
    return p_spam_given_words

# --- Independence Check ---
def check_independence(p_a, p_b, p_a_and_b, tolerance=1e-9):
    """Check if two events are independent."""
    independent = abs(p_a_and_b - p_a * p_b) < tolerance
    print(f"P(A)={p_a}, P(B)={p_b}, P(A∩B)={p_a_and_b}")
    print(f"Independent: {independent}")
    return independent

# --- Monte Carlo Simulation ---
def simulate_conditional(n_trials=1_000_000):
    """Monte Carlo estimation of P(A|B)."""
    # Roll two dice. A = sum > 7, B = first die > 3
    first = np.random.randint(1, 7, n_trials)
    second = np.random.randint(1, 7, n_trials)
    total = first + second

    b_mask = first > 3
    a_and_b = (total > 7) & b_mask

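    # Restrict to trials where B occurred, then take the fraction where A also holds
    p_a_given_b = a_and_b[b_mask].mean()

    # Exact value: P(A ∩ B) = 12/36 and P(B) = 1/2, so P(A|B) = 2/3
    print(f"Simulated P(A|B) = {p_a_given_b:.4f}")
    print(f"Analytical P(A|B) = {(12/36) / (1/2):.4f}")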
    return p_a_given_b

if __name__ == "__main__":
    print("=== Medical Test ===")
    medical_test()
    print("\n=== Chain Rule ===")
    chain_rule_example()
    print("\n=== Spam Filter ===")
    spam_filter()
    print("\n=== Independence Check ===")
    check_independence(0.3, 0.4, 0.12)   # Independent
    check_independence(0.3, 0.4, 0.10)   # Not independent
    print("\n=== Monte Carlo Simulation ===")
    simulate_conditional()

Applications in AI/ML

1. Naive Bayes Classifier

ℹ️ A Classic Baseline for Text Classification

The Naive Bayes classifier applies Bayes' theorem with a "naive" independence assumption: all features are conditionally independent given the class label.

$$P(\text{class} | \text{features}) \propto P(\text{class}) \prod_{i} P(\text{feature}_i | \text{class})$$

Despite this unrealistic assumption, it works remarkably well for text classification, spam filtering, and sentiment analysis.

Spam Filtering Example:

For an email with words w₁, w₂, ..., wₙ:

  • P(Spam | w₁, w₂, ..., wₙ) ∝ P(Spam) · P(w₁|Spam) · P(w₂|Spam) · ... · P(wₙ|Spam)
  • P(Ham | w₁, w₂, ..., wₙ) ∝ P(Ham) · P(w₁|Ham) · P(w₂|Ham) · ... · P(wₙ|Ham)

Classify as whichever class has the higher posterior.
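With many words, the product of small likelihoods underflows, so implementations commonly sum log-probabilities instead. The sketch below reuses the toy word likelihoods from this lesson's Python section and normalizes with a log-sum-exp step; the function name and data layout are illustrative, and no smoothing for unseen words is included.

```python
import math

def naive_bayes_posterior(words, priors, likelihoods):
    """Posterior over classes under the naive independence assumption, in log space."""
    log_scores = {
        c: math.log(priors[c]) + sum(math.log(likelihoods[w][c]) for w in words)
        for c in priors
    }
    # Log-sum-exp: subtract the max before exponentiating to avoid underflow
    m = max(log_scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in log_scores.values()))
    return {c: math.exp(s - log_z) for c, s in log_scores.items()}

priors = {"spam": 0.3, "ham": 0.7}
likelihoods = {
    "free":  {"spam": 0.8, "ham": 0.1},
    "money": {"spam": 0.6, "ham": 0.05},
    "hello": {"spam": 0.1, "ham": 0.5},
}

result = naive_bayes_posterior(["free", "money"], priors, likelihoods)
print(f"P(Spam | 'free', 'money') = {result['spam']:.4f}")   # 0.9763
```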

2. Bayesian Inference in Machine Learning

Bayesian methods treat model parameters as random variables with distributions:

  • Prior: P(θ) — beliefs about parameters before seeing data
  • Likelihood: P(D|θ) — how well parameters explain the data
  • Posterior: P(θ|D) ∝ P(D|θ) · P(θ) — updated beliefs after seeing data

Applications include Bayesian optimization for hyperparameter tuning, Gaussian processes, and Bayesian neural networks.

3. Hidden Markov Models

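HMMs use conditional probability chains for sequence modeling. With hidden states z₁, ..., zₙ and observations x₁, ..., xₙ, the joint distribution factorizes as:

P(x₁, ..., xₙ, z₁, ..., zₙ) = P(z₁) · P(x₁|z₁) · ∏ᵢ₌₂ⁿ P(zᵢ|zᵢ₋₁) · P(xᵢ|zᵢ)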

Used in speech recognition, bioinformatics, and natural language processing.
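To make the factorization concrete, the sketch below evaluates the joint probability of one hidden path and one observation sequence in a tiny two-state HMM. All states, symbols, and probabilities are invented for illustration; the factorization is an initial-state term, then one transition and one emission term per step.

```python
from itertools import product

states = ("rainy", "sunny")
symbols = ("umbrella", "none")

# Hypothetical model parameters
initial = {"rainy": 0.5, "sunny": 0.5}
transition = {"rainy": {"rainy": 0.7, "sunny": 0.3},
              "sunny": {"rainy": 0.3, "sunny": 0.7}}
emission = {"rainy": {"umbrella": 0.9, "none": 0.1},
            "sunny": {"umbrella": 0.2, "none": 0.8}}

def joint_probability(hidden, observed):
    """P(x_1..x_n, z_1..z_n) = P(z_1) P(x_1|z_1) * prod_i P(z_i|z_{i-1}) P(x_i|z_i)."""
    p = initial[hidden[0]] * emission[hidden[0]][observed[0]]
    for prev, cur, obs in zip(hidden, hidden[1:], observed[1:]):
        p *= transition[prev][cur] * emission[cur][obs]
    return p

p_example = joint_probability(("rainy", "rainy"), ("umbrella", "umbrella"))
print(f"{p_example:.4f}")   # 0.5 * 0.9 * 0.7 * 0.9 = 0.2835

# Sanity check: the joint sums to 1 over all length-2 paths and observations
total = sum(joint_probability(h, o)
            for h in product(states, repeat=2)
            for o in product(symbols, repeat=2))
print(f"{total:.4f}")
```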

4. Conditional Random Fields

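CRFs model P(labels | observations) directly. Because they never model how the observations are generated, they avoid the HMM assumption that each observation depends only on its own hidden state, while still using the conditional probability framework.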

5. Medical AI and Diagnosis

Bayesian networks represent diseases, symptoms, and test results as nodes in a directed graph. Conditional probability tables at each node enable probabilistic reasoning about patient diagnoses.


Common Mistakes

| Mistake | Why It's Wrong | Correct Approach |
| --- | --- | --- |
| Confusing P(A\|B) with P(B\|A) | These are different! P(A\|B) ≠ P(B\|A) unless P(A) = P(B) | Apply Bayes' Theorem: P(A\|B) = P(B\|A)P(A)/P(B) |
| Ignoring base rates | A 99% accurate test can still yield mostly false positives | Always use the Law of Total Probability to find P(evidence) |
| Assuming independence without verification | Just because two things seem unrelated doesn't mean they are | Check: does P(A∩B) = P(A)·P(B)? |
| Confusing independence with mutual exclusivity | They are nearly opposite concepts | Independent: P(A∩B) = P(A)P(B). Mutually exclusive: P(A∩B) = 0 |
| Forgetting to normalize | Posterior probabilities must sum to 1 | Divide by the marginal likelihood P(evidence) |
| Using multiplication rule incorrectly | P(A∩B) = P(A)P(B) is ONLY for independent events | Use P(A∩B) = P(A\|B)·P(B) |
| Overlooking the denominator in Bayes' theorem | The marginal likelihood is easy to forget | P(B) = Σ P(B\|Aᵢ)·P(Aᵢ) |

Interview Questions

Q1: Explain the difference between P(A|B) and P(B|A). Give a real-world example.

💡Answer

P(A|B) is the probability of A given that B has occurred — it answers "given that B happened, what's the chance of A?" P(B|A) is the probability of B given A — "given A happened, what's the chance of B?"

Real-world example: Let A = "patient has cancer" and B = "test is positive."

  • P(B|A) = P(Positive|Cancer) = 0.95 (sensitivity): how often the test detects cancer when it is present
  • P(A|B) = P(Cancer|Positive) = ? — what the doctor actually needs to know

These are fundamentally different. A high P(B|A) does NOT guarantee a high P(A|B) when the base rate P(A) is low.

Q2: Two coins are flipped. Are the outcomes independent? How would you test this?

💡Answer

Yes, fair coin flips are independent. We can verify this:

  • P(H₁) = 0.5, P(H₂) = 0.5
  • P(H₁ ∩ H₂) = 0.25 = 0.5 × 0.5 = P(H₁) · P(H₂)

Since P(H₁ ∩ H₂) = P(H₁) · P(H₂), the events are independent.

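To test independence empirically: flip two coins many times, record outcomes, and check whether the observed frequency of both heads is close to the product of the observed frequencies of first heads and second heads. A deviation larger than sampling noise (judged, for example, with a chi-square test of independence) suggests dependence.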

Q3: A test has 99% sensitivity and 99% specificity. Disease prevalence is 0.1%. What is P(Disease|Positive)?

💡Answer

P(Disease|Positive) = P(Pos|Disease)·P(Disease) / P(Pos)

P(Pos) = 0.99 × 0.001 + 0.01 × 0.999 = 0.00099 + 0.00999 = 0.01098

P(Disease|Positive) = 0.00099 / 0.01098 ≈ 0.0902 ≈ 9.0%

Even with 99% accuracy on both measures, only about 9% of positive results are true positives when prevalence is 0.1%. This is the base rate fallacy in action.

Q4: Under what conditions is P(A|B) = P(A)? What does this mean?

💡Answer

P(A|B) = P(A) if and only if A and B are independent (assuming P(B) > 0, so that P(A|B) is defined). This means knowing that B occurred gives no information about the probability of A.

Formally: P(A|B) = P(A∩B)/P(B) = P(A)P(B)/P(B) = P(A) when independent.

In practice: The weather forecast (B) being cloudy is independent of your coin flip (A) being heads. Knowing the weather doesn't change the coin's probability.

Q5: How is Bayes' theorem used in spam filtering?

💡Answer

A Bayesian spam filter computes P(Spam | words) using Bayes' theorem:

P(Spam | w₁, w₂, ..., wₙ) = P(w₁, w₂, ..., wₙ | Spam) · P(Spam) / P(w₁, w₂, ..., wₙ)

Under the naive independence assumption:

P(Spam | words) ∝ P(Spam) · ∏ P(wᵢ | Spam)

The filter maintains probability tables: P(word | Spam) and P(word | Ham) learned from training data. New emails are classified by comparing P(Spam | words) vs P(Ham | words). Words like "free," "viagra," "winner" have high P(word | Spam), while "meeting," "deadline," "project" have high P(word | Ham).

Q6: What is the Monty Hall problem and how does conditional probability explain it?

💡Answer

In the Monty Hall problem: you pick one of three doors. Behind one is a car; behind the others, goats. The host (who knows what's behind the doors) opens a door with a goat and asks if you want to switch.

Should you switch? Yes! Switching gives 2/3 probability of winning; staying gives 1/3.

Conditional probability explanation: When you initially pick a door, P(car behind your door) = 1/3. The host's action of opening a door with a goat is NOT independent — it's constrained by the host's knowledge. After the host opens a goat door, P(car behind remaining door) = 2/3. The host's action provides information that changes the conditional probabilities.
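The 2/3 versus 1/3 split is easy to confirm by simulation. The sketch below assumes the standard rules (the host always opens a goat door that is not the contestant's pick); the function name, trial count, and seed are arbitrary choices.

```python
import random

def monty_hall(switch, n_trials=100_000, seed=0):
    """Estimate the win rate when the contestant always switches or always stays."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # The host knows where the car is and opens a goat door other than the pick
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            # Move to the one door that is neither the original pick nor the opened door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / n_trials

print(f"switch: {monty_hall(switch=True):.3f}")    # close to 2/3
print(f"stay:   {monty_hall(switch=False):.3f}")   # close to 1/3
```

The host's constraint is encoded in the line that chooses `opened`: if the host instead opened a random door, the conditional probabilities would change.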


Practice Problems

Problem 1: Drawing Cards

📝Practice Problem 1

Two cards are drawn without replacement from a standard 52-card deck. What is the probability that the second card is a king, given that the first card was a king?

💡Solution

After drawing one king, there are 3 kings left out of 51 cards.

P(King₂ | King₁) = 3/51 = 1/17 ≈ 0.0588

Problem 2: Conditional Probability from a Table

📝Practice Problem 2

A survey of 1000 people yields:

| | Likes Coffee | Doesn't Like Coffee |
| --- | --- | --- |
| Morning Person | 200 | 100 |
| Not Morning Person | 300 | 400 |

What is P(Likes Coffee | Morning Person)?

💡Solution

P(Likes Coffee | Morning Person) = P(Likes Coffee ∩ Morning Person) / P(Morning Person) = 200 / (200 + 100) = 200 / 300 = 2/3 ≈ 0.667

Morning people are more likely to like coffee (66.7%) than the general population (50%).
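The same calculation can be done for every row at once by normalizing the count table. This is a small sketch with NumPy, using the survey counts from the problem; the array layout (rows for morning status, columns for coffee preference) is a choice made here.

```python
import numpy as np

# Rows: morning person, not morning person
# Columns: likes coffee, doesn't like coffee
counts = np.array([[200, 100],
                   [300, 400]])

# Conditioning on a row means dividing by that row's total
row_conditionals = counts / counts.sum(axis=1, keepdims=True)

p_coffee_given_morning = row_conditionals[0, 0]
p_coffee_given_not_morning = row_conditionals[1, 0]
p_coffee = counts[:, 0].sum() / counts.sum()     # marginal over all 1000 people

print(f"P(Coffee | Morning)     = {p_coffee_given_morning:.3f}")
print(f"P(Coffee | Not Morning) = {p_coffee_given_not_morning:.3f}")
print(f"P(Coffee)               = {p_coffee:.3f}")
```

Dividing by column totals instead (`axis=0`) would give the reverse conditionals, such as P(Morning | Likes Coffee), which is a common source of confusion.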

Problem 3: Bayes' Theorem Application

📝Practice Problem 3

A factory has two machines. Machine A produces 60% of items with a 5% defect rate. Machine B produces 40% with a 10% defect rate. An item is selected at random and found to be defective. What is the probability it came from Machine A?

💡Solution

Let D = defective, A = from Machine A.

P(A|D) = P(D|A) · P(A) / P(D)

P(D) = P(D|A)·P(A) + P(D|B)·P(B) = 0.05 × 0.60 + 0.10 × 0.40 = 0.03 + 0.04 = 0.07

P(A|D) = 0.05 × 0.60 / 0.07 = 0.03 / 0.07 ≈ 0.4286

Despite Machine A having half the defect rate, it accounts for ~42.86% of defective items because it produces more items overall.

Problem 4: Independence Verification

📝Practice Problem 4

A die is rolled once. Let A = {1, 2, 3}, B = {3, 4, 5, 6}, C = {1, 2}. Are A and B independent? Are A and C independent?

💡Solution

P(A) = 3/6 = 1/2, P(B) = 4/6 = 2/3, P(C) = 2/6 = 1/3

A and B:
P(A∩B) = P({3}) = 1/6
P(A)·P(B) = (1/2)(2/3) = 1/3
Since 1/6 ≠ 1/3, A and B are NOT independent.

A and C:
P(A∩C) = P({1,2}) = 2/6 = 1/3
P(A)·P(C) = (1/2)(1/3) = 1/6
Since 1/3 ≠ 1/6, A and C are NOT independent.

Problem 5: Disease Screening

📝Practice Problem 5

A disease affects 2% of a population. A test for the disease has 90% sensitivity and 85% specificity. What is P(Disease | Positive)? How many people need to be tested to find one true positive?

💡Solution

P(D) = 0.02, P(Pos|D) = 0.90, P(Neg|¬D) = 0.85, P(Pos|¬D) = 0.15

P(Pos) = 0.90 × 0.02 + 0.15 × 0.98 = 0.018 + 0.147 = 0.165

P(D|Pos) = 0.018 / 0.165 ≈ 0.1091 ≈ 10.9%

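Expected number of people tested per true positive: 1 / P(D ∩ Pos) = 1 / 0.018 ≈ 55.6

Expected number of positive results per true positive: 1 / P(D|Pos) ≈ 1 / 0.1091 ≈ 9.2

So about 56 people must be tested to find one true positive, and roughly 9 positive results are needed to find one actual case.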


Quick Reference

| Concept | Formula | Key Insight |
| --- | --- | --- |
| Conditional Probability | P(A\|B) = P(A∩B) / P(B) | Restricts sample space to B |
| Multiplication Rule | P(A∩B) = P(A\|B)P(B) | Decomposes joint probability |
| Chain Rule | P(A₁∩...∩Aₙ) = ∏ P(Aᵢ\|A₁...Aᵢ₋₁) | Sequential conditioning |
| Independence | P(A∩B) = P(A)P(B) | Knowing B tells nothing about A |
| Bayes' Theorem | P(A\|B) = P(B\|A)P(A) / P(B) | Updates beliefs with evidence |
| Law of Total Probability | P(A) = Σ P(A\|Bᵢ)P(Bᵢ) | Decomposes via partition |
| Mutual Exclusivity | P(A∩B) = 0 | Cannot both occur |

Bayesian Update Cycle: Prior → Observe Evidence → Compute Likelihood → Apply Bayes → Posterior → (Posterior becomes new Prior)


Cross-References

  • Basic Probability: See 031-probability-basics.mdx for foundational concepts
  • Probability Distributions — continuous and discrete distributions
  • Joint Probability — joint distributions and marginals
  • Random Variables — random variable fundamentals
  • Combinatorics: See 030-probability-combinatorics.mdx for counting techniques essential to probability calculations
  • Statistics Inference: See 040-statistics-inference.mdx for connecting probability to statistical inference
  • Machine Learning: Conditional probability is fundamental to Naive Bayes classifiers, Bayesian optimization, and probabilistic graphical models