Spaced Repetition in Medical Education: What 10 Years of Research Shows

April 6, 2026 · 9 min read · By Dante

Every medical student has heard the advice: “Use Anki.” It shows up in Reddit threads, Discord servers, and upperclassman group chats. But the advice is usually vague. Start early. Do your reviews. Use the AnKing deck. Few students hear exactly how much spaced repetition matters, what the optimal patterns look like, or where the method breaks down.

The published literature on spaced repetition in medical education has grown substantially since 2006. Here is what it actually says, stripped of anecdote and internet folklore.

The Foundation: Testing Beats Re-Reading

Before discussing spaced repetition specifically, you need the principle it rests on. Roediger and Karpicke (2006) published one of the most cited studies in educational psychology in Psychological Science (PubMed 16507066). They compared students who re-read material to students who tested themselves on it, then measured retention at 5 minutes, 2 days, and 1 week.

Key Finding

At the 5-minute mark, re-reading produced better recall. But by one week, the testing group retained significantly more material. The re-reading group's advantage evaporated completely, then reversed.

The more troubling finding: students in the re-reading group reported feeling more confident about their knowledge. They believed they knew the material better. They were wrong. This is the core problem with passive review. It generates familiarity, not retrieval strength. Familiarity feels like understanding. On exam day, it is not.

This “testing effect” is now one of the most replicated findings in cognitive psychology. It applies across ages, subject matter, and testing formats. It is the reason flashcard-based review works at all. Every time you see a card and attempt to recall the answer before flipping it, you are running a retrieval trial. That trial is what strengthens the memory trace.

The Spacing Effect: Why Timing Matters More Than Volume

The testing effect tells you what to do (retrieve, not re-read). The spacing effect tells you when. Hermann Ebbinghaus described it in the 1880s: memories strengthened by spaced review intervals decay more slowly than memories crammed in a single session.

Modern spaced repetition software like Anki operationalizes this with an algorithm. When you answer a card correctly, the interval before you see it again increases. When you answer incorrectly, the interval resets. Over time, well-learned cards appear less frequently while weak cards appear more often. The system prioritizes the material you are most likely to forget.
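The scheduling rule described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Anki's actual scheduler: the 2.5x growth multiplier, the one-day reset, and the names `Card` and `review` are all assumptions made for the example.

```python
from dataclasses import dataclass

GROWTH = 2.5          # assumed multiplier applied after a correct answer
FIRST_INTERVAL = 1.0  # assumed interval (in days) after a lapse


@dataclass
class Card:
    interval_days: float = FIRST_INTERVAL  # days until the next review


def review(card: Card, correct: bool) -> Card:
    """Return a card with its next interval updated after one review."""
    if correct:
        # Correct answer: push the next review further out.
        return Card(interval_days=card.interval_days * GROWTH)
    # Incorrect answer: the interval resets to the starting value.
    return Card(interval_days=FIRST_INTERVAL)


card = Card()
history = []
for answer in [True, True, True, False, True]:
    card = review(card, answer)
    history.append(card.interval_days)

print(history)  # [2.5, 6.25, 15.625, 1.0, 2.5]
```

Three correct answers stretch the interval past two weeks, and a single lapse sends the card back to daily review. That is how weak cards end up appearing more often than well-learned ones.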

This is not a small effect. A previous analysis of study schedule research on this blog found that practice question volume (r=0.53) predicted Step 1 scores far better than hours studied (r=0.07). Spaced repetition is one mechanism that explains this gap: it is a method that maximizes retrieval events per hour of study time.

12.9% Higher CBSE Scores: The Gilbert 2023 Data

Gilbert et al. (2023) published a cohort study in Medical Science Educator (PMC10403443) that tracked 130 first-year students at Boonshoft School of Medicine. Seventy-eight students used Anki for at least one exam. Fifty-two did not use Anki at all. The results were consistent across every assessment.

Exam Score Differences (Anki Users vs. Non-Users)

  • Course I (Biochemistry/Pathology): +6.4% (p < 0.001)
  • Course II (Immunology/Microbiology): +6.2% (p = 0.002)
  • Course III (Cardio/Renal/Pulm): +7.0% (p = 0.002)
  • CBSE (end-of-year comprehensive): +12.9% (p = 0.003)

Two details from this study deserve attention. First, the researchers controlled for MCAT scores. Anki users did have higher baseline MCAT percentiles (73 vs. 65), but the exam score differences remained statistically significant after adjustment. Second, the largest gap appeared on the CBSE, the exam most similar to Step 1. This makes sense. The CBSE tests material from the entire year. Spaced repetition is specifically designed to maintain long-term retention across a broad knowledge base. A course exam six weeks after a module is a short-interval test. The CBSE is a long-interval test. That is exactly where spaced repetition should produce its biggest advantage, and it did.

Students who reported high dependency on Anki for studying also scored significantly higher than those with low or medium dependency. This was not a casual-use effect. The students who committed to the system as their primary study method saw the largest gains.

565 Cards Per Day: The Mehta 2023 Usage Patterns

Most studies on Anki in medical education rely on self-reported usage. Mehta et al. (2023) did something different. They extracted data directly from students' Anki files at Carle Illinois College of Medicine, then compared usage patterns between above-median and below-median exam performers (PMC10597963).

Above-Median vs. Below-Median Performers

  • Daily card reviews: 565 vs. 389
  • Total days of Anki use: 248 vs. 193
  • Higher-performing students started Anki earlier in the year

The volume gap is real. 565 cards per day is roughly 45% more than 389. But the duration gap may be more important. Students who used Anki for 248 days had 55 additional days of spaced review compared to those who used it for 193 days. That is nearly two extra months of intervals accumulating. Cards that hit 30-day, 60-day, and 90-day intervals in the early adopters were still at 10-day or 20-day intervals for the later starters.

This finding has a direct practical implication that most students ignore: starting Anki three weeks earlier may matter more than adding 100 extra cards per day to a late start. The algorithm needs time to space out intervals. Cramming 800 cards per day for four weeks does not replicate the benefit of 400 cards per day for eight weeks.
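A rough sketch shows why equal volume does not mean equal spacing. It assumes a toy schedule where the interval starts at one day and doubles after every correct answer, and where every answer is correct. Real Anki scheduling differs, and the function names are hypothetical.

```python
def total_reviews(cards_per_day: int, weeks: int) -> int:
    """Total card reviews completed over the study window."""
    return cards_per_day * weeks * 7


def intervals_completed(window_days: int,
                        first_interval: float = 1.0,
                        growth: float = 2.0) -> list[float]:
    """Intervals that a card learned on day 0 finishes inside the window.

    Assumes every review is answered correctly (a simplification).
    """
    day, interval, done = 0.0, first_interval, []
    while day + interval <= window_days:
        day += interval
        done.append(interval)
        interval *= growth
    return done


# Same total volume either way.
print(total_reviews(800, 4), total_reviews(400, 8))  # 22400 22400

late_start = intervals_completed(28)   # four-week window
early_start = intervals_completed(56)  # eight-week window
print(late_start)   # [1.0, 2.0, 4.0, 8.0]
print(early_start)  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Under these assumptions both plans total 22,400 reviews, but only the eight-week plan gives the earliest cards time to complete a 16-day interval. Cards introduced later in either window have even less room to space out.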

Beyond Step 1: Where Spaced Repetition Falls Short

The research is not uniformly positive. Wothe et al. (2023) surveyed 165 students at the University of Minnesota and found that daily Anki use correlated with higher Step 1 scores (p = 0.039) but showed no significant correlation with Step 2 CK scores (PMC10176558).

This makes sense when you think about what each exam tests. Step 1 is heavily weighted toward factual recall: biochemical pathways, pharmacology mechanisms, microbiology. These are exactly the types of knowledge that flashcards capture well. Step 2 CK emphasizes clinical reasoning, diagnosis, and management. You cannot put “What is the next best step for this patient with these five findings?” into a flashcard and expect it to build the same reasoning skill that a practice vignette does.

Gilbert et al. also noted that the link between individual Anki statistics and exam scores was strongest for Course II (Immunology/Microbiology), which involves heavy memorization of organisms and drug names. For Course III (Cardio/Renal/Pulm), which requires more physiological reasoning, that correlation was weaker, even though the overall score gap between users and non-users was similar.

The pattern across studies is consistent: spaced repetition excels at discrete fact retention. It is less effective for building the kind of integrative clinical reasoning that higher-order exams demand. This does not mean you should stop using Anki for Step 2. It means you should not rely on Anki alone. Practice questions with full vignettes are where clinical reasoning develops. The research on test-taking errors reinforces this: knowing the facts and applying them under exam conditions are separate skills.

An Unexpected Finding: Sleep Quality

The Wothe et al. study produced an unexpected secondary finding. Daily Anki users reported significantly better sleep quality (p = 0.01) than non-users, with no difference in perceived stress, burnout risk, or extracurricular involvement.

The study was cross-sectional, so causation is not established. But one plausible explanation: students using spaced repetition have a structured daily review obligation that may reduce the late-night anxiety-driven cramming sessions common before exams. When you know your algorithm is handling your review schedule, there is less reason to stay up until 2 AM re-reading First Aid before a test. This is speculative, but the correlation was statistically significant and worth noting for students concerned about sustainability during dedicated study.

Common Mistakes the Research Highlights

Across these studies, several patterns emerge among students who use spaced repetition but do not get the full benefit:

  1. Starting too late. Mehta et al. showed that above-median performers began Anki earlier in the year. The algorithm needs months to build long intervals. Starting Anki during dedicated study means you are cramming with flashcards rather than spacing with them.
  2. Hitting “Good” too quickly. Anki's algorithm extends intervals based on your response. If you mark a card “Good” when you only had a vague sense of the answer, you are telling the algorithm you know it. You will not see it again for weeks. When it reappears, you will have forgotten it. Honest self-grading is what makes spaced repetition work.
  3. Skipping days and letting reviews pile up. The Gilbert study found that current streak and longest streak correlated with Course I exam scores. Consistency matters. Missing three days can create a backlog of hundreds of cards that breaks the spacing schedule.
  4. Using Anki as the only study method. The research consistently shows spaced repetition improves fact-based recall. It does not replace practice questions for building clinical reasoning. The students who perform best pair Anki for retention with question banks for application.
  5. Making your own cards from scratch. The AnKing deck contains over 30,000 cards mapped to First Aid and Pathoma. Building your own deck from zero takes hundreds of hours that could be spent reviewing. Gilbert et al. noted that the AnKing deck was the most-used resource among their study participants. Unless you have a specific gap the community decks do not cover, start with what exists.
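The backlog problem in item 3 is easy to put numbers on. The sketch below assumes a constant number of cards coming due each day and a fixed daily capacity. Both are simplifications, and the figures are illustrative rather than taken from any of the studies.

```python
import math


def backlog_after_skipping(daily_due: int, missed_days: int) -> int:
    """Overdue cards accumulated while no reviews were done."""
    return daily_due * missed_days


def days_to_clear(backlog: int, daily_due: int, daily_capacity: int) -> int:
    """Days needed to clear the backlog while still keeping up with new due cards."""
    surplus = daily_capacity - daily_due
    if surplus <= 0:
        raise ValueError("daily capacity must exceed the daily due count")
    return math.ceil(backlog / surplus)


# Illustrative numbers: 400 cards due per day, three missed days,
# and a student who can push to 600 reviews per day while catching up.
backlog = backlog_after_skipping(daily_due=400, missed_days=3)
catch_up_days = days_to_clear(backlog, daily_due=400, daily_capacity=600)
print(backlog, catch_up_days)  # 1200 6
```

With these assumed numbers, three missed days cost six days of heavier-than-normal review, and every overdue card is seen later than the schedule intended during that stretch.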

Practical Targets from the Data

Based on the published findings, here is what evidence-based spaced repetition use looks like for Step 1 preparation:

  • Start date: First semester of M1, or as early as curriculum allows
  • Daily volume: 400-600 cards/day (above-median performers averaged 565)
  • Deck choice: AnKing or equivalent community-maintained deck tagged to your curriculum
  • Retention target: 85-90% mature card retention (Anki's built-in stats track this)
  • Consistency: Daily reviews without exception, including weekends
  • Pair with: UWorld or equivalent question bank for clinical reasoning

These are not aspirational numbers. They are the patterns that separated above-median from below-median performers in the Mehta data.
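Anki's statistics screen reports mature retention for you, but it helps to know what the number means. The sketch below treats a card as mature once its interval is 21 days or longer, which is how Anki defines the term. The review log is made-up sample data, and `mature_retention` is a hypothetical helper, not part of Anki.

```python
MATURE_THRESHOLD_DAYS = 21  # Anki treats cards with intervals >= 21 days as mature


def mature_retention(reviews: list[tuple[int, bool]]) -> float:
    """Fraction of mature-card reviews answered correctly.

    Each review is (interval_in_days_before_the_review, answered_correctly).
    """
    mature = [correct for interval, correct in reviews
              if interval >= MATURE_THRESHOLD_DAYS]
    if not mature:
        raise ValueError("no mature-card reviews in the log")
    return sum(mature) / len(mature)


# Made-up review log for illustration only.
sample_log = [(30, True), (45, True), (21, False), (3, False),
              (60, True), (25, True), (10, True)]

rate = mature_retention(sample_log)
in_target = 0.85 <= rate <= 0.90
print(f"{rate:.0%}", in_target)  # 80% False
```

In this sample, four of the five mature reviews were correct, so retention is 80% and sits below the 85-90% target. Young cards (the 3-day and 10-day reviews) are excluded from the calculation.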

The Bottom Line

Ten years of research on spaced repetition in medical education points in one direction. Students who use it score higher on basic science exams. Students who start earlier and maintain higher volume outperform those who start late. The effect is largest on cumulative exams like the CBSE, which most closely approximates Step 1. The method works best for factual recall and is less effective for clinical reasoning, meaning it should be paired with question-based practice rather than used in isolation.

None of this is controversial in the literature. The studies broadly agree. The difficulty is in execution: most students know they should use spaced repetition but struggle with the daily discipline, honest self-grading, and early start required to get the full effect.

References

  • Roediger HL, Karpicke JD. “Test-enhanced learning: taking memory tests improves long-term retention.” Psychological Science, 2006; 17(3):249-255. DOI: 10.1111/j.1467-9280.2006.01693.x
  • Gilbert DM et al. “A Cohort Study Assessing the Impact of Anki as a Spaced Repetition Tool on Academic Performance in Medical School.” Medical Science Educator, 2023. PMC10403443. DOI: 10.1007/s40670-023-01826-8
  • Mehta A et al. “Implementation of Spaced Repetition by First-Year Medical Students: a Retrospective Comparison Based on Summative Exam Performance.” Medical Science Educator, 2023; 33(5):1089-1094. PMC10597963. DOI: 10.1007/s40670-023-01839-3
  • Wothe JK et al. “Academic and Wellness Outcomes Associated with use of Anki Spaced Repetition Software in Medical School.” Journal of Medical Education and Curricular Development, 2023; 10. PMC10176558. DOI: 10.1177/23821205231173289
  • Magro J et al. “Anki flashcards: Spaced repetition learning in the undergraduate medical pharmacology curriculum.” The Clinical Teacher, 2024; 21(6):e13798. DOI: 10.1111/tct.13798
