I’ve been working on https://phrasing.app for a while now, including many iterations of the SRS. It’s been my experience that most of these sorts of improvements are really imperceptible. While I use FSRS as a base, and I’m very happy with the results it provides, it’s really only a few percentage points off of the SM-2 algorithm from the 90s. It’s slightly less stressful, definitely more accurate, but I think only astute users would even notice the difference.
I’ve incorporated many different things into the SRS, from vector embeddings to graph based association to lemma clustering to morpheme linking, and was surprised how much of these I took out.
Most of the unlocks with the SRS have been more in application space. Doing reviews with Anki feels like a chore, and I’m always counting down the reviews left to do. Reviews with Phrasing however are much more addictive, and I routinely spent an extra 30+ minutes in that “ok just one more card” loop.
We will never be able to know with 100% certainty how well you know a card, but FSRS gets us darn close. I think the interesting stuff is less about improving that metric, and more about what you can do with that information.
Thanks to the whole FSRS team btw (I assume y’all will be reading this hn post) <3
There's a lot of UX work to do for SRS. Do you have a sense of how well the ideas behind Humane SRS translate outside of language learning? I imagine the main challenge would be identifying a steady influx of new cards.
I agree that gains in scheduling accuracy are fairly imperceptible for most students. That's why, over the past few years building https://rember.com, we've focused on UX rather than memory models. People who review hundreds of cards a day definitely feel the difference; doing 50 fewer reviews per day is liberating. And now that LLMs can generate decent-quality flashcards, people will build larger and larger collections, so scheduler improvements might suddenly become much more important.
Ultimately, though, the biggest advantage is freeing the SRS designer. I'm sure you've grappled with questions like "is the right unit the card, the note, the deck or something else entirely?" or "what happens to the review history if the student edits a card?". You have to consider how review UX, creation/editing flows, and card organization interact. Decoupling the scheduler from these concerns would help a ton.
I would say probably 50% of the learnings from Humane SRS would be applicable in other fields/schedulers. There is another half that is language-specific though - at the end of the day, if you try to learn a language the same way you cram for a med school exam, you're probably not going to succeed. The inverse is also true, please nobody use Phrasing to cram for their med school exam XD
I agree most people's collections get unwieldy and something needs to be done, so props to Rember! I take the opposite approach - instead of helping people manage large collections, I try to help people get the most out of small collections. This sort of thing is not possible in most fields outside of languages (I don't think — I cannot say I've given it any real thought though).
For example, the standard tier in Phrasing is 40 new Expressions per month. This should result in 2,000-3,500 words in a year, which would be a pretty breakneck pace for most learners, and is considered sufficient for fluency. Of course, users can learn Expressions other users have created for free, or subscribe to higher tiers, or buy credits outright, but it's often not needed.
Indeed Phrasing does not really use the idea of "cards"; we reconstruct pseudo-cards based on the morphemes, lemmas, and inflections found within the Expression. So "cards" are indeed not the boundary I use.
This looks incredible, and it's obvious that a lot of work has been done, but in exploring it I notice a lot of things that make me hesitate to spend the money!
First, in the section "Expressions are flashcards on steroids", the flavor text on each element (Translations, Audio, etc) is identical.
Next, I look at the pricing and get one idea. Then when I create an account and go to upgrade, I see completely different pricing options. It's not that I care so much about the options, but it kind of worries me!
At one point I swear I saw the phrase "Say something about comprehensible input" instead of an explanation of CI, and the sentence itself was duplicated but now I don't. Maybe you are making this landing page live? It _is_ a nice landing page, to be sure.
Overall, I think it looks really cool and I'm interested in trying it out but just a little nervous at the moment.
What the heck? Thank you for bringing the flavor text issue to my attention. You have no idea how long I spent on the copy for each of those to make sure it was unique, fit all screen sizes, etc. I have no idea what happened and I’m tragically upset now XD
The “say something about comprehensible input” was indeed a funny copy issue I found a few weeks ago. edit: found and fixed! original: I thought I fixed it though, there must be a screen size that needs to be updated. I’ll look for it, but it’s a framer website so I can’t grep. Let me know if you find it again!
Indeed I just launched the new page with the new pricing. I have two major tasks this week, the second of which is to update the pricing flow to match the new prices on the home page.
It’s a one man show and fully bootstrapped, so apologies about the disarray. Everything takes a month or two to migrate when you do all the design, marketing, engineering, support, and bug fixes yourself!
EDIT: Both the flavor text and the “say something about ci” have been fixed. The upgrade flow will take a few days. I am planning to grandfather everyone who signs up for the old plan ($10pm) into the new plan ($20pm) at the old price :)
That is an important insight. It is not so much which method gets you to learn more when used for a given amount of time. It is probably more about which method is fun to use, and engages you and thus actually gets used.
Can't help but repeat this old joke: a guy bought a six-month gym membership and paid $1000. But he was lazy (like most of us are) and rarely or never went to the gym, never felt like going there. After six months he realized he had wasted $1000. So he thought maybe if he bought the equipment himself he could and would exercise at home. He bought the equipment for $1000, but then he rarely exercised at home. Didn't feel like it :-)
To help with language learning I tried Anki, didn't like the UX and ended up writing my own SRS, from scratch.
One thing that becomes very obvious very quickly is that all cards derived from the same piece of information should be treated as a group. The last thing you'd want is to see "a cow / ?" quickly followed by "una mucca / ?". This is just pointless.
So while I appreciate the in-depth write-up by the author, I must say that its main insight - that the scheduling needs to account for the inter-card dependencies - lies right there on the surface. The fact that Anki doesn't support this doesn't make it any less obvious.
I've been thinking about this for a while too as an FSRS developer [1].
In general, we can think of a spaced repetition system as being (i) Content-aware vs. Content-agnostic and (ii) Deck-aware vs. Deck-agnostic.
Content-aware systems care about what you're studying (language, medicine, etc) while Content-agnostic systems don't care about what you're studying.
Deck-aware systems consider each card in the context of the rest of the cards (the "deck") while Deck-agnostic systems consider each card in pure isolation.
Currently, FSRS is both Content-agnostic as well as Deck-agnostic. This makes it extremely easy to integrate into a spaced repetition system, but this also means the model will underfit a bit.
It is interesting to note that you could in practice optimize separate FSRS models for each deck covering different topics, which would make it Content-aware in a sense. Additionally, "fuzz" is a somewhat Deck-aware feature of the model in that it exists specifically to reduce interactions between cards in the deck.
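For readers unfamiliar with fuzz: the idea is just to jitter each scheduled interval by a small random factor so sibling cards drift apart over time. A minimal sketch (the ±5% band here is illustrative; real schedulers vary the bounds by interval length):

```python
import random

def fuzz_interval(interval_days: float, factor: float = 0.05) -> float:
    """Jitter a scheduled interval by +/- `factor` so that cards
    introduced together drift apart over successive reviews."""
    lo = interval_days * (1 - factor)
    hi = interval_days * (1 + factor)
    return random.uniform(lo, hi)

# Two siblings both scheduled for 30 days land on slightly different
# days, so they stop being reviewed back to back forever.
print(round(fuzz_interval(30.0), 1))  # somewhere in [28.5, 31.5]
```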
I ran into a question a while ago that I couldn't find a good answer to, and while it's not exactly on topic this seems like a good place to ask it.
I was working in a detail rich context, where there were a lot of items, about which there were a lot of facts that mostly didn't change but only mostly. Getting a snapshot of these details into approximately everyone's head seemed like a job for spaced repetition, and I considered making a shared Anki deck for the company.
What wasn't clear was how to handle those updates. Just changing the deck in place feels wrong for those who have been using it: they're remembering right, it's the cards that have changed.
Deprecating cards that are no longer accurate but which don't have replacement information was a related question. It might be worth informing people who have been studying that card that it's wrong now, but there's no reason to surface the deprecation to a person who has never seen the card.
Is there an obvious way to use standard SRS features for this? A less obvious way? A system that provides less standard features? Is this an opportunity for a useful feature for a new or existing system? Or is this actually not an issue for some reason I've missed?
For what you describe, the ideal system would do this:
1. Identify knowledge blocks that you want people to learn. This is what would be tracked with the SRS.
2. Create cards, with a prompt which requires knowledge blocks to answer. Have the answers in this system feed back knowledge to the SRS.
3. When one of the knowledge blocks changes, take the previous knowledge familiarity and count that against the user.
So for example, at some point a card might be "Q. What effect will eating eggs have on blood cholesterol? A. Raise it." That would be broken down into two knowledge blocks: "Cholesterol content of eggs" and "Effect of dietary cholesterol on blood cholesterol".
At some point you might change that card to "Q. What effect will eating eggs have on blood cholesterol? A. None, dietary cholesterol typically doesn't affect blood cholesterol." (Or maybe we're back again on that one.)
The knowledge blocks would be the same, but you'd have to take the existing time studied on the "Effect of dietary cholesterol on blood cholesterol" and mark it against recall rather than towards recall. Someone who'd never studied it would be expected to learn it at a certain pace; but someone who'd studied the old value would be expected to have a harder time -- to have to unlearn the old value.
I think you could probably hack the inputs to the existing FSRS algorithm to simulate that effect -- either by raising the difficulty, or by adding negative views or inputs. But ideally you'd take a trace of people whose knowledge blocks had changed, and account for unlearning specifically.
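As a rough sketch of what "counting prior practice against the user" could look like, here's a toy model. Everything here (the class, the penalty factor of 0.5) is hypothetical for illustration, not an existing FSRS feature:

```python
from dataclasses import dataclass

@dataclass
class KnowledgeBlock:
    name: str
    reps: int = 0        # successful recalls of the current fact
    stale_reps: int = 0  # recalls accumulated against a superseded fact

    def review(self, correct: bool) -> None:
        if correct:
            self.reps += 1

    def invalidate(self) -> None:
        """The underlying fact changed: prior practice now works
        against recall, so move it into a penalty counter."""
        self.stale_reps += self.reps
        self.reps = 0

    def effective_strength(self, unlearn_penalty: float = 0.5) -> float:
        # Old practice subtracts, scaled by how hard unlearning is.
        return self.reps - unlearn_penalty * self.stale_reps

block = KnowledgeBlock("Effect of dietary cholesterol on blood cholesterol")
for _ in range(4):
    block.review(correct=True)
block.invalidate()  # the answer flipped
print(block.effective_strength())  # -2.0: worse off than a fresh learner
```

A scheduler could then treat a negative strength as "expected to take longer than a new learner", which matches the intuition about unlearning.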
This specific problem gave us lots of headaches while building https://rember.com
We don't have a good solution yet. My hope is that something like content-aware memory models solve the problem at a lower level, so we don't have to worry about it at the product level.
You could have a company-provided Anki account for each user where you add and remove cards just for that user. (I thought you might even be able to use your own server, but that doesn't seem to be an option for the iOS app: https://ankicommunity.github.io/Tutorials/anki_custom_sync_s... )
Then placing a "this has changed" notification card at the front of the new queue only for people who learned the old information is as simple as checking the corresponding card's review status in the database.
Being easy to integrate is an underappreciated feature of FSRS.
Using decks to draw semantic boundaries is likely overly constraining. I think we want to account for finer differences between cards. Decks are coarse, and people differ in how they use them; some people recommend having just one global deck. Notes are too fine. We explored something in between: a note capturing an idea or concept, plus an associated set of cards. Turns out it's hard to draw idea boundaries. That's why I think it's easier to relate cards by semantic embeddings or more rigid but clearer structures, like the DAG of dependencies suggested elsewhere in this thread.
On the scheduling end, I'm surprised the article didn't mention https://github.com/fasiha/ebisu which uses Bayesian statistics.
When I was studying Japanese, I was thinking how it's always best to learn words in sentences and that it would be good if the sentences for a particular word were random.
Extending that, the sentences could be picked such that the other words are words scheduled for today meaning much more bang for buck per learning hour.
> When I was studying Japanese, I was thinking how it's always best to learn words in sentences and that it would be good if the sentences for a particular word were random.
> Extending that, the sentences could be picked such that the other words are words scheduled for today meaning much more bang for buck per learning hour.
Just the other day I was thinking about how there’s a good chunk of vocab that could be “mined” from the sentences in my vocab deck.
I think that this idea would work well, but would probably require a whole new SRS program to be able to implement it cleanly. It’s too dynamic for a traditional SRS app like Anki which is pretty static in nature.
It has Anki integration or its own SM2 flashcards app (soon FSRS). And it passively collects a personal corpus of sentences from any web/ebook material you open (manga up next).
I plan to add more sophisticated sync between the reading and reviewing such that cards can be more dynamically based on relevant personal corpus content, and where reading (on the web or in books, outside flashcards) would auto-review any flashcards you have (or which you create in the future).
There really shouldn’t be any difference between encountering a word in something you’re reading and reviewing it on a flashcard. And it would be nice to revisit reading material with guidance from FSRS, to find N+1 sentences for learning new words and to find excerpts containing words that are due for review.
I explored memory models for spaced repetition in my master's thesis and later built an SRS product. This post shares my thoughts on content-aware memory models.
I believe this technical shift in how SRS models the student's memory won't just improve scheduling accuracy but, more critically, will unlock better product UX and new types of SRS.
I've got a system for learning languages that does some of the things you mention. The goal is to be able to recommend content for a user to read which combines 1) an appropriate level of difficulty and 2) usefulness for learning. The idea is to have the SRS built into the system, so you just sit and read what it gives you, and reviewing old words and learning new words (according to frequency) happens automatically.
Separating the recall model from the teaching model as you say opens up loads of possibilities.
Brief introduction:
1. Identify "language building blocks" for a language; this includes not just pure vocabulary, but the grammar concepts, inflected forms of words, and can even include graphemes and what-not.
2. For each building block, assign a value -- normally this is the frequency of the building block within the corpus.
3. Get a corpus of selections to study. Tag them with the language building blocks. This is similar to Math Academy's approach, but while they have hundreds of math concepts, I have tens of thousands of building blocks.
4. Use a model to estimate the current difficulty of each word. (I'm using "difficulty" here as the inverse of "retrievability", for reasons that will be clear later.)
5. Estimate the delta of difficulty of each building block after being viewed. Multiply this delta by the word value to get the study value of that word.
6. For each selection, calculate the total difficulty, average difficulty, and total study value. (This is why I use "difficulty" rather than "retrievability", so that I can calculate total cognitive load of a selection.)
Now the teaching algorithm has a lot of things it can do. It can calculate a selection score which balances study value, difficulty, as well as repetitiveness. It can take the word with the highest study value and then look for selections containing that word. It can take a specific selection that you want to read or listen to, find the most important word in that selection, and then look for things to study which reinforce that word.
You mentioned computational complexity -- calculating all this from scratch certainly takes a lot, but the key thing is that each time you study something, only a handful of things change. This makes it possible to update things very efficiently using an incremental computation [1].
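A toy version of the scoring steps above, just to make the bookkeeping concrete (all words, frequency values, difficulties, and the learning rate are made up for illustration, not the actual system):

```python
# block -> (corpus-frequency value, current difficulty in [0, 1])
blocks = {
    "casa":  (0.9, 0.2),  # common word, mostly known
    "haber": (0.8, 0.7),  # common word, still hard
    "ojalá": (0.3, 0.9),  # rarer word, unknown
}

LEARNING_RATE = 0.3  # assumed fraction of difficulty removed per exposure

def study_value(block: str) -> float:
    value, difficulty = blocks[block]
    delta = difficulty * LEARNING_RATE  # estimated difficulty drop if seen
    return value * delta

def selection_stats(selection: list[str]) -> tuple[float, float, float]:
    diffs = [blocks[b][1] for b in selection]
    total_difficulty = sum(diffs)                 # total cognitive load
    avg_difficulty = total_difficulty / len(diffs)
    total_value = sum(study_value(b) for b in selection)
    return total_difficulty, avg_difficulty, total_value

total, avg, value = selection_stats(["casa", "haber", "ojalá"])
print(total, avg, value)  # study value vs. cognitive load, per selection
```

The teaching layer can then rank selections by, say, `value / total` to balance payoff against load.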
I've been playing with something similar, but far less thought out than what you have.
I have a script for it, but am basically waiting until I can run a powerful enough LLM locally to chug through it with good results.
Basically like the knowledge tree you mention towards the end, but attempt to create a knowledge DAG by asking an LLM "does card (A) imply knowledge of card (B) or vice versa". Then, take that DAG and use it to schedule the cards in a breadth-first ordering. So, when reviewing a new deck with a lot of new cards, I'll be sure to get questions like "what was the primary cause of the civil war" before I get questions like "who was the Confederate general who fought at Bull Run".
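A breadth-first ordering over such a DAG is essentially a level-by-level topological sort. A minimal sketch with hypothetical cards and edges (not the poster's actual script):

```python
from collections import deque

# Hypothetical prerequisite edges: "A -> B" means card B assumes
# knowledge of card A, so A should be scheduled first.
edges = {
    "primary cause of the civil war": ["Confederate general at Bull Run"],
    "what was the civil war":         ["primary cause of the civil war"],
}

def breadth_first_order(edges: dict[str, list[str]]) -> list[str]:
    """Kahn-style topological sort: emit cards only once all of their
    prerequisites have been scheduled."""
    indegree: dict[str, int] = {}
    for src, dsts in edges.items():
        indegree.setdefault(src, 0)
        for dst in dsts:
            indegree[dst] = indegree.get(dst, 0) + 1
    queue = deque(sorted(c for c, d in indegree.items() if d == 0))
    order = []
    while queue:
        card = queue.popleft()
        order.append(card)
        for dst in edges.get(card, []):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)
    return order

print(breadth_first_order(edges))  # broad questions first, details last
```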
What I like about your approach is that it circumvents the data problem. You don't need a dataset with review histories and flashcard content in order to train a model.
In the language learning world there are some great tools already for adding content-awareness.
AnkiMorphs[1] will analyze the morphemes in your sentences and, taking into account the interval of each card as a sign of how well you know each one, will re-order your new cards to, ideally, present you with cards that have only one unknown word.
It doesn't do anything to affect the FSRS directly—it only changes the order of new, unlearned cards—but in my experience it's so effective at shrinking the time from new card to stable/mature that I'm not sure how much more it would help to have the FSRS intervals being adjusted in this particular domain.
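The core of that reordering is simple: count unknown morphemes per card and surface the "one unknown" (i+1) sentences first. A sketch of the idea (not AnkiMorphs' actual code, which also accounts for morpheme frequency and skip rules):

```python
def unknown_count(card: dict, known: set[str]) -> int:
    """Number of morphemes in the card the learner doesn't know yet."""
    return sum(1 for m in card["morphemes"] if m not in known)

def reorder_new_cards(cards: list[dict], known: set[str]) -> list[dict]:
    """Put sentences with exactly one unknown morpheme ("i+1") first,
    then order the rest by how many unknowns they contain."""
    return sorted(cards, key=lambda c: (unknown_count(c, known) != 1,
                                        unknown_count(c, known)))

known = {"猫", "が", "好き"}
cards = [
    {"text": "猫が好きだ", "morphemes": ["猫", "が", "好き", "だ"]},  # 1 unknown
    {"text": "犬も走る",   "morphemes": ["犬", "も", "走る"]},        # 3 unknown
    {"text": "猫が好き",   "morphemes": ["猫", "が", "好き"]},        # 0 unknown
]
ordered = reorder_new_cards(cards, known)
print([c["text"] for c in ordered])  # the i+1 sentence comes first
```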
I'm building an SRS language learning app [1] so I've thought about this topic a bit, but I've come to the conclusion that SRS algorithms might be just a nerd optimization obsession. My app has "stupid" 1, 3, 7, 15, 30 (or something like that) intervals, and the reality is that if I know a card, I can swipe it within 2 seconds, and if I just barely know it, I can spend 30 seconds on it.
So optimizing the algorithm such that every card comes at the exact right moment might cause all cards to feel too hard or too easy. I think having a mix of difficult and easy cards is actually a feature, not a bug.
Don’t fool yourself into thinking a suboptimal SRS is going to be optimal at the motivational aspect. If a user needs a self-confidence boost during a flash card session, this should be a design choice, not due to poor performance of the core SRS algorithm.
Choose your SRS algorithm to best predict what a user knows and when they’re likely to forget it.
If your application decides that it wants to throw some softballs, that’s an application level decision. If you care about psychology and motivation, build a really good algorithm for that. Then blend SRS with motivation as desired.
Thank you for the report! I was thinking of redoing the landing page for a while anyway. Are you using a niche browser or something like that? I haven't had anyone experience this issue nor was I able to reproduce it.
You mention that FSRS treats each card independently, even if they derive from the same note. I wonder whether you've tried this Anki plugin, which tries to increase the interval between reviews of 'sibling' cards: https://ankiweb.net/shared/info/759844606
Since in Anki the "note" is the editing unit, that works for some cloze deletions but not for QA cards (only for double-sided QA cards). A content-aware memory model would allow you to apply "disperse siblings" to any set of cards, independently of whether they were created together in the same editing interface.
> The main challenge in building content-aware memory models is lack of data. To my knowledge, no publicly available dataset exists that contains real-world usage data with both card textual content and review histories.
I wonder if the author has ever considered reaching out to makers of Anki decks used by premeds and medical students like the AnKing [1]. They create Anki decks for users studying the MCAT and various med school curricula, so they have a) relatively stable deck content (which is very well annotated and contains lots of key words that would make semantic grouping quite easy), b) probably loads of statistics on user reviews (since they have an Anki add-on that sends telemetry to their team to make the decks better IIRC), and c) incredibly disparate information (all the way from high-school physics to neurochemistry).
What sort of privacy implications? I'd imagine that Anki data would be relatively privacy-concern free, as it contains no PII, and for the AnKing decks, all of the content is standardized and so wouldn't contain personal notes. Though, having never worked with this data, please let me know if I'm wrong!
Also, having used those decks in the past, downloaded the add-on, and looked at the monetization structure of developers like the AnKing, I would be very surprised if aggregate data on review statistics wasn't collected in some way. I.e., if the AnKing is collecting this data already to design better decks/understand which cards are the hardest—probably to target individual support—then I imagine that collecting some anonymized version of that data wouldn't be too much of a stretch.
Plus, considering that all of the developers of AnKing-style decks are all doctors, they probably have a pretty good grasp at handling PII and could (hopefully) make pretty sound decisions on whether to give you access :)
You're right, it might work by restricting to just AnKing data. My concern was around other, possibly personal, cards making their way into the dataset.
Rather than relying on an embedding space, my approach is to have the cards themselves be grammars that can define the relationships between concepts explicitly. Then the problem becomes what specific sampling of all the possible outputs is optimal for a learner to see at any given time, given their knowledge state.
This is awesome. I've been using Bunpro for a while, which has great content, but I find myself memorizing the sentences rather than the grammar. Randomly generating cards based on the grammar points and vocab makes a ton of sense.
Some questions / comments / suggestions:
1. Is there a way to import vocab / kanji from Wanikani? WK is quite popular and has a good API. Bunpro integrates nicely with it, where it will or won't show furigana for kanji in the example sentences based on whether you've already learned the word in Wanikani. I'm guessing in your case you'd just want to import all the vocab. Even though I did the placement test, Grsly is still trying to teach me basic vocab like uta and obaasan. This is slowing down my progress through the grammar points.
2. Similar to question 1, is there a way to import grammar progress from Bunpro? Or even just click a button and have it assume I know everything from N5. The placement test only seemed to test a handful of basic grammar points.
3. Some of the sentences it has generated are quite awkward, like "ironna musume" ("all kinds of my daughter"). I guess that's grammatically correct, but it seems pretty unlikely to show up anywhere in real life. Have you considered using a local/small LLM to score or bias the example sentence generation? It's possible to constrain an LLM to only generate output that matches a grammar. You could construct such a grammar for each nontrivial element in your deck, with the vocab currently available for use. I guess you'd have to change the answer in your FAQ if you started using AI.
Amazing work! In https://rember.com the main unit is a note representing a concept or idea, plus some flashcards associated with it, so hsrs would fit perfectly! I'll look more deeply into it.
> [....] Ignoring the following factors means we are leaving useful information on the table:
> 1. The review histories of related cards. Card semantics allow us to identify related cards. This enables memory models to account for the review histories of all relevant cards when estimating a specific card’s retrievability.
> 2. [...]
I've been thinking that card semantics shouldn't be analyzed at all, and just treated as a black box. You can get so much data off of just a few users of a flashcard deck that you could build your own map of the relationships between cards, just by noticing the ones that fail or pass together over time. Just package that map with the deck and the scheduler might get a lot smarter.
That map could give you good info on which cards were redundant, too.
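A minimal sketch of building such a map from pass/fail co-occurrence alone (the session format and the simple agreement score are just one way to do it; a real system would want many more sessions and some smoothing):

```python
from collections import defaultdict
from itertools import combinations

# Review log per session: {card_id: passed?}
sessions = [
    {"a": False, "b": False, "c": True},
    {"a": False, "b": False, "c": True},
    {"a": True,  "b": True,  "c": False},
]

def agreement_map(sessions: list[dict]) -> dict:
    """Fraction of sessions in which each pair of cards passes or fails
    *together*. High agreement suggests the cards test related (or
    redundant) knowledge, with no semantic analysis needed."""
    together = defaultdict(int)
    seen = defaultdict(int)
    for session in sessions:
        for x, y in combinations(sorted(session), 2):
            seen[(x, y)] += 1
            if session[x] == session[y]:
                together[(x, y)] += 1
    return {pair: together[pair] / seen[pair] for pair in seen}

print(agreement_map(sessions))  # cards "a" and "b" always move together
```

Pairs with agreement near 1.0 are candidates for being treated as a group by the scheduler, or flagged as redundant.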
edit: this may be interesting to someone, but I've also been trying to flesh out a model where agents buy questions from a market, trade questions with each other, and make bets with each other about whether the user will be able to recall the question when asked. Bankrupt agents are replaced by new agents. Every incentive in the system is parameterized by the user's learning requirements.
SuperMemo's neural network component (implemented in SM-15) already does something similar by tracking correlations between items without semantic analysis, effectively building that "map" of relationships based purely on performance data.
Yes, that reminds me of knowledge tracing and methods like 1PL-IRT.
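For reference, the 1PL (Rasch) model is a one-line formula: recall probability is a logistic function of the gap between student ability and item difficulty.

```python
import math

def p_recall(ability: float, difficulty: float) -> float:
    """1PL / Rasch model: P(correct) = sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

print(p_recall(0.0, 0.0))  # 0.5: matched ability and difficulty is a coin flip
print(p_recall(2.0, 0.0))  # ~0.88: stronger student, easier recall
```

Fitting the ability and difficulty parameters is where the shared-review data across students comes in.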
I think you can do both and get even better results. The main limitation is that the same flashcards must be studied by multiple students, which doesn't generally apply.
I also love the idea of the market, you could even extend it to evaluate/write high-quality flashcards.
> The main limitation is that the same flashcards must be studied by multiple students, which doesn't generally apply.
I think only a kernel of the same flashcards, because in my mind new cards would quickly find their position after being reviewed a few times, and might displace already well-known cards. I see the process as throwing random cards at students, seeing what's left after shaking the tree, and using that info to teach new students.
The goal, however, would definitely be a single standard but evolving set of cards that described some group of related ideas. I know that's against Supermemo/Anki gospel, but I've gotten an enormous amount of value out of engineered decks such as https://www.asiteaboutnothing.net/w_ultimate_spanish_conjuga....
> I also love the idea of the market, you could even extend it to evaluate/write high-quality flashcards.
It's been my idea to drive conversational spaced repetition with something like this.
It would be valuable for shared decks, like the one you mentioned.
As far as I can tell, the majority of Anki users are medical school students or language learners. Both groups benefit from shared decks. So I think it's a good idea to pursue.
My personal interest is more in conceptual knowledge, like math, CS, history or random blog posts and ideas. It's often the case that, on the same article, different people focus on different things, so it would be hard to collect even a small number of reviews on a flashcard you want to study.
And if anyone is curious I wrote up a bit about my SRS here: https://phrasing.app/blog/humane-srs
Phrasing looks amazing!
Hey it looks awesome!
Just a friendly heads up, I’m on mobile and I noticed that the burger menu doesn’t work.
(iPhone 13)
Otherwise - awesome work
Well… that was new. Fixed now!
Thanks for the report and thanks for the kind words :)
This looks incredible, and it's obvious that a lot of work has been done, but in exploring it I notice a lot of things that make me hesitate to spend the money!
First, in the section "Expressions are flashcards on steroids", the flavor text on each element (Translations, Audio, etc) is identical.
Next, I look at the pricing and get one idea. Then when I create an account and go to upgrade, I see completely different pricing options. It's not that I care so much about the options, but it kind of worries me!
At one point I swear I saw the phrase "Say something about comprehensible input" instead of an explanation of CI, and the sentence itself was duplicated but now I don't. Maybe you are making this landing page live? It _is_ a nice landing page, to be sure.
Overall, I think it looks really cool and I'm interested in trying it out but just a little nervous at the moment.
What the heck? Thank you for bringing the flavor text issue to my attention. You have no idea how long I spent on the copy for each of those, making sure it was unique, fit all screen sizes, etc. I have no idea what happened and I’m tragically upset now XD
The “say something about comprehensible input” was indeed a funny copy issue I found a few weeks ago. edit: found and fixed! original: I thought I fixed it though, there must be a screen size that needs to be updated. I’ll look for it, but it’s a framer website so I can’t grep. Let me know if you find it again!
Indeed I just launched the new page with the new pricing. I have two major tasks this week, the second of which is to update the pricing flow to match the new prices on the home page.
It’s a one man show and fully bootstrapped, so apologies about the disarray. Everything takes a month or two to migrate when you do all the design, marketing, engineering, support, and bug fixes yourself!
EDIT: Both the flavor text and the “say something about ci” have been fixed. The upgrade flow will take a few days. I am planning to grandfather everyone who signs up for the old plan ($10pm) into the new plan ($20pm) at the old price :)
That is an important insight. It is not so much which method gets you to learn more when used for a given amount of time. It is probably more about which method is fun to use, and engages you and thus actually gets used.
Can't help but repeat this old joke: A guy bought a 6-month gym membership for $1000. But he was lazy (like most of us are) and rarely, if ever, went to the gym; he never felt like going. After 6 months he realized he had wasted $1000. So he thought that if he bought the equipment himself, he could and would exercise at home. He bought the equipment for $1000, but then he rarely used it. Didn't feel like it :-)
FYI, this looks awful on mobile. (Chrome on a pixel).
I’m assuming the mobile styles are not loading, you are the second person to mention this. I have reached out to framer: https://x.com/barrelltech/status/1952673597996122608?s=61
Would you mind writing an intro for your app if it is using FSRS?
https://github.com/open-spaced-repetition/awesome-fsrs
PR Submitted!
Seriously, thank you for everything you've done. You've created something truly great :)
To help with language learning I tried Anki, didn't like the UX and ended up writing my own SRS, from scratch.
One thing that becomes very obvious very quickly is that all cards derived from the same piece of information should be treated as a group. The last thing you'd want is to see "a cow / ?" quickly followed by "una mucca / ?". This is just pointless.
So while I appreciate the in-depth write-up by the author, I must say that its main insight - that the scheduling needs to account for the inter-card dependencies - lies right there on the surface. The fact that Anki doesn't support this doesn't make it any less obvious.
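The sibling grouping described above can be sketched in a few lines. This is a hypothetical illustration, not Anki's or any real scheduler's API (`bury_siblings`, `queue`, and `note_of` are names I've made up): after a card is reviewed, push its siblings out of today's queue so "a cow / ?" and "una mucca / ?" never appear in the same session.

```python
from datetime import date, timedelta

# Hypothetical sketch: bury siblings of a just-reviewed card so that
# cards derived from the same note don't show up back to back.
def bury_siblings(queue, reviewed_card, note_of, bury_days=1):
    """Push every other card sharing reviewed_card's note past today.

    queue: list of card ids due today
    note_of: card id -> note id (cards from the same note are siblings)
    Returns (remaining_queue, [(buried_card, new_due_date), ...]).
    """
    note_id = note_of[reviewed_card]
    new_due = date.today() + timedelta(days=bury_days)
    buried = [(card, new_due) for card in queue
              if card != reviewed_card and note_of[card] == note_id]
    buried_ids = {card for card, _ in buried}
    remaining = [card for card in queue if card not in buried_ids]
    return remaining, buried
```

A real system would persist the new due dates; the point is only that grouping needs nothing more than a card-to-note mapping.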
I didn’t like the UX either but love everything else about it.
I put together and built a Raycast extension for Anki for this reason: https://www.raycast.com/anton-suprun/anki
Using Raycast keyboard friendly UI makes Anki a lot more fun and friction-less.
If you use a Mac - give it a go.
Apologies if some features are missing - I’ve been procrastinating on patching it with some requested features
Yeah, sorry, I don't like yours either :)
No problem :)
Side note: if anyone else is reading this and likes it - contributors are welcome.
There’s a lot of improvements to be made and I could use the help as I’m quite busy with a few other projects at the moment
I've been thinking about this for a while too as an FSRS developer [1].
In general, we can think of a spaced repetition system as being (i) Content-aware vs. Content-agnostic and (ii) Deck-aware vs. Deck-agnostic
Content-aware systems care about what you're studying (language, medicine, etc.) while Content-agnostic systems don't care about what you're studying.
Deck-aware systems consider each card in the context of the rest of the cards (the "deck") while Deck-agnostic systems consider each card in pure isolation.
Currently, FSRS is both Content-agnostic and Deck-agnostic. This makes it extremely easy to integrate into a spaced repetition system, but it also means the model will underfit a bit.
It is interesting to note that you could in practice optimize separate FSRS models for each deck covering different topics, which would make it Content-aware in a sense. Additionally, "fuzz" is a somewhat Deck-aware feature of the model in that it exists specifically to reduce interactions between cards in the deck.
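As a rough illustration of the per-deck idea (this is not the actual py-fsrs API; `PerDeckScheduler` and `fit_parameters` are hypothetical names), the routing layer on top of any deck-agnostic optimizer is tiny:

```python
# Hypothetical sketch: make a deck-agnostic scheduler "Content-aware"
# by fitting one parameter set per deck and routing lookups accordingly.
class PerDeckScheduler:
    def __init__(self, fit_parameters):
        # fit_parameters: a callable review_log -> parameter vector
        # (e.g. an FSRS optimizer run on one deck's history alone)
        self.fit_parameters = fit_parameters
        self.params = {}

    def train(self, logs_by_deck):
        # logs_by_deck: deck id -> that deck's review history
        for deck_id, log in logs_by_deck.items():
            self.params[deck_id] = self.fit_parameters(log)

    def parameters_for(self, deck_id, default=None):
        # Fall back to a global/default parameter set for unseen decks
        return self.params.get(deck_id, default)
```

The trade-off is data volume: each per-deck fit sees far fewer reviews, so small decks may be better served by the global parameters.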
[1] https://github.com/open-spaced-repetition/py-fsrs
I ran into a question a while ago that I couldn't find a good answer to, and while it's not exactly on topic this seems like a good place to ask it.
I was working in a detail rich context, where there were a lot of items, about which there were a lot of facts that mostly didn't change but only mostly. Getting a snapshot of these details into approximately everyone's head seemed like a job for spaced repetition, and I considered making a shared Anki deck for the company.
What wasn't clear was how to handle those updates. Just changing the deck in place feels wrong, for those who have been using it - they're remembering right, the cards have changed.
Deprecating cards that are no longer accurate but which don't have replacement information was a related question. It might be worth informing people who have been studying that card that it's wrong now, but there's no reason to surface the deprecation to a person who has never seen the card.
Is there an obvious way to use standard SRS features for this? A less obvious way? A system that provides less standard features? Is this an opportunity for a useful feature for a new or existing system? Or is this actually not an issue for some reason I've missed?
For what you describe, the ideal system would do this:
1. Identify knowledge blocks that you want people to learn. This is what would be tracked with the SRS.
2. Create cards, with a prompt which requires knowledge blocks to answer. Have the answers in this system feed back knowledge to the SRS.
3. When one of the knowledge blocks changes, take the previous knowledge familiarity and count that against the user.
So for example, at some point a card might be "Q. What effect will eating eggs have on blood cholesterol? A. Raise it." That would be broken down into two knowledge blocks: "Cholesterol content of eggs" and "Effect of dietary cholesterol on blood cholesterol".
At some point you might change that card to "Q. What effect will eating eggs have on blood cholesterol? A. None, dietary cholesterol typically doesn't affect blood cholesterol." (Or maybe we're back again on that one.)
The knowledge blocks would be the same, but you'd have to take the existing time studied on the "Effect of dietary cholesterol on blood cholesterol" and mark it against recall rather than towards recall. Someone who'd never studied it would be expected to learn it at a certain pace; but someone who'd studied the old value would be expected to have a harder time -- to have to unlearn the old value.
I think you could probably hack the inputs to the existing FSRS algorithm to simulate that effect -- either by raising the difficulty, or by adding negative views or inputs. But ideally you'd take a trace of people whose knowledge blocks had changed, and account for unlearning specifically.
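A minimal sketch of step 3, assuming we track a scalar stability per knowledge block (the function name and `interference_factor` are hypothetical, not part of FSRS): a fresh learner starts at zero with no penalty, while someone who knew the old value starts at zero *and* carries a penalty proportional to how well they knew the outdated answer.

```python
# Hypothetical sketch: when a knowledge block's content changes, its
# accumulated stability becomes interference to unlearn, not a head start.
def reset_after_content_change(stability, interference_factor=0.5):
    """Return (new_stability, unlearning_penalty) for a changed block.

    stability: how well the learner knew the *old* answer
    The penalty could then be applied by inflating difficulty or by
    injecting synthetic failed reviews into the scheduler's history.
    """
    penalty = interference_factor * stability
    return 0.0, penalty
```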
This specific problem gave us lots of headache while building https://rember.com We don't have a good solution yet. My hope is that something like content-aware memory models solve the problem at a lower level, so we don't have to worry about it at the product level.
You could have a company-provided Anki account for each user where you add and remove cards just for that user. (I thought you might even be able to use your own server, but that doesn't seem to be an option for the iOS app: https://ankicommunity.github.io/Tutorials/anki_custom_sync_s... )
Then placing a "this has changed" notification card at the front of the new queue only for people who learned the old information is as simple as checking the corresponding card's review status in the database.
Being easy to integrate is an underappreciated feature of FSRS.
Using decks to draw semantic boundaries is likely overly constraining. I think we want to account for finer differences between cards. Decks are coarse and people differ in the ways they use them, some people recommend having just one global deck. Notes are too fine. We explored something in between: a note capturing an idea or concept, plus an associated set of cards. Turns out it's hard to draw idea boundaries. That's why I think it's easier to relate cards by semantic embeddings or more rigid but clearer structures, like the DAG of dependencies suggested elsewhere in this thread.
On the scheduling end, I'm surprised the article didn't mention https://github.com/fasiha/ebisu which uses Bayesian statistics.
When I was studying Japanese, I was thinking how it's always best to learn words in sentences and that it would be good if the sentences for a particular word were random.
Extending that, the sentences could be picked such that the other words are words scheduled for today meaning much more bang for buck per learning hour.
Ebisu v2 doesn't perform well in our SRS Benchmark: https://github.com/open-spaced-repetition/srs-benchmark#:~:t...
> When I was studying Japanese, I was thinking how it's always best to learn words in sentences and that it would be good if the sentences for a particular word were random.
>Extending that, the sentences could be picked such that the other words are words scheduled for today meaning much more bang for buck per learning hour.
Just the other day I was thinking about how there’s a good chunk of vocab that could be “mined” from the sentences in my vocab deck.
I think that this idea would work well, but would probably require a whole new SRS program to be able to implement it cleanly. It’s too dynamic for a traditional SRS app like Anki which is pretty static in nature.
I’m working on this with Manabi Reader for Japanese
https://reader.manabi.io
It has Anki integration or its own SM2 flashcards app (soon FSRS). And it passively collects a personal corpus of sentences from any web/ebook material you open (manga up next).
I plan to add more sophisticated sync between the reading and reviewing such that cards can be more dynamically based on relevant personal corpus content, and where reading (on the web or in books, outside flashcards) would auto-review any flashcards you have (or which you create in the future).
There really shouldn’t be any difference between encountering a word in something you’re reading and reviewing it on a flashcard. And it would be nice to revisit reading material with guidance from FSRS, to find N+1 sentences for learning new words and to find excerpts containing words that are due for review.
I explored memory models for spaced repetition in my master's thesis and later built an SRS product. This post shares my thoughts on content-aware memory models.
I believe this technical shift in how SRS models the student's memory won't just improve scheduling accuracy but, more critically, will unlock better product UX and new types of SRS.
Thanks for the write-up!
I've got a system for learning languages that does some of the things you mention. The goal is to be able to recommend content for a user to read which combines 1) appropriate level of difficulty 2) usefulness for learning. The idea is to have the SRS built into the system, so you just sit and read what it gives you, and review of old words and learning of new words (according to frequency) happens automatically.
Separating the recall model from the teaching model as you say opens up loads of possibilities.
Brief introduction:
1. Identify "language building blocks" for a language; this includes not just pure vocabulary, but the grammar concepts, inflected forms of words, and can even include graphemes and what-not.
2. For each building block, assign a value -- normally this is the frequency of the building block within the corpus.
3. Get a corpus of selections to study. Tag them with the language building blocks. This is similar to Math Academy's approach, but while they have hundreds of math concepts, I have tens of thousands of building blocks.
4. Use a model to estimate the current difficulty of each word. (I'm using "difficulty" here as the inverse of "retrievability", for reasons that will be clear later.)
5. Estimate the delta of difficulty of each building block after being viewed. Multiply this delta by the word value to get the study value of that word.
6. For each selection, calculate the total difficulty, average difficulty, and total study value. (This is why I use "difficulty" rather than "retrievability", so that I can calculate total cognitive load of a selection.)
Now the teaching algorithm has a lot of things it can do. It can calculate a selection score which balances study value, difficulty, as well as repetitiveness. It can take the word with the highest study value, and then look for words with that word in it. It can take a specific selection that you want to read or listen to, find the most important word in that selection, and then look for things to study which reinforce that word.
You mentioned computational complexity -- calculating all this from scratch certainly takes a lot, but the key thing is that each time you study something, only a handful of things change. This makes it possible to update things very efficiently using an incremental computation [1].
But that does make the code quite complicated.
[1] https://en.wikipedia.org/wiki/Incremental_computing
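The incremental update might look roughly like this. It's a sketch with hypothetical names (`SelectionScores`, `update_block`), assuming difficulty is a scalar per building block and a selection's total is the sum over its blocks; an inverted index from block to selections is what keeps each review cheap:

```python
from collections import defaultdict

# Hypothetical sketch of incremental recomputation: when one building
# block's difficulty changes, only touch the selections that contain it.
class SelectionScores:
    def __init__(self, selections):
        # selections: selection id -> list of building-block ids
        self.selections = selections
        self.difficulty = defaultdict(float)    # block -> current difficulty
        self.totals = {sid: 0.0 for sid in selections}
        self.index = defaultdict(set)           # block -> selections using it
        for sid, blocks in selections.items():
            for block in blocks:
                self.index[block].add(sid)

    def update_block(self, block, new_difficulty):
        delta = new_difficulty - self.difficulty[block]
        self.difficulty[block] = new_difficulty
        for sid in self.index[block]:           # only affected selections
            self.totals[sid] += delta * self.selections[sid].count(block)
```

With tens of thousands of blocks, each study event touches only the handful of selections indexed under the reviewed blocks, rather than rescoring the whole corpus.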
Interesting, I've been surprised to see how many language learning apps already include some of the ideas I've discussed in the blog post!
How far along are you in developing the system?
I've been playing with something similar, but far less thought out than what you have.
I have a script for it, but am basically waiting until I can run a powerful enough LLM locally to chug through it with good results.
Basically like the knowledge tree you mention towards the end, but attempt to create a knowledge DAG by asking an LLM "does card (A) imply knowledge of card (B), or vice versa?". Then, take that DAG and use it to schedule the cards in a breadth-first ordering. So, when reviewing a new deck with a lot of new cards, I'll be sure to get questions like "what was the primary cause of the Civil War" before I get questions like "who was the Confederate general who fought at Bull Run"
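The scheduling half of this is essentially a breadth-first topological sort. A sketch, assuming the LLM step has already produced a prerequisite map (all names here are hypothetical):

```python
from collections import deque

# Hypothetical sketch: order new cards so prerequisites come first,
# breadth-first, given LLM-derived "must know A before B" edges.
def breadth_first_order(cards, prerequisites):
    """prerequisites: card -> set of cards that must come first (a DAG)."""
    indegree = {c: len(prerequisites.get(c, set())) for c in cards}
    dependents = {c: [] for c in cards}
    for card, prereqs in prerequisites.items():
        for p in prereqs:
            dependents[p].append(card)
    queue = deque(c for c in cards if indegree[c] == 0)
    order = []
    while queue:
        card = queue.popleft()
        order.append(card)
        for d in dependents[card]:
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    return order
```

If the LLM's edges accidentally form a cycle, cards stuck in the cycle never reach indegree zero, so a real implementation would need a cycle-breaking pass.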
I'd love to see it.
What I like about your approach is that it circumvents the data problem. You don't need a dataset with review histories and flashcard content in order to train a model.
Andy also tested this idea. You can read his notes here:
GPT-4 can probably estimate whether two flashcards are functionally equivalent
https://notes.andymatuschak.org/zJ7PMGzjcgBUoPjLUHBF9jn
GPT-4 can probably estimate whether one prompt will spoil retrieval of another
https://notes.andymatuschak.org/zK9Y15pCnRMLoxUahLCzdyc
In the language learning world there are some great tools already for adding content-awareness.
AnkiMorphs[1] will analyze the morphemes in your sentences and, taking into account the interval of each card as a sign of how well you know each one, will re-order your new cards to, ideally, present you with cards that have only one unknown word.
It doesn't do anything to affect the FSRS directly—it only changes the order of new, unlearned cards—but in my experience it's so effective at shrinking the time from new card to stable/mature that I'm not sure how much more it would help to have the FSRS intervals being adjusted in this particular domain.
1: https://mortii.github.io/anki-morphs/intro.html
Reading the thread, I definitely overlooked language learning solutions. Thanks for sharing!
I'm building an SRS language learning app [1] so I've thought about this topic a bit, but I've come to the conclusion that SRS algorithms might be just a nerd optimization obsession. My app has "stupid" fixed intervals (1, 3, 7, 15, 30 days, or something like that), and the reality is that if I know a card, I can swipe it within 2 seconds, and if I just barely know it, I can spend 30 seconds on it.
So optimizing the algorithm such that every card comes at the exact right moment might cause all cards to feel too hard or too easy. I think having a mix of difficult and easy cards is actually a feature, not a bug.
[1] https://vocabuo.com
Don’t fool yourself into thinking a suboptimal SRS is going to be optimal at the motivational aspect. If a user needs a self-confidence boost during a flash card session, this should be a design choice, not due to poor performance of the core SRS algorithm.
Choose your SRS algorithm to best predict what a user knows and when they’re likely to forget it.
If your application decides that it wants to throw some softballs, that’s an application level decision. If you care about psychology and motivation, build a really good algorithm for that. Then blend SRS with motivation as desired.
This site makes my (more than good) computer's browser crawl to a halt.
Thank you for the report! I was thinking of redoing the landing page for a while anyway. Are you using a niche browser or something like that? I haven't had anyone experience this issue nor was I able to reproduce it.
Firefox Dev Edition, nothing odd imo.
You mention that FSRS treats each card independently, even if they derive from the same note. I wonder whether you've tried this Anki plugin, which tries to increase the interval between reviews of 'sibling' cards: https://ankiweb.net/shared/info/759844606
Ah, I totally missed this, thanks for sharing it.
Since in Anki the "note" is the editing unit, that works for some cloze deletions but not for QA cards (only for double-sided QA cards). A content-aware memory model would allow you to apply "disperse siblings" to any set of cards, independently of whether they were created together in the same editing interface.
> The main challenge in building content-aware memory models is lack of data. To my knowledge, no publicly available dataset exists that contains real-world usage data with both card textual content and review histories.
I wonder if the author has ever considered reaching out to makers of Anki decks used by premeds and medical students like the AnKing [1]. They create Anki decks for users studying the MCAT and various Med School curricula, so have a) relatively stable deck content (which is very well annotated and contains lots of key words that would make semantic grouping quite easy) b) probably contains loads of statistics on user reviews (since they have an Anki addon that sends telemetry to their team to make the decks better IIRC), and c) contains incredibly disparate information (all the way from high-school physics to neurochemistry).
---
[1]: https://www.theanking.com
It would be awesome to work on that data. I'm afraid of the privacy implications though.
What sort of privacy implications? I'd imagine that Anki data would be relatively privacy-concern free, as it contains no PII, and for the AnKing decks, all of the content is standardized and so wouldn't contain personal notes. Though, having never worked with this data, please let me know if I'm wrong!
Also, having used those decks in the past, and downloaded the add-on/looked at the monetization structure of developers like the AnKing, I would be very surprised if aggregate data on review statistics wasn't collected in some way. I.e., if the AnKing is collecting this data already to design better decks/understand which cards are the hardest—probably to target individual support—then I imagine that collecting some anonymized version of that data wouldn't be too much of a stretch.
Plus, considering that all of the developers of AnKing-style decks are all doctors, they probably have a pretty good grasp at handling PII and could (hopefully) make pretty sound decisions on whether to give you access :)
You're right, it might work by restricting to just AnKing data. My concern was around other, possibly personal, cards making their way into the dataset.
Rather than relying on an embedding space, my approach is to have the cards themselves be grammars that can define the relationships between concepts explicitly. Then the problem becomes what specific sampling of all the possible outputs is optimal for a learner to see at any given time, given their knowledge state.
See how it's applied to Japanese learning here: https://elldev.com/feed/grsly
This is awesome. I've been using Bunpro for a while, which has great content, but I find myself memorizing the sentences rather than the grammar. Randomly generating cards based on the grammar points and vocab makes a ton of sense.
Some questions / comments / suggestions:
1. Is there a way to import vocab / kanji from Wanikani? WK is quite popular and has a good API. Bunpro integrates nicely with it, where it will or won't show furigana for kanji in the example sentences based on whether you've already learned the word in Wanikani. I'm guessing in your case you'd just want to import all the vocab. Even though I did the placement test, Grsly is still trying to teach me basic vocab like uta and obaasan. This is slowing down my progress through the grammar points.
2. Similar to question 1, is there a way to import grammar progress from Bunpro? Or even just click a button and have it assume I know everything from N5. The placement test only seemed to test a handful of basic grammar points.
3. Some of the sentences it has generated are quite awkward, like "ironna musume" ("all kinds of my daughter"). I guess that's grammatically correct, but it seems pretty unlikely to show up anywhere in real life. Have you considered using a local/small LLM to score or bias the example sentence generation? It's possible to constrain an LLM to only generate output that matches a grammar. You could construct such a grammar for each nontrivial element in your deck, with the vocab currently available for use. I guess you'd have to change the answer in your FAQ if you started using AI.
Amazing work! In https://rember.com the main unit is a note representing a concept or idea, plus some flashcards associated with it, so hsrs would fit perfectly! I'll look more deeply into it.
After reading this, I would really like to know what other spaced repetition software there is for things like AI-driven speech.
I love Anki and used it before when I needed to memorize things, but would love to know what other options on the market exist.
> [....] Ignoring the following factors means we are leaving useful information on the table:
> 1. The review histories of related cards. Card semantics allow us to identify related cards. This enables memory models to account for the review histories of all relevant cards when estimating a specific card’s retrievability.
> 2. [...]
I've been thinking that card semantics shouldn't be analyzed at all, and just treated as a black box. You can get so much data off of just a few users of a flashcard deck that you could build your own map of the relationships between cards, just by noticing the ones that get failed or pass together over time. Just package that map with the deck and the scheduler might get a lot smarter.
That map could give you good info on which cards were redundant, too.
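A minimal version of that black-box map might just be pairwise co-failure counts per session. A sketch with hypothetical names, assuming each session records a pass/fail outcome per card:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sketch: build a relationship map between cards purely
# from performance data, with no semantic analysis of card content.
def cofailure_counts(sessions):
    """sessions: list of dicts, card id -> bool (True = pass).

    Returns pair -> (times both failed, times both appeared); a high
    failure ratio suggests the cards are related (or redundant).
    """
    both_failed = defaultdict(int)
    both_seen = defaultdict(int)
    for session in sessions:
        for a, b in combinations(sorted(session), 2):
            both_seen[(a, b)] += 1
            if not session[a] and not session[b]:
                both_failed[(a, b)] += 1
    return {pair: (both_failed[pair], both_seen[pair])
            for pair in both_seen}
```

Packaged with a shared deck, counts like these could seed a new user's scheduler before they have any history of their own.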
edit: this may be interesting to someone, but I've also been trying to flesh out a model where agents buy questions from a market, trade questions with each other, and make bets with each other about whether the user will be able to recall the question when asked. Bankrupt agents are replaced by new agents. Every incentive in the system is parameterized by the user's learning requirements.
SuperMemo's neural network component (implemented in SM-15) already does something similar by tracking correlations between items without semantic analysis, effectively building that "map" of relationships based purely on performance data.
Yes, that reminds me of knowledge tracing and methods like 1PL-IRT.
I think you can do both and get even better results. The main limitation is that the same flashcards must be studied by multiple students, which doesn't generally apply.
I also love the idea of the market, you could even extend it to evaluate/write high-quality flashcards.
> The main limitation is that the same flashcards must be studied by multiple students, which doesn't generally apply.
I think you'd only need a kernel of the same flashcards, because in my mind new cards would quickly find their position after being reviewed a few times, and might displace already well-known cards. I see the process as throwing random cards at students, seeing what's left after shaking the tree, and using that info to teach new students.
The goal, however, would definitely be a single standard but evolving set of cards that described some group of related ideas. I know that's against Supermemo/Anki gospel, but I've gotten an enormous amount of value out of engineered decks such as https://www.asiteaboutnothing.net/w_ultimate_spanish_conjuga....
> I also love the idea of the market, you could even extend it to evaluate/write high-quality flashcards.
It's been my idea to drive conversational spaced repetition with something like this.
It would be valuable for shared decks, like the one you mentioned. As far as I can tell, the majority of Anki users are medical school students or language learners. Both groups benefit from shared decks. So I think it's a good idea to pursue.
My personal interest is more in conceptual knowledge, like math, CS, history, or random blog posts and ideas. It's often the case that, on the same article, different people focus on different things, so it would be hard to collect even a small number of reviews on a flashcard you want to study.