Martigeon's Notes:

Memorizing Spanish words

How and why I created a desktop app to cram Spanish vocabulary as fast as possible.

While I was writing my thesis, I stayed with my parents for a few months, and – having a Peruvian mother – decided to relearn Spanish by speaking it with her every day. I used as many tools as were available to speed up the process, but felt that a lot of the vocabulary memorization methods fell short. I ended up creating my own desktop app called CAS 5000.

Existing methods

Of course, I first tried out most of the popular and easily available apps. The most popular of them all is Duolingo, which professes to be an effective method:

A study has shown that 34 hours of Duolingo are equal to 1 university semester of language courses.

They even have a page dedicated to peer-reviewed research. And of course, they are proud of the fact that their service is entirely free. I tried it out, and surprisingly, did not like it at all. It has a wide userbase and is the most downloaded language app, so why didn’t I like it?

Everyone learns differently, and for my situation, it was not effective. First of all, it teaches both grammar and vocabulary, and the grammar is done poorly: rather than teaching rules, it mostly gives you examples of grammatical sentences and requires you to extrapolate. This is actually a known language teaching method, and seems logical because that is how infants learn their native languages. But it’s not for everyone. I had already found a great (paid) internet service for grammar on Fluencia.com, and was mostly looking for something that could accelerate my vocabulary. So I turned to another well-known app.

Memrise is an app which is more directly fine-tuned for memorization of literally anything. Vocabulary memorization is a popular use, but you could also use the app for learning the capitals of countries, for example. I tried their Spanish course, and yet again, I found it disappointing. The reason is that my goal was to learn Spanish vocabulary as fast as possible: I had limited time with my mother, and wanted to learn as much vocabulary as I could in as little time as I could. Memrise sometimes has questions in audio format, which require you to play audio files in order to select the right one, and other questions have to be typed on a touchscreen. The feedback loop of each question was too long. I needed something in which answering each question took less than a second.

One of my favorite websites is Sporcle.com. It’s especially useful whenever I want to procrastinate but still feel like I’m being somewhat productive. It’s a simple website with a plethora of free quizzes in all sorts of fields. And sure enough, there were already quite a few Spanish vocabulary exercises.

Sporcle’s simple quiz format.

The way that it works is that you are given a list of words in English, and you have to type them out in Spanish. Provided you can touch-type, answering each question can take as little as half a second, which really speeds the whole process up. Also, I genuinely think it’s more fun this way. Still, the choice of quizzes is limited to small categories, and you run through most of them in a few days. So I decided to improve on the Sporcle quiz format by designing an app with a larger vocabulary.

Data collection

The hardest part about creating this app was categorizing the words. For this to work, it was necessary to get accurate Spanish-English word pairs in an easily parsable format, as I wanted to avoid having to scrape webpages. Luckily for me, there was already a good source available through users of a memory app called Anki. It’s a free application that uses virtual flashcards as a memorization tool, similar in spirit to what I wanted to do, but different in implementation. One user submitted an ordered list of the 5000 most used Spanish words, along with their English translations! The raw data came with a lot of HTML baggage that I didn’t need, so I used a Jupyter notebook to clean things up:

  1. Import tab-delimited data with numpy.
  2. Sort entries according to word frequency.
  3. Delete any verb/preposition hints from Spanish column and trim whitespace.
  4. Import to Excel to manually fix any entries.
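Steps 1 to 3 can be sketched roughly as follows. This is a minimal illustration rather than the notebook itself: it uses the standard `csv` and `re` modules instead of numpy so that it runs on its own, and it assumes a three-column layout (frequency rank, Spanish, English) and hints written in parentheses such as "(verb)". Both the layout and the hint format are guesses, and the names `clean_cell` and `load_pairs` are made up for the example.

```python
import csv
import io
import re

TAG_RE = re.compile(r"<[^>]+>")     # leftover HTML tags such as <b> or <div>
HINT_RE = re.compile(r"\([^)]*\)")  # assumed hint format: "(verb)", "(prep)"


def clean_cell(text, drop_hints=False):
    """Strip HTML tags (and optionally hints), then collapse whitespace."""
    text = TAG_RE.sub("", text)
    if drop_hints:
        text = HINT_RE.sub("", text)
    return " ".join(text.split())


def load_pairs(tsv_text):
    """Return (rank, spanish, english) tuples sorted by frequency rank."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                      quoting=csv.QUOTE_NONE)
    pairs = [
        (int(rank), clean_cell(spanish, drop_hints=True), clean_cell(english))
        for rank, spanish, english in rows
    ]
    return sorted(pairs)  # tuples sort by their first element, the rank


# Two made-up rows, deliberately out of order and with HTML residue.
sample = "2\t<b>ser</b> (verb)\tto be\n1\t<div>de </div>\tof, from\n"
pairs = load_pairs(sample)
```

Hints are only removed from the Spanish column, since parentheses in the English column may carry part of the translation. Anything the regular expressions miss is what the manual pass in step 4 is for.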

Initially I thought it would be cool to create semantic categories (e.g. numbers, animals, professions, etc.) but this would have taken too much time, so I decided instead it would be best to just create 50 levels, each of 100 words, in order of decreasing usage in the Spanish language.
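Splitting a frequency-ordered list into fixed-size levels takes only a few lines. The sketch below assumes the words are already sorted from most to least used; `split_into_levels` is a hypothetical helper name and the word list is a placeholder, not the real data.

```python
def split_into_levels(words, level_size=100):
    """Cut an ordered word list into consecutive levels of `level_size` words."""
    return [words[i:i + level_size] for i in range(0, len(words), level_size)]


# Placeholder data standing in for the 5000 cleaned word pairs.
words = [f"word{rank}" for rank in range(1, 5001)]
levels = split_into_levels(words)
```

With 5000 words and a level size of 100, this gives 50 levels, and level 1 holds the 100 most frequent words.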

User interface

Development of the interface was made spectacularly easy with the help of Qt Creator; GUIs are notoriously annoying to program. With some clicking and dragging I soon had a window which looked similar enough to Sporcle’s quiz format. The design is then exported to a Python file, to which you can attach the actual logic of your program.

Screenshot of my desktop app, designed with Qt Creator and interfaced with PyQt5.

One way that it differs from Sporcle is that you are forced to write the translation for only the current word, rather than automatically matching any word. You’re able to skip through answers though, just to keep momentum going. And each time the level is replayed, the list shuffles to make sure that you don’t memorize sequences of words, as that’s unlikely to happen whenever you’re in an actual Spanish conversation.
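The quiz rules described above (only the current word can be answered, words can be skipped, and the order is reshuffled on every replay) could look something like this, stripped of any GUI code. It is a sketch, not the app's actual implementation: the class name `LevelRound` is invented, and two details are assumptions, namely that a skipped word moves to the back of the queue and that answers are compared case-insensitively.

```python
import random
from collections import deque


class LevelRound:
    """One play-through of a level; only the current word can be answered."""

    def __init__(self, pairs, rng=None):
        if rng is None:
            rng = random.Random()
        shuffled = list(pairs)
        rng.shuffle(shuffled)         # a fresh order on every replay
        self.queue = deque(shuffled)  # the front of the queue is the current word
        self.solved = []

    @property
    def current(self):
        """The (english, spanish) pair being asked, or None when finished."""
        return self.queue[0] if self.queue else None

    def answer(self, typed):
        """Accept the typed text only if it translates the current word."""
        if not self.queue:
            return False
        _english, spanish = self.queue[0]
        if typed.strip().lower() == spanish.lower():
            self.solved.append(self.queue.popleft())
            return True
        return False

    def skip(self):
        """Assumed behaviour: send the current word to the back of the queue."""
        if self.queue:
            self.queue.rotate(-1)


game = LevelRound([("house", "casa"), ("dog", "perro"), ("cat", "gato")])
english, spanish = game.current
game.answer(spanish)  # correct answer, so the next word becomes current
game.skip()           # not sure about this one, come back to it later
```

Creating a new `LevelRound` for each replay is what produces the reshuffle, so no state has to be reset by hand.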

All in all, I’m happy with the app, and it really helped me out with memorizing as much as possible, as quickly as possible. And in case you’re curious: no, I did not actually work through all levels – I stopped around level 12. Keep in mind though that this works best if you supplement it with verbal practice. On its own, you’re not likely to internalize much, but I think that’s true for any app – there’s no substitute for actually speaking the language!

I named it CAS 5000, as in 5000-words-in-CAStellano, which is also a play on HAL 9000. In case you want to play around with it, here is its GitHub repository.