Chen Li


Machine Learning Notes: A Hackers' Guide to Language Models

Jeremy Howard published a video on YouTube about LLMs: A Hackers’ Guide to Language Models - YouTube; the accompanying GitHub repository is lm-hackers. I would watch it again if I were to do research on LLMs.

The level of detail in this video is amazing. And by “the level of detail”, I mean

  • Python code that teaches ChatGPT to run a Python function, which is powerful, quite handy, and embarrassingly simple1 (see the sketch after this list).

  • Links to GitHub repositories, websites, etc. These are often the most useful part, because they are the sources you will turn to when coding, and from them you can expand to other sources. They are really helpful if you want to actually do the work rather than settle for a vague understanding.

    When coding, referring to such links is a habit that takes time to grow. I used to have the impression that coding means constantly hitting the keyboard (probably from Black Books S02E03), but in practice I spend about 70% of my time searching and reading other people’s code (that is, referring to those links), 20% writing and testing, and 10% debugging. By showing these links, he cuts down the searching and reading time for viewers.
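Here is a minimal sketch of the function-calling idea from the first bullet. It is not the notebook’s exact code: I’m assuming the current OpenAI Python SDK (openai>=1.0) and its tools parameter, and the sums function, its schema, and the model name are all illustrative.

```python
import json
from openai import OpenAI  # assumes the openai>=1.0 SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The local Python function we want the model to be able to call.
def sums(a: int, b: int = 1) -> int:
    "Adds a + b."
    return a + b

# A JSON schema telling the model what the function does and
# what arguments it takes.
tools = [{
    "type": "function",
    "function": {
        "name": "sums",
        "description": "Adds a + b.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "integer"},
                "b": {"type": "integer"},
            },
            "required": ["a"],
        },
    },
}]

msg = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "What is 6 plus 3?"}],
    tools=tools,
).choices[0].message

# The model never executes anything itself; it replies with the name
# and arguments of the function it wants called, and our code runs it.
if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    print(sums(**args))  # -> 9
```

That is the whole trick: describe the function in a schema, let the model pick the arguments, and execute the call in your own code.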

In a sense, they are closer to experience than to knowledge. I’ll try to do the same thing, try to.

By the way, it’s astonishing to see how the Machine Learning community (GitHub repositories, papers, packages, pre-trained models, fine-tuned models, websites, services) can flourish around a single task. The variety, both the good and the bad, is beautiful.


  1. I love “embarrassingly simple”. The phrase is often used to describe a trend in Machine Learning in which classical heuristic systems are replaced by end-to-end learning systems. The world is more complicated than anything you can program by hand; that’s why we use Machine Learning in the first place. See Full Self-Driving is HARD! Analyzing Elon Musk re: Tesla Autopilot on Lex Fridman’s Podcast - YouTube, where he talks about Tesla 11. The Bitter Lesson summarizes this phenomenon: model scale and large amounts of data outperform domain-specific inductive biases. ↩︎