I started a page called AI safety papers to track my progress on reading AI safety papers. How should I split my time between trying to understand papers and “catching up” on background mathematics? I am not really sure what to think, and so far I’ve just been going with what my curiosity/gut says. The two activities aren’t really separate, because the latter helps with the former; the question is more of a psychological one. Should I think of myself as someone running up against a wall (the cutting-edge AI safety papers) and then back-chaining toward background material when I get stuck? Or should I think of myself as building up some sort of knowledge base/personal encyclopedia, and slowly expanding the base to cover the cutting-edge stuff?

I wrote the page Distribution of X over Y. This mostly comes from my frustration with people using the word “distribution” in many sort-of-similar but formally-different ways. I wanted to see if there is some unified way of looking at this word.

I read a bit more of Myerson’s book. (My plan is to leave this alone for now while I process the definitions in Anki over the coming days.)

I looked at Tao’s proof of the least upper bound property for reals, especially at the part where I got stuck. The proof is actually pretty interesting to me now that I have forgotten some of the material (when I was originally going through this chapter, it was sort of a mindless sequence of arbitrary-seeming steps), especially all the previous propositions that are used in the proof of this theorem. When I was originally going through the book, the propositions came in a seemingly meaningless order (they are just in the order in which things can be proved, from a logical standpoint). But when I set a “target” of an interesting theorem to prove (here, the least upper bound property), I have to go backwards to hunt down all the prerequisites. If I have just gone through the prerequisites (as when I originally worked through the book), it can seem obvious which ones to use (because they are still fresh in my mind), so I am not doing the work of hunting them down. If I forget just enough of the proof, I strike a good balance between (1) not being so lost that it’s impossible to prove the theorem in a reasonable amount of time, and (2) not being so primed that the results to use are obvious and I skip the work of thinking through the structure of the proof.

I started learning about Lagrange multipliers. Actually, I have seen this material several times, but I keep forgetting it, so this time I decided to go through it extra-carefully, Ankifying as I go, to make sure the understanding sticks. I am interested in this because it seems to come up everywhere.

I thought about the question of why a linear transformation (in $\mathbf{R}^2$, with rank 2) turns the unit circle into an ellipse. This actually led to the question of what an ellipse even is (I find the sum-of-distances definition unintuitive). What seems natural to me is to say that an ellipse is anything that “looks like” $(a\cos t, b\sin t)$ if you use the right orthonormal basis. I think I will write more about this at some point (on a Subwiki). I am interested in this to understand how linear transformations act, and also because this geometric view seems important for understanding SVD.
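
Here is a minimal numerical sketch of that idea, under stated assumptions rather than as a proof. It assumes a rank-2 matrix A given as a pair of rows, and it uses the symmetric matrix S = A Aᵀ: the unit eigenvectors of S serve as the "right orthonormal basis", and the square roots of its eigenvalues serve as the semi-axis lengths a and b. The function names (`image_of_unit_circle`, `ellipse_axes`, `max_ellipse_residual`) and the sample matrix are made up for illustration. The check is that every image point, written in that basis as (c1, c2), satisfies (c1/a)² + (c2/b)² = 1 up to rounding.

```python
import math

def image_of_unit_circle(A, t):
    """Apply the 2x2 matrix A (a pair of rows) to the point (cos t, sin t)."""
    (a, b), (c, d) = A
    x, y = math.cos(t), math.sin(t)
    return (a * x + b * y, c * x + d * y)

def ellipse_axes(A):
    """Return (semi_major, semi_minor, u1, u2) for the image of the unit circle.

    Assumes A has rank 2, so S = A A^T is symmetric positive definite.
    The eigenvalues of S are the squared semi-axis lengths, and its unit
    eigenvectors u1, u2 form an orthonormal basis aligned with the axes.
    """
    (a, b), (c, d) = A
    p = a * a + b * b   # S[0][0]
    q = a * c + b * d   # S[0][1] == S[1][0]
    r = c * c + d * d   # S[1][1]
    mean = (p + r) / 2
    radius = math.hypot((p - r) / 2, q)
    lam1, lam2 = mean + radius, mean - radius  # closed form for a symmetric 2x2

    # Eigenvector for lam1: solves (p - lam1) * v0 + q * v1 = 0.
    if abs(q) > 1e-12:
        v = (q, lam1 - p)
    elif p >= r:
        v = (1.0, 0.0)   # S is (numerically) diagonal
    else:
        v = (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    u1 = (v[0] / norm, v[1] / norm)
    u2 = (-u1[1], u1[0])  # u1 rotated by 90 degrees, eigenvector for lam2
    return math.sqrt(lam1), math.sqrt(lam2), u1, u2

def max_ellipse_residual(A, samples=360):
    """Largest deviation of (c1/a)^2 + (c2/b)^2 from 1 over sampled angles."""
    s1, s2, u1, u2 = ellipse_axes(A)
    worst = 0.0
    for k in range(samples):
        y = image_of_unit_circle(A, 2 * math.pi * k / samples)
        c1 = y[0] * u1[0] + y[1] * u1[1]  # coordinate along u1
        c2 = y[0] * u2[0] + y[1] * u2[1]  # coordinate along u2
        worst = max(worst, abs((c1 / s1) ** 2 + (c2 / s2) ** 2 - 1))
    return worst

# Hypothetical example matrix; any invertible 2x2 matrix should work.
A = ((2.0, 1.0), (0.5, 3.0))
print(max_ellipse_residual(A))
```

The reason this works is that for y = Ax with |x| = 1, we have yᵀ S⁻¹ y = xᵀ Aᵀ (A Aᵀ)⁻¹ A x = xᵀ x = 1, and diagonalizing S turns this quadratic form into the familiar ellipse equation. The semi-axis lengths computed here are the singular values of A, and u1, u2 are its left singular vectors, which is the connection to SVD.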