2019-01-15
I continued thinking about some things in probability and statistics (extending the sample space, what the type of a “hypothesis” is, the setup of classical statistical inference).
I started on pages for Expectation and Variance.
I thought about some things in probability and statistics, e.g. notation for expected value and the expansion of the sample space.
I started the pages Subfield of math as study of concepts preserved under transformation, Disappearance of sample space, and List of sample space perspectives.
I finished reading “A Technical Explanation of Technical Explanation”.
I read “Toward a New Technical Explanation of Technical Explanation”.
I read most of the way through “A Technical Explanation of Technical Explanation”.
I continued learning about Lagrange multipliers. I want to say more about this later (probably on a Subwiki), but for now I will just note that Lagrange multipliers seem like an inherently visual topic, so I don’t like how many explanations are very verbal. Even when an explanation includes a visualization, it usually includes just one visualization; if I were to explain it, I would want to include multiple visualizations, including nearby incorrect visualizations.
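To make the core condition concrete, here is a toy numerical check of the Lagrange condition ∇f = λ∇g; the objective and constraint are just ones I made up for illustration:

```python
import numpy as np

# Toy problem (my own choice): maximize f(x, y) = x + y subject to
# g(x, y) = x^2 + y^2 - 1 = 0. At the optimum, grad f = lambda * grad g.
x = y = 1 / np.sqrt(2)          # known maximizer on the unit circle
grad_f = np.array([1.0, 1.0])
grad_g = np.array([2 * x, 2 * y])
lam = grad_f[0] / grad_g[0]     # solve the first component for lambda
# Check the second component agrees, i.e. the gradients are parallel.
assert np.allclose(grad_f, lam * grad_g)
print(lam)  # lambda = 1/sqrt(2)
```

This is exactly the kind of thing I'd want multiple pictures for: the gradients of f and of the constraint surface lining up at the optimum.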
I started a page called AI safety papers to track my progress on reading AI safety papers. How should I split my time between trying to understand papers and “catching up” on background mathematics? I am not really sure what to think, and so far I’ve just been going with what my curiosity/gut says. The two activities aren’t really separate because the latter helps with the former, but it’s more of a psychological thing: should I think of myself as someone trying to run up against a wall (the cutting edge AI safety papers) and then back-chaining toward background material when I get stuck? Or should I think of myself as building up some sort of knowledge base/personal encyclopedia, and slowly expanding the base to cover cutting edge stuff?
I wrote the page Distribution of X over Y. This mostly comes from my frustration with people using the word “distribution” in many sort-of-similar but formally-different ways. I wanted to see if there is some unified way of looking at this word.
I read a bit more of Myerson’s book. (My plan is to leave this alone for now while I process the definitions in Anki over the coming days.)
I looked at Tao’s proof of the least upper bound property for reals, especially at the part where I got stuck. The proof is actually pretty interesting to me now that I have forgotten some of the material (when I was originally going through this chapter, it was sort of a mindless sequence of arbitrary-seeming steps), especially all the previous propositions that are used in the proof of this theorem. When I was originally going through the book, the propositions came in a kind of meaningless order (just the order in which things can be proved, from a logical standpoint). But when I set a “target” of an interesting theorem to prove (here, the least upper bound property), I have to go backwards to hunt down all the prerequisites. If I just go through the prerequisites in order (as when I originally worked through the book), it can seem obvious what they are (because things are still fresh in my mind), so I am not doing the work of hunting them down. Forgetting just enough of the proof strikes a good balance between (1) not being so lost that it’s impossible to prove it in a reasonable amount of time, and (2) not being primed so much that the results to use in the proof are obvious and I skip the work of thinking about the structure of the proof.
I started learning about Lagrange multipliers. Actually, I have seen this material several times, but I keep forgetting it, so this time I decided to go through it extra-carefully, Ankifying as I go, to make sure the understanding sticks. I am interested in this because it seems to come up everywhere.
I thought about the question of why a linear transformation (in ℝ², with rank 2) turns the unit circle into an ellipse. This actually led to the question of what an ellipse even is (I find the sum-of-distances definition unintuitive). What seems natural to me is to say something like: an ellipse is anything that “looks like” x²/a² + y²/b² = 1 if you use the right orthonormal basis. I think I will write more about this at some point (on a Subwiki). I am interested in this to understand how linear transformations act, and also because this geometric view seems important for understanding SVD.
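A numerical sketch of the SVD connection, on an arbitrary rank-2 matrix of my choosing: in the orthonormal basis given by the left singular vectors, the image of the unit circle satisfies the standard ellipse equation with the singular values as semi-axes.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 1.0]])     # arbitrary rank-2 matrix
t = np.linspace(0, 2 * np.pi, 400)
circle = np.stack([np.cos(t), np.sin(t)])  # points on the unit circle
image = A @ circle

# SVD: A = U diag(s) V^T. The image of the unit circle is the ellipse with
# semi-axes s[0], s[1] along the (orthonormal) columns of U.
U, s, Vt = np.linalg.svd(A)
coords = U.T @ image                       # coordinates in the basis U
lhs = (coords[0] / s[0]) ** 2 + (coords[1] / s[1]) ** 2
assert np.allclose(lhs, 1.0)               # the standard ellipse equation
```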
I started the page K is recursively enumerable.
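The idea behind the result can be sketched with a dovetailing simulation. The toy model below is mine (real programs are replaced by generators indexed so that even-numbered ones halt and odd-numbered ones loop), but the enumeration strategy is the standard one: run every program for more and more steps, emitting an index the first time it halts.

```python
# Toy model (my own): "program e on input e" is a generator that may or may
# not ever terminate. K = { e : program e halts } is then semi-decidable by
# dovetailing: no single program can block the enumeration of the others.

def program(e):
    """Toy numbering: even-indexed programs halt after e steps; odd ones loop."""
    def gen():
        if e % 2 == 0:
            for _ in range(e):
                yield          # "compute" for e steps
            return             # halt
        while True:
            yield              # diverge
    return gen()

def enumerate_K(max_stage):
    """Dovetail: at stage s, run programs 0..s for s more steps each."""
    halted, machines = set(), {}
    for s in range(max_stage):
        for e in range(s + 1):
            if e in halted:
                continue
            m = machines.setdefault(e, program(e))
            try:
                for _ in range(s):
                    next(m)
            except StopIteration:
                halted.add(e)
                yield e
    # programs not yet emitted may still halt at a later stage

print(sorted(enumerate_K(10)))  # → [0, 2, 4, 6, 8]
```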
I continued reading the reflective oracles paper. I got to the statement of theorem 4.1 and realized I should probably be more comfortable with game theory first. So I started going through Myerson’s book. Meta comment: this was the plan from the beginning (to try to read an AI safety paper and see where I get stuck from not having background), although I didn’t know where I would get stuck. I plan to read more papers in this way so that I can build up a better model of the topic dependencies.
I wrote Change of basis example in two dimensions.
I also tried proving the least upper bound property of the real numbers (the proof in Tao’s book). I think I was able to define the sequence but I got stuck trying to show that it is Cauchy.
I didn’t study math on this day, except for Anki reviews.
I read a bit of (and a bit about) Lakatos’s Proofs and Refutations.
I thought about some stuff in linear algebra (matrix decomposition, change of basis). I started on Matrix of a linear transformation, Riesz representation theorem, and Invertible equals expressible as change of coordinate matrix. I also added to the page List of matrix products.
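A small numerical instance of the change-of-basis bookkeeping (the numbers are my own): if the columns of P are the new basis vectors written in the standard basis, then coordinates transform by P⁻¹, and a matrix M becomes P⁻¹MP.

```python
import numpy as np

P = np.array([[1.0, 1.0], [0.0, 1.0]])   # new basis: (1,0) and (1,1), as columns
v = np.array([3.0, 2.0])                 # a vector in standard coordinates
v_new = np.linalg.solve(P, v)            # its coordinates in the new basis
assert np.allclose(P @ v_new, v)         # sanity check: convert back

M = np.array([[2.0, 0.0], [0.0, 3.0]])   # a linear map, standard basis
M_new = np.linalg.solve(P, M @ P)        # P^{-1} M P: the same map, new basis
assert np.allclose(P @ M_new, M @ P)
```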
I watched the 3Blue1Brown playlist on neural networks. (As of this writing, there are four videos in this playlist.) I was familiar with the material since I went through Michael Nielsen’s book on the subject about a year ago. I watched this mostly for fun/review/to see if there was anything I didn’t already know. There were a couple parts that I don’t think Nielsen’s book explicitly mentioned (e.g. how feeding random data as input to the network would cause the network to confidently predict a digit). I also liked that in the fourth video (“Backpropagation calculus”), the weights appear on the computational graph. This is a point that I’ve found confusing myself and have tried to emphasize in my own draft explanation. Another thing I liked is that one of the videos (I forget which one) explicitly said that in the cost function, the weights and biases are the inputs and the training set plays the role of the parameters.
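The weights-as-inputs point can be made concrete with a one-neuron toy example of my own: the cost is a function of (w, b) with the training pair held fixed, and backpropagating along the graph agrees with a numerical derivative.

```python
import math

# One-neuron "network": C(w, b) = (sigmoid(w*x + b) - y)^2, where the training
# pair (x, y) is a fixed parameter. The inputs to this computational graph are
# the weight w and bias b, not the data.
x, y = 1.5, 0.0                 # fixed training example
sigmoid = lambda z: 1 / (1 + math.exp(-z))

def cost(w, b):
    return (sigmoid(w * x + b) - y) ** 2

def grad_w(w, b):
    # Chain rule along the graph: z = w*x + b, a = sigmoid(z), C = (a - y)^2.
    z = w * x + b
    a = sigmoid(z)
    return 2 * (a - y) * a * (1 - a) * x   # dC/da * da/dz * dz/dw

w, b, h = 0.7, -0.3, 1e-6
numeric = (cost(w + h, b) - cost(w - h, b)) / (2 * h)
assert abs(grad_w(w, b) - numeric) < 1e-6  # backprop matches finite differences
```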
I finished reading “AI safety via debate”. I think I understood the main points, but there were some parts I didn’t understand (e.g. I don’t know much about computational complexity so I couldn’t appreciate the analogy with the polynomial hierarchy, although I am familiar with similar hierarchies from logic).
I read “Supervising strong learners by amplifying weak experts”. As with above, I think I understood the basic idea, but since I’m not familiar with a lot of ML stuff I couldn’t appreciate the relation to existing work, the model architecture, etc.
I thought about the change of coordinate matrix again (linear algebra).
I started reading the reflective oracles paper.
I did the quiz “Linear systems: rank and dimension considerations”. Since Vipul might ask, I got 8/8 on this quiz (no discussion and no notes, and it has been several weeks since I read the corresponding part in Vipul’s lecture notes). I found this quiz pretty easy, although I found the long verbal descriptions in the quiz questions a bit intimidating (perhaps because I was tired).
I did more linear algebra. I thought a bit about how injectivity/surjectivity change the rank. I also thought about the question of whether every invertible transformation “looks like” the identity matrix given a clever choice of bases. I also thought about how to find a basis for the row space, and the proof in Treil’s book of this.
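The row-space recipe, as I understand it from Treil, can be sketched in code: row operations preserve the row space, so the nonzero rows of the reduced echelon form are a basis for it. (The elimination routine and the example matrix below are my own.)

```python
import numpy as np

def row_space_basis(A, tol=1e-10):
    """Return the nonzero rows of the reduced row echelon form of A."""
    A = A.astype(float).copy()
    rows, cols = A.shape
    r = 0                                      # next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if abs(A[i, c]) > tol), None)
        if pivot is None:
            continue                           # no pivot in this column
        A[[r, pivot]] = A[[pivot, r]]          # swap pivot row up
        A[r] /= A[r, c]                        # normalize pivot to 1
        for i in range(rows):
            if i != r:
                A[i] -= A[i, c] * A[r]         # clear the rest of the column
        r += 1
    return A[:r]

A = np.array([[1, 2, 3], [2, 4, 6], [1, 0, 1]])  # rank 2: row 2 = 2 * row 1
B = row_space_basis(A)
assert len(B) == np.linalg.matrix_rank(A) == 2   # basis size = rank
```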
I started reading Li and Vitanyi’s Kolmogorov complexity book again, but then decided I didn’t feel ready/I wanted to do something else.
I started reading “AI safety via debate”.
I continued learning about belief propagation from Pearl’s book.
I continued with linear algebra. I started writing List of matrix products and Classification of operators.
I then started reading about belief propagation from Pearl’s book. This turned out to be fairly straightforward since I had gone through this material several months ago.
I did some first-day-of-month bureaucracies so I feel like I didn’t do much math on this day.
I wrote Type checking vector spaces, Properties of a list of vectors and their images, and Mental representations in mathematics.
I did some exercises in Cutland’s computability book.
I thought a bit about the relationship between dot product and the angle between two vectors.
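The relationship in question is the identity a·b = |a||b|cos θ; here is a quick numerical check on two vectors of my choosing, where the angle is visibly 45 degrees.

```python
import numpy as np

a = np.array([3.0, 0.0])        # along the x-axis
b = np.array([1.0, 1.0])        # 45 degrees above it
# Rearranging a·b = |a| |b| cos(theta) to solve for the angle:
cos_theta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
theta = np.arccos(cos_theta)
assert np.isclose(theta, np.pi / 4)   # 45 degrees, as expected geometrically
```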
I finished Tao’s linear algebra notes (I just had one section left in week 10).
I did the problems in problem set 0 of the Stanford machine learning course. I remember around a year ago when I looked at this problem set, I was quite intimidated because I had forgotten all the linear algebra I had learned at UW. However, the problems turned out to be quite easy (probably because of the time I have recently spent on linear algebra, but also maybe because I didn’t realize how easy the problems were before).
I looked through the “Linear Algebra Review and Reference” notes for the same course. Most of it I knew thanks to studying linear algebra, but there were some things that other resources on linear algebra don’t tend to talk about, like positive semidefinite matrices.
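A small illustration of positive semidefiniteness (the matrix B is my own choice; the fact that BᵀB is always PSD is standard): all eigenvalues of BᵀB are nonnegative, and xᵀAx ≥ 0 holds for arbitrary x.

```python
import numpy as np

B = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # arbitrary matrix
A = B.T @ B                           # B^T B is always positive semidefinite
eigvals = np.linalg.eigvalsh(A)       # eigvalsh: A is symmetric
assert np.all(eigvals >= -1e-12)      # all eigenvalues nonnegative

# Direct check of the definition x^T A x >= 0 on random vectors:
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.standard_normal(2)
    assert x @ A @ x >= -1e-12
```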
I continued with linear algebra, reading part of week 10 of Tao’s notes.
I then started writing a multiple-choice quiz on computability theory. For now, you can see an interactive version of the quiz here. (That link may become obsolete when I publish the quiz, but the GitHub link should be permanent.)
I continued with linear algebra using Tao’s notes, weeks 7, 8, and 9.
I thought about the following problem a bit, but didn’t get far: if we have an inner product space, we can project a vector onto a subspace to get a best approximation of the vector inside the subspace. But if we start out with a notion of best approximation, can we go from that to an inner product? For instance, the tangent line of a curve through a point is (in a specific sense) the best linear approximation of the curve near that point. Can we now define an inner product (over, say, the polynomials of degree at most n) and project an arbitrary polynomial onto the subspace of polynomials of degree at most 1 so as to recover the tangent line?
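The familiar direction of the question (inner product → best approximation) looks like this numerically, with a subspace and vector of my own choosing: the orthogonal projection is at least as close to v as any other point of the subspace.

```python
import numpy as np

u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])        # orthonormal basis for the subspace
v = np.array([2.0, 3.0, 4.0])
proj = (v @ u1) * u1 + (v @ u2) * u2  # orthogonal projection onto span{u1, u2}

# Best-approximation property: no other point of the subspace is closer to v.
rng = np.random.default_rng(1)
for _ in range(100):
    a, b = rng.standard_normal(2)
    other = a * u1 + b * u2           # arbitrary point in the subspace
    assert np.linalg.norm(v - proj) <= np.linalg.norm(v - other) + 1e-12
```

The open direction, recovering an inner product from a best-approximation notion like tangency, is the part I didn’t get far on.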
I continued with linear algebra. I went through some of Terence Tao’s linear algebra notes. Especially weeks 3, 4, 5, 6. I have gone through similar material in Axler’s book and other places, which is why I was able to go through the material relatively quickly. I mostly read the notes, but worked out some of the proofs before reading/did some of the exercises.
It’s interesting for me to see how different authors cover the same material (I also enjoyed doing this with real analysis). For instance, Tao gives up on doing determinants completely rigorously, saying we would need more advanced machinery to understand it properly. Tao also does fun one-dimensional unit conversion (length, currency) and chemistry (converting molecules to atoms, etc.) examples. I also hadn’t seen the shear operation explained in terms of a parallelogram’s area before (i.e. the shear operation changes neither the base nor the height of a parallelogram, so does not change the determinant).
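The parallelogram picture for shears can be checked numerically (the matrices here are my own examples): a shear has determinant 1, so composing with it leaves any determinant unchanged.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # arbitrary matrix, det A = 5
S = np.array([[1.0, 0.0], [0.5, 1.0]])   # a shear: unchanged base and height
assert np.isclose(np.linalg.det(S), 1.0)
# Hence shearing does not change the determinant (the parallelogram's area):
assert np.isclose(np.linalg.det(S @ A), np.linalg.det(A))
```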
I still feel like linear algebra is a jumble of facts. At the same time, I feel like there is a way to organize everything neatly (I think this table is a start), and that’s one of the things motivating me right now.
I worked through this worksheet about a basis for kernel and image. I did some other linear algebra stuff as well. I read the “Coordinates” note from Vipul Naik’s linear algebra notes.
I returned to Kleene’s first and second recursion theorems. I was able to prove both (using the s-m-n theorem) without looking at notes, but I still feel like I don’t really understand these. I went to Sipser’s book to see how he did things there.