Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions (Paper Explained)

#imle #backpropagation #discrete

Backpropagation is the workhorse of deep learning, but it only works for differentiable functions that are amenable to the chain rule. Discrete algorithms have no useful derivative, so deep networks that contain them cannot be trained effectively with backpropagation. This paper presents a method to incorporate a large class of such algorithms, formulated as discrete exponential family distributions, into deep networks, and derives gradient estimates that plug directly into end-to-end backpropagation. This makes it possible, for example, to use combinatorial optimizers natively as part of a network's forward pass.

OUTLINE:
0:00 - Intro & Overview
4:25 - Sponsor: Weights & Biases
6:15 - Problem Setup & Contributions
8:50 - Recap: Straight-Through Estimator
13:25 - Encoding the discrete problem as an inner product
19:45 - From algorithm to distribution
23:15 - Substituting the gradient
26:50 - Defining a target distribution
38:30 -
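The straight-through estimator that the video recaps as a baseline can be sketched in a few lines: the forward pass applies a hard, non-differentiable discretization, while the backward pass simply pretends that step was the identity and passes the gradient "straight through". This is a minimal illustrative sketch, not code from the paper; the function names are made up for this example.

```python
import numpy as np

def heaviside(z):
    """Hard threshold: 1.0 where z > 0, else 0.0 (non-differentiable)."""
    return (z > 0).astype(z.dtype)

def ste_forward_backward(z, upstream_grad):
    """Straight-through estimator (illustrative sketch).

    Forward: discrete output y = heaviside(z).
    Backward: approximate dy/dz by 1, so the upstream gradient
    is passed through unchanged.
    """
    y = heaviside(z)              # discrete forward value
    grad_z = upstream_grad * 1.0  # STE: treat the threshold as identity
    return y, grad_z

z = np.array([-0.5, 0.2, 1.3])
g = np.array([2.0, 3.0, 4.0])
y, grad_z = ste_forward_backward(z, g)
```

The video's point of departure is that this identity surrogate ignores the discrete structure entirely; the paper's I-MLE gradient estimates are designed to respect it.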