{"id":1441,"date":"2017-09-30T12:36:00","date_gmt":"2017-09-30T12:36:00","guid":{"rendered":"https:\/\/nag.com\/?post_type=insights&#038;p=1203"},"modified":"2023-07-11T16:08:02","modified_gmt":"2023-07-11T16:08:02","slug":"automatic-differentiation-in-more-depth","status":"publish","type":"insights","link":"https:\/\/nag.com\/insights\/automatic-differentiation-in-more-depth\/","title":{"rendered":"Automatic Differentiation in More Depth"},"content":{"rendered":"<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>$\newcommand{\R}{\mathbb{R}} \newcommand{\costR}{\mathcal{R}}$<br \/>To delve a little more into AD, consider a computer implementation of a function $f$ with $n$ inputs and one output, i.e. $f:\R^n\rightarrow \R$. AD can be applied to vector-valued functions as well, but to keep things simple we only consider real-valued functions below. AD comes in two modes: forward and reverse.<\/p>\n<h3>Forward (tangent-linear) Mode AD<\/h3>\n<p>The forward (or tangent-linear) AD version of $f$ is a function $F^{(1)}:\R^{2n}\rightarrow\R$ given by<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<!-- Spacer -->\n<div class=\"pt-2 pt-lg-2 pt-xl-2\" ><\/div>\n\n\n$$y^{(1)} = F^{(1)}(\mathbf{x},\mathbf{x^{(1)}}) = \nabla f(\mathbf{x}) \cdot \mathbf{x^{(1)}} = \left(\frac{\partial f}{\partial \mathbf{x}}\right) \cdot \mathbf{x^{(1)}}$$\n\n\n<!-- Spacer -->\n<div class=\"pt-2 pt-lg-2 pt-xl-2\" ><\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>for inputs $\mathbf{x},\mathbf{x^{(1)}}\in\R^n$ where the dot is the 
regular dot product. To get the whole gradient of $f$ we let $\mathbf{x^{(1)}}$ range over the Cartesian basis vectors and call $F^{(1)}$ repeatedly.<\/p>\n<ul>\n<li>The runtime of $F^{(1)}$ is typically similar to the runtime of $f$<\/li>\n<li>Computing the whole gradient is roughly $n$ times the cost of computing $f$<\/li>\n<li>Forward mode AD has roughly the same cost as finite differences but computes gradients to machine precision<\/li>\n<\/ul>\n<p>Forward mode AD is typically used when $n$ is small, say less than 30, although the exact figure will depend on the function being differentiated. Above this, adjoint methods are used.<\/p>\n<h3>Adjoint Mode AD (or reverse mode)<\/h3>\n<h4>Intuition<\/h4>\n<p>To understand adjoint mode AD, it helps to consider an input $\mathbf{x}\in\mathbb{R}^n$ being moved along by a sequence of function calls to an output $y\in\mathbb{R}$<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<!-- Spacer -->\n<div class=\"pt-2 pt-lg-2 pt-xl-2\" ><\/div>\n\n\n$$\mathbf{x} \overset{f_1}{\longrightarrow} \mathbf{x_1} \overset{f_2}{\longrightarrow} \mathbf{x_2} \longrightarrow \cdots \longrightarrow \mathbf{x_m} \overset{f_{m+1}}{\longrightarrow} y$$\n\n\n<!-- Spacer -->\n<div class=\"pt-2 pt-lg-2 pt-xl-2\" ><\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>We want the gradient $\partial y\/\partial \mathbf{x}$ and by the Chain Rule this is just<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<!-- Spacer -->\n<div class=\"pt-2 pt-lg-2 pt-xl-2\" ><\/div>\n\n\n$$\frac{\partial \mathbf{x_1}}{\partial \mathbf{x}} \frac{\partial \mathbf{x_2}}{\partial \mathbf{x_1}} \frac{\partial \mathbf{x_3}}{\partial \mathbf{x_2}} \cdots \frac{\partial \mathbf{x_m}}{\partial \mathbf{x_{m-1}}} \frac{\partial y}{\partial \mathbf{x_m}}$$\n\n\n<!-- Spacer -->\n<div class=\"pt-2 pt-lg-2 pt-xl-2\" ><\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 
col-xl-6\">\n            <h4><strong>Mathematically<\/strong><\/h4>\n<p>Mathematically it doesn&#8217;t matter which way we evaluate this. The usual way is left to right<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<!-- Spacer -->\n<div class=\"pt-2 pt-lg-2 pt-xl-2\" ><\/div>\n\n\n$$\left.\left(\cdots\left(\frac{\partial \mathbf{x_1}}{\partial \mathbf{x}} \frac{\partial \mathbf{x_2}}{\partial \mathbf{x_1}}\right) \frac{\partial \mathbf{x_3}}{\partial \mathbf{x_2}}\right) \cdots \frac{\partial \mathbf{x_m}}{\partial \mathbf{x_{m-1}}}\right) \frac{\partial y}{\partial \mathbf{x_m}}$$\n\n\n<!-- Spacer -->\n<div class=\"pt-2 pt-lg-2 pt-xl-2\" ><\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>and this is natural since it corresponds to the order of program execution: the program first computes $\mathbf{x_1}$, then $\mathbf{x_2}$, and so on. However, it involves\u00a0<strong>matrix-matrix multiplications<\/strong>\u00a0followed by a final matrix-vector product, since in general each Jacobian $\partial \mathbf{x_{i+1}}\/\partial \mathbf{x_i}$ is a matrix.<\/p>\n<p>Suppose instead we started from the right<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<!-- Spacer -->\n<div class=\"pt-2 pt-lg-2 pt-xl-2\" ><\/div>\n\n\n$$\frac{\partial \mathbf{x_1}}{\partial \mathbf{x}} \left(\frac{\partial \mathbf{x_2}}{\partial \mathbf{x_1}} \left(\frac{\partial \mathbf{x_3}}{\partial \mathbf{x_2}} \cdots \left(\frac{\partial \mathbf{x_m}}{\partial \mathbf{x_{m-1}}} \frac{\partial y}{\partial \mathbf{x_m}}\right) \cdots\right)\right.$$\n\n\n<!-- Spacer -->\n<div class=\"pt-2 pt-lg-2 pt-xl-2\" ><\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>Now everything is\u00a0<strong>matrix-vector products<\/strong>, which are much faster; however, we\u00a0<em>effectively need to run the program backwards<\/em>:<\/p>\n<ul>\n<li>The data to compute $\partial y\/\partial \mathbf{x_m}$ is only available at the end of the calculation &#8212; it requires $y$ and $\mathbf{x_m}$, which requires $\mathbf{x_{m-1}}$, which requires $\mathbf{x_{m-2}}$ and so on<\/li>\n<li>One way to solve this is to run the program forwards and\u00a0<em>store all relevant intermediate 
values<\/em><\/li>\n<li>Then we step backwards, constructing the Jacobians $\frac{\partial \mathbf{x_{i+1}}}{\partial \mathbf{x_i}}$ from the stored values, and performing the matrix-vector products<\/li>\n<\/ul>\n<p>This whole approach to computing gradients is called the\u00a0<em>adjoint mode<\/em>\u00a0of AD.<\/p>\n<h4>The Adjoint Model<\/h4>\n<p>The adjoint model of $f$ is a function $\mathbf{x_{(1)}} = F_{(1)}(\mathbf{x},\mathbf{x_{(1)}},y_{(1)})$ mapping $\R^n\!\times\!\R^n\!\times\!\R$ to $\R^n$ given by \begin{equation*} \mathbf{x_{(1)}} = \mathbf{x_{(1)}} + \nabla f(\mathbf{x}) \cdot y_{(1)} \end{equation*}<\/p>\n<ul>\n<li>Note that $y_{(1)}$ is a scalar. Hence setting $y_{(1)}=1$ and $\mathbf{x_{(1)}}=0$ and calling the adjoint model $F_{(1)}$\u00a0<strong>once<\/strong>\u00a0gives the full vector of partial derivatives of $f$.<\/li>\n<li>The Jacobians $\partial \mathbf{x_{i+1}}\/\partial \mathbf{x_i}$ are not formed explicitly, and sparsity is exploited.<\/li>\n<li>It can be proved that, in general, computing $F_{(1)}$ requires no more than five times as many floating-point operations as computing $f$.<\/li>\n<li>This implies that the adjoint can give the full gradient at a cost which is a (small) multiple $\costR$ of the cost of running $f$.<\/li>\n<li>However, to implement the adjoint model we need to solve a\u00a0<strong>dataflow reversal problem<\/strong>, and solving it dominates the computational cost by far.<\/li>\n<li>Hence typical values of $\costR$ are between 5 and 50, depending on the specific code.<\/li>\n<\/ul>\n<h4>Adjoint Model and Memory Requirements<\/h4>\n<p>Performing adjoint calculations requires solving a dataflow reversal problem: the program essentially has to be run backwards. Many AD tools (including\u00a0<code>dco<\/code>) approach this by running the program forwards and storing intermediate calculations to memory in a data structure called a\u00a0<em>tape<\/em>. 
Even for relatively simple codes the tape can run to several gigabytes, and for production codes it will typically exceed the capacity of even large-memory machines.<\/p>\n<p>To solve this problem,\u00a0<code>dco<\/code> has a flexible interface which allows users to easily insert checkpoints at various points in their code. When the code is run backwards, the final checkpoint is restored and that section of the computation is taped and played back; then the second-to-last checkpoint is restored and that section of the computation is taped and played back (with the previous playback&#8217;s results), and so on. In this way, memory is traded for flops, with the result that the size of the tape can be constrained almost arbitrarily.<\/p>\n<p>This functionality is essential in getting adjoint models of production codes to run at all. For more information on checkpointing as well as other techniques for reducing the memory footprint of adjoint codes, please <a href=\"https:\/\/www.support.nag.com\/content\/nag-technical-support-service#contact\">contact us<\/a>.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n\n<div class=\"gbc-title-banner tac tac-lg tac-xl\">\n    <div class=\"container\">\n    
<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>We delve a little more into algorithmic or automatic differentiation.<\/p>\n","protected":false},"author":5,"featured_media":988,"parent":0,"menu_order":0,"template":"","meta":{"content-type":"","footnotes":""},"post-tag":[16],"class_list":["post-1441","insights","type-insights","status-publish","has-post-thumbnail","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.8 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Automatic Differentiation in More Depth - nAG<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/nag.com\/insights\/automatic-differentiation-in-more-depth\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Automatic Differentiation in More Depth - nAG\" \/>\n<meta property=\"og:description\" content=\"We delve a little more into algorithmic or automatic differentiation.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/nag.com\/insights\/automatic-differentiation-in-more-depth\/\" \/>\n<meta property=\"og:site_name\" content=\"nAG\" \/>\n<meta property=\"article:modified_time\" content=\"2023-07-11T16:08:02+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/nag.com\/wp-content\/uploads\/2023\/05\/Blog_Post-Myth-1-1024x576.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"576\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@NAGTalk\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/nag.com\/insights\/automatic-differentiation-in-more-depth\/\",\"url\":\"https:\/\/nag.com\/insights\/automatic-differentiation-in-more-depth\/\",\"name\":\"Automatic Differentiation in More Depth - nAG\",\"isPartOf\":{\"@id\":\"https:\/\/nag.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/nag.com\/insights\/automatic-differentiation-in-more-depth\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/nag.com\/insights\/automatic-differentiation-in-more-depth\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/nag.com\/wp-content\/uploads\/2023\/05\/Blog_Post-Myth-1.png\",\"datePublished\":\"2017-09-30T12:36:00+00:00\",\"dateModified\":\"2023-07-11T16:08:02+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/nag.com\/insights\/automatic-differentiation-in-more-depth\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/nag.com\/insights\/automatic-differentiation-in-more-depth\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/nag.com\/insights\/automatic-differentiation-in-more-depth\/#primaryimage\",\"url\":\"https:\/\/nag.com\/wp-content\/uploads\/2023\/05\/Blog_Post-Myth-1.png\",\"contentUrl\":\"https:\/\/nag.com\/wp-content\/uploads\/2023\/05\/Blog_Post-Myth-1.png\",\"width\":5333,\"height\":3000,\"caption\":\"Automatic Differentiation\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/nag.com\/insights\/automatic-differentiation-in-more-depth\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/nag.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Insights\",\"item\":\"https:\/\/nag.com\/insights\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Automatic Differentiation 
in More Depth\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/nag.com\/#website\",\"url\":\"https:\/\/nag.com\/\",\"name\":\"NAG\",\"description\":\"Robust, trusted numerical software and computational expertise.\",\"publisher\":{\"@id\":\"https:\/\/nag.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/nag.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/nag.com\/#organization\",\"name\":\"Numerical Algorithms Group\",\"alternateName\":\"NAG\",\"url\":\"https:\/\/nag.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/nag.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/nag.com\/wp-content\/uploads\/2023\/11\/NAG-Logo.png\",\"contentUrl\":\"https:\/\/nag.com\/wp-content\/uploads\/2023\/11\/NAG-Logo.png\",\"width\":1244,\"height\":397,\"caption\":\"Numerical Algorithms Group\"},\"image\":{\"@id\":\"https:\/\/nag.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/NAGTalk\",\"https:\/\/www.linkedin.com\/company\/nag\/\",\"https:\/\/www.youtube.com\/user\/NumericalAlgorithms\",\"https:\/\/github.com\/numericalalgorithmsgroup\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","_links":{"self":[{"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/insights\/1441","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/insights"}],"about":[{"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/types\/insights"}],"author":[{"embeddable":true,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/users\/5"}],"version-history":[{"count":21,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/insights\/1441\/revisions"}],"predecessor-version":[{"id":3235,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/insights\/1441\/revisions\/3235"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/media\/988"}],"wp:attachment":[{"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/media?parent=1441"}],"wp:term":[{"taxonomy":"post-tag","embeddable":true,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/post-tag?post=1441"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
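To make the tangent-linear definition in the article concrete, here is a minimal forward-mode sketch using dual numbers. It is purely illustrative (the `Dual` class and the example function are invented for this note; this is not the dco interface): each value carries its tangent, so one evaluation of $f$ yields $y^{(1)} = \nabla f(\mathbf{x}) \cdot \mathbf{x^{(1)}}$, and letting the seed range over the Cartesian basis vectors gives the full gradient in $n$ runs.

```python
# Minimal forward-mode AD with dual numbers: each value carries its
# tangent, so evaluating f once propagates y^(1) = grad f(x) . x^(1).
from dataclasses import dataclass
import math

@dataclass
class Dual:
    val: float   # primal value
    tan: float   # tangent (directional derivative)

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)
    __rmul__ = __mul__

def sin(x: Dual) -> Dual:
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan)

def f(x):
    # example function f(x0, x1) = x0 * x1 + sin(x0)
    return x[0] * x[1] + sin(x[0])

def gradient(f, x):
    # One tangent-linear evaluation per Cartesian basis vector:
    # n runs of f give the whole gradient.
    n = len(x)
    return [f([Dual(x[j], 1.0 if j == i else 0.0) for j in range(n)]).tan
            for i in range(n)]

print(gradient(f, [0.5, 2.0]))  # [x1 + cos(x0), x0] at x = (0.5, 2.0)
```

Note that the cost scales with $n$ exactly as the bullet list above describes: each basis direction is one extra run of $f$.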
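The adjoint-mode discussion (run forwards recording a tape, then sweep it backwards) can be sketched the same way. Again, this is a toy illustration and not dco's API — the `Var` class and `backward` helper are invented here: the forward pass records each operation with its local partial derivatives, and seeding $y_{(1)} = 1$ and replaying the tape in reverse yields the whole gradient in a single sweep.

```python
# Minimal reverse-mode (adjoint) AD: the forward pass records every
# operation on a tape; the backward pass replays the tape in reverse,
# accumulating adjoints. One sweep yields the whole gradient.
import math

class Var:
    def __init__(self, val, tape=None):
        self.val = val
        self.adj = 0.0                       # adjoint, accumulated backwards
        self.tape = [] if tape is None else tape

    def _record(self, val, partials):
        # partials = [(input Var, local partial derivative), ...]
        out = Var(val, self.tape)
        self.tape.append((out, partials))
        return out

    def __add__(self, other):
        return self._record(self.val + other.val, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return self._record(self.val * other.val,
                            [(self, other.val), (other, self.val)])

def sin(x):
    return x._record(math.sin(x.val), [(x, math.cos(x.val))])

def backward(y):
    # Seed the output adjoint y_(1) = 1 and run the tape backwards.
    y.adj = 1.0
    for out, partials in reversed(y.tape):
        for inp, d in partials:
            inp.adj += d * out.adj

tape = []
x0, x1 = Var(0.5, tape), Var(2.0, tape)
y = x0 * x1 + sin(x0)         # same example function as before
backward(y)
print(x0.adj, x1.adj)         # x1 + cos(x0) and x0
```

One forward run plus one backward sweep gives all $n$ partial derivatives, which is the cost behaviour the adjoint-model bullets describe; what the sketch hides is that the tape holds every intermediate value, which is exactly the memory problem the article turns to next.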
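Finally, a sketch of the checkpointing idea from the memory-requirements section. The chain of steps, the segment length `k`, and the helper names are all invented for illustration (dco's checkpointing interface is richer); the point is only the trade-off: store a few checkpoints instead of the full tape, then recompute and re-tape one segment at a time during the reverse sweep. For simplicity the sketch assumes a scalar state and a step count divisible by `k`.

```python
# Checkpointing sketch for y = f_m(...f_1(x)...): instead of taping the
# whole chain, keep a checkpoint every k steps; the reverse sweep then
# restores each checkpoint (last first), re-tapes just that segment,
# and propagates the adjoint through it. Memory is traded for flops.
import math

# A made-up chain of m = 20 elementary steps, each paired with its
# derivative (assumed known for this illustration).
steps = [(math.sin, math.cos),
         (lambda v: v * v, lambda v: 2.0 * v),
         (math.exp, math.exp),
         (lambda v: 3.0 * v, lambda v: 3.0)] * 5

def grad_checkpointed(x, k=4):
    # Forward pass: store only every k-th state, not the full tape.
    checkpoints = {0: x}
    v = x
    for i, (f, _) in enumerate(steps):
        v = f(v)
        if (i + 1) % k == 0:
            checkpoints[i + 1] = v
    # Reverse sweep: for each segment, last first, restore its
    # checkpoint, recompute ("tape") the segment, then run the
    # segment's local derivatives backwards through the adjoint.
    adj = 1.0                          # dy/dy
    for seg_end in range(len(steps), 0, -k):
        seg_start = seg_end - k
        v = checkpoints[seg_start]
        segment_tape = []              # local derivatives, this segment only
        for f, df in steps[seg_start:seg_end]:
            segment_tape.append(df(v))
            v = f(v)
        for d in reversed(segment_tape):
            adj *= d
    return adj                         # dy/dx

print(grad_checkpointed(0.1))
```

Peak storage here is one segment tape of length `k` plus `m / k` checkpoints, instead of a tape of length `m`; shrinking `k` constrains the tape further at the price of more recomputation, which is the "memory traded for flops" behaviour described above.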