{"id":1440,"date":"2021-09-21T12:26:00","date_gmt":"2021-09-21T12:26:00","guid":{"rendered":"https:\/\/nag.com\/?post_type=insights&#038;p=1186"},"modified":"2023-08-03T16:45:39","modified_gmt":"2023-08-03T16:45:39","slug":"loss-function-and-robustness-in-data-fitting","status":"publish","type":"insights","link":"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/","title":{"rendered":"Loss Function and Robustness in Data-Fitting"},"content":{"rendered":"<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <h4>Co-authored by George Tsikas<\/h4>\n<h3>Technical Setup<\/h3>\n<h3><span class=\"nag-n-override\" style=\"margin-left: 0 !important;\"><i>n<\/i><\/span>AG Library install<\/h3>\n<p>To run the code in this blog, you will need to install the <span class=\"nag-n-override\" style=\"margin-left: 0 !important;\"><i>n<\/i><\/span>AG Library for Python (Mark 28.5 or newer) and a license key. You can find the software and request a license key from our website here:\u00a0<a href=\"https:\/\/www.support.nag.com\/content\/getting-started-nag-library?lang=py&amp;os=linux\" target=\"_blank\" rel=\"noopener\">Getting Started with <span class=\"nag-n-override\" style=\"margin-left: 0 !important;\"><i>n<\/i><\/span>AG Library<\/a><\/p>\n<h3>Introduction<\/h3>\n<p>Fitting a non-linear model to data is typically modelled as a minimisation problem, where the objective function serves as a measurement of the quality of the model\u2019s fit to data, depending on our parameters. 
A general model involves summing over our data points,<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n\n<math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" display=\"block\">\n  <munder>\n    <mtext>minimize<\/mtext>\n    <mrow>\n      <mi>x<\/mi>\n      <mo>&#x2208;<!-- \u2208 --><\/mo>\n      <msup>\n        <mrow class=\"MJX-TeXAtom-ORD\">\n          <mi mathvariant=\"double-struck\">R<\/mi>\n        <\/mrow>\n        <mrow class=\"MJX-TeXAtom-ORD\">\n          <msub>\n            <mi>n<\/mi>\n            <mrow class=\"MJX-TeXAtom-ORD\">\n              <mtext>var<\/mtext>\n            <\/mrow>\n          <\/msub>\n        <\/mrow>\n      <\/msup>\n    <\/mrow>\n  <\/munder>\n  <mtext>&#xA0;<\/mtext>\n  <mi>f<\/mi>\n  <mo stretchy=\"false\">(<\/mo>\n  <mi>x<\/mi>\n  <mo stretchy=\"false\">)<\/mo>\n  <mo>=<\/mo>\n  <munderover>\n    <mo>&#x2211;<!-- \u2211 --><\/mo>\n    <mrow class=\"MJX-TeXAtom-ORD\">\n      <mi>i<\/mi>\n      <mo>=<\/mo>\n      <mn>1<\/mn>\n    <\/mrow>\n    <mrow class=\"MJX-TeXAtom-ORD\">\n      <msub>\n        <mi>n<\/mi>\n        <mrow class=\"MJX-TeXAtom-ORD\">\n          <mtext>res<\/mtext>\n        <\/mrow>\n      <\/msub>\n    <\/mrow>\n  <\/munderover>\n  <mi>&#x03C7;<!-- \u03c7 --><\/mi>\n  <mo stretchy=\"false\">(<\/mo>\n  <msub>\n    <mi>r<\/mi>\n    <mi>i<\/mi>\n  <\/msub>\n  <mo stretchy=\"false\">(<\/mo>\n  <mi>x<\/mi>\n  <mo stretchy=\"false\">)<\/mo>\n  <mo stretchy=\"false\">)<\/mo>\n  <mo>,<\/mo>\n<\/math>\n\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>where $x$ is a vector holding our model parameters, of which there are $n_\\text{var}$. 
We have $n_\\text{res}$ data points, and $r_i(x)= y_i &#8211; \\varphi(t_i;x), \\quad i = 1,&#8230;,n_\\text{res}$ is the $i^{th}$ residual, equal to the difference between the observed and predicted values of the independent variable at time $t_i$, denoted $y_i$ and $\\varphi(t_i;x)$ respectively. The loss function $\\chi$ has desirable properties such as being bounded from below, and increasing with $|r_i\\left(x\\right)|$. Summing over all data points then, the objective function will be small when the model fits the whole dataset well, which is what we want.<\/p>\n<p>There are plenty of choices for function $\\chi$, so how does our choice of loss function affect the fit we end up with? One important consideration is robustness. If some of the observed data points are far from the fitted model, how can we control the influence of those outliers? A robust loss function is one which doesn\u2019t get thrown off easily by outliers in the data.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre><code># import all necessary packages\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom naginterfaces.base import utils\nfrom naginterfaces.library import opt<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <h2>Single-outlier example<\/h2>\n<p>To investigate the robustness aspect, here\u2019s a toy dataset which is generated from $\\sin(t)$ and has an outlier at $t=1.5$, which is generated by $5\\sin(t)$.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            
<pre><code># create data set\nt = np.linspace(0.5, 2.5, num=21)\ny = np.sin(t)\ny[10] = 5*np.sin(t[10])<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre><code>fig1 = plt.plot(t,y,'*b')<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1193 size-full\" src=\"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/image_1.png\" alt=\"toy data set with a single outlier\" width=\"374\" height=\"257\" \/><\/p>\n<p>We will fit it with a model<\/p>\n<p>$$ \varphi(t;x) = x_1\sin(x_2 t) $$<\/p>\n<p>using a variety of loss functions provided by <span class=\"nag-n-override\" style=\"margin-left: 0 !important;\"><i>n<\/i><\/span>AG\u2019s data-fitting solver <strong>handle_solve_nldf<\/strong>\u00a0(<code>e04gn<\/code>), which constructs the appropriate objective function for us.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint\"><code># create a handle for the model\nnvar = 2\nhandle = opt.handle_init(nvar=nvar)<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre><code># register residuals structure\nnres = 21\nopt.handle_set_nlnls(handle, nres)<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row 
justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre><code># define the residual callback function and its gradient\ndef lsqfun(x, nres, inform, data):\n    rx = np.zeros(nres,dtype=float)\n    t = data[\"t\"]\n    y = data[\"y\"]\n    for i in range(nres):\n        rx[i] = (y[i] - x[0]*np.sin(x[1]*t[i]))\n        \n    return rx, inform\n\ndef lsqgrd(x, nres, rdx, inform, data):\n    t = data[\"t\"]\n    nvar = len(x)\n    for i in range(nres):\n        rdx[i*nvar] = (-np.sin(x[1]*t[i]))\n        rdx[i*nvar + 1] = (-t[i]*x[0]*np.cos(x[1]*t[i]))\n\n    return inform\n<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint\"># create the data structure to be passed to the solver\ndata = {}\ndata[\"t\"] = t\ndata[\"y\"] = y<\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <h3>Start with $l_2$-norm loss function &#8211; Example 1<\/h3>\n<p>Starting with one of the most common loss functions, the $l_2$-norm, we form the problem<\/p>\n<p>$$ \\underset{x \\in \\mathbb{R}^{2}}{\\text{minimize}}~f(x) =\\sum_{i=1}^{21} r_i(x)^2 $$<\/p>\n<p>which is just least squares regression. $l_2$-norm loss has low robustness against outliers, so we should expect that the solution will be affected heavily by this one outlier. 
Let\u2019s solve from a starting point at<\/p>\n<p>$$ x = (2.1, 1.4) $$<\/p>\n<p>to see what this outlier does to the minimum.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-12 col-lg-12 col-xl-12\">\n            <pre><code># set loss function to l2-norm and printing options\nfor option in [\n    'NLDF Loss Function Type = L2',\n    'Print Level = 1',\n    'Print Options = No',\n    'Print Solution = Yes'\n]:\n    opt.handle_opt_set(handle, option)\n\n# use an explicit I\/O manager for abbreviated iteration output:\niom = utils.FileObjManager(locus_in_output=False)<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-12 col-lg-12 col-xl-12\">\n            <pre><code># set initial guess and solve\nx = [2.1, 1.4]\nsoln1 = opt.handle_solve_nldf(\n    handle, lsqfun, lsqgrd, x, nres, data=data, io_manager=iom)<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>E04GN, Nonlinear Data-Fitting<br \/>Status: converged, an optimal solution found<br \/>Final objective value 1.470963E+01<\/p>\n<p>Primal variables:<br \/>\u00a0idx\u00a0\u00a0\u00a0Lower bound\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Value\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Upper bound<br \/>\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -inf\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1.30111E+00\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 inf<br \/>\u00a0\u00a0 2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -inf\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 
1.06956E+00\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 inf<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint\"><code># calculate fitted data using the optimal parameters\ny_l2_fitted = soln1.x[0]*np.sin(soln1.x[1]*t)<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre><code># plot the fitted curve\nplt.title(\"Fitted with L2 Loss Function\")\nplt.plot(t,y,'*b')\nplt.plot(t,y_l2_fitted)\nplt.show()<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1194 size-full\" src=\"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/image_2.png\" alt=\"L2 Loss Function\" width=\"362\" height=\"264\" \/><\/p>\n<p>The single outlier was able to disrupt the fit, since $l_2$-norm loss makes outliers contribute heavily to the objective function and search direction.<\/p>\n<h3>Try $l_1$-norm loss function &#8211; Example 2<\/h3>\n<p>Using $l_1$-norm loss gives us the problem<\/p>\n<p>$$ \\underset{x \\in \\mathbb{R}^{2}}{\\text{minimize}}~f(x) =\\sum_{i=1}^{21} |r_i(x)|, $$<\/p>\n<p>which is more robust against outliers. This means if some large portion of the data is well-fitted by some solution $x^\\ast$, there is likely to be a local minimum very close to $x^\\ast$ which is relatively undisturbed by the remaining data that is outlying to the solution $x^\\ast$. 
Here\u2019s the solution, again starting at $x=(2.1,1.4)$, using $l_1$ loss.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre><code># change loss function to l1-norm and solve\nopt.handle_opt_set(handle, 'NLDF Loss Function Type = L1')\nsoln2 = opt.handle_solve_nldf(\n    handle, lsqfun, lsqgrd, x, nres, data=data, io_manager=iom)<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>E04GN, Nonlinear Data-Fitting<br \/>Status: converged, an optimal solution found<br \/>Final objective value 3.989980E+00<\/p>\n<p>Primal variables:<br \/>\u00a0\u00a0 idx\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Lower bound \u00a0 \u00a0 \u00a0 Value\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Upper bound<br \/>\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -inf\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1.00000E+00\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 inf<br \/>\u00a0\u00a0 2\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 -inf \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1.00000E+00\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 inf<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre><code># calculate fitted data using the optimal parameters\ny_l1_fitted = soln2.x[0]*np.sin(soln2.x[1]*t)\n<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 
col-md-10 col-lg-10 col-xl-10\">\n            <pre><code># plot the fitted curve\nplt.title(\"Fitted with L1 Loss Function\")\nplt.plot(t,y,'*b')\nplt.plot(t,y_l1_fitted)\nplt.show()<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1195 size-full\" src=\"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/image_3.png\" alt=\"Loss Function L1\" width=\"362\" height=\"264\" \/><\/p>\n<p>Clearly, this is a much better fit for most of the data, and the outlier hasn\u2019t dragged the model off most of the data.<\/p>\n<h3>The trade-off of a loss function<\/h3>\n<p>We can reuse the handle, the residual function (and gradient). Just changing the data and options, we can demonstrate more principles to consider regarding loss functions.<\/p>\n<p>There is a danger in choosing a very robust loss function. During an iterative optimization process, a loss function which is robust against outliers will usually prefer the data which is close to the current model. This means that if the algorithm finds local minima of the objective function, the search can fall into a local minimum when the model fits some subset of the data very well but fits the majority of the data very badly.<\/p>\n<p>To illustrate this, here\u2019s a new dataset which we will try to fit with the same model, again starting at $x= (2.1,1.4)$. 
Most of the data was generated by $5\sin(t)$, with the 3 data points at either end being generated by $\sin(t)$.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre><code># create the data set (copy so the original data is not overwritten)\ny_new = y.copy()\nfor i in range(3,18):\n    y_new[i] = 5*np.sin(t[i])<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre><code>plt.plot(t,y_new,'*b')\nplt.show()<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1196 size-full\" src=\"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/image_4.png\" alt=\"new data set with outliers at either end\" width=\"362\" height=\"248\" \/><\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-12 col-lg-12 col-xl-12\">\n            <pre><code># recreate the data structure to be passed to the solver\ndata_new = {}\ndata_new[\"t\"] = t\ndata_new[\"y\"] = y_new<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>We will fit this data set using 3 different loss functions with the same model $\varphi(t;x)$ each time, and discuss all of the results together below the plots.<\/p>\n<h3>Fit the model with the $l_2$-norm, $l_1$-norm and $\arctan$ loss functions<\/h3>\n        <\/div>\n    
<\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre><code>loss_functions = ['L2', 'L1', 'ATAN']\n\n# turn off printing of solver log\nopt.handle_opt_set(handle, 'Print File = -1')\n\n# solve using 3 different loss functions\nfor lfunc in loss_functions:\n    # set option for loss function and solve\n    opt.handle_opt_set(handle, 'NLDF Loss Function Type = ' + lfunc)\n    soln = opt.handle_solve_nldf(\n        handle, lsqfun, lsqgrd, x, nres, data=data_new, io_manager=iom)\n    # plot fitted curve\n    plt.plot(t, soln.x[0]*np.sin(soln.x[1]*t), label=lfunc)\n\n# plot data points\nplt.plot(t,y_new,'*b')\nplt.title(\"Fitted with Various Loss Functions\")\nplt.legend()\nplt.show()<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1197 size-full\" src=\"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/image_5.png\" alt=\"Various Loss Function\" width=\"362\" height=\"264\" \/><\/p>\n<h3>Fitted Models and Contour Plots<\/h3>\n<p>In the first row of plots, the data is fitted using $l_2$-norm loss, $l_1$-norm loss, and $\arctan$ loss. 
Shown below each is the contour plot of the objective function value, where the black circles represent the parameters used to generate the data, the cyan circles represent the starting point for the solver, and the cyan wedges represent the optimized solution found by the solver.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1198 size-full\" src=\"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/nldf_contour-1.png\" alt=\"Loss Function Contours\" width=\"2160\" height=\"1440\" \/><\/p>\n<p>In the $l_2$-norm case in the left column, the outliers generated by $\sin(t)$ have pulled the optimal solution away from $x = (5,1)$. The contour plot for $l_2$-norm loss indicates that we don\u2019t have to worry too much about what starting point to use, since there are no local minima in the region displayed other than the global best solution.<\/p>\n<p>The behaviour of the solver is quite different when using an extremely robust loss function like $\arctan$ loss, which gives the problem<\/p>\n<p>$$ \underset{x \in \mathbb{R}^{2}}{\text{minimize}} ~ f(x) = \sum_{i=1}^{21} \arctan(r_i(x)^2) $$<\/p>\n<p>The fitted model and corresponding contour plot for the $\arctan$ case are in the middle column. Here, there are eight local minima in the contour plot for $\arctan$ loss, seven of which are substantially worse solutions than the global minimum, and it is one of these we\u2019ve converged to. Therefore, in this case the choice of initial parameter estimate is much more important.<\/p>\n<p>The model fitted with $l_1$-norm loss and the corresponding contour plot are in the right column. Looking at the contour plot, there are still a few local minima that do not correspond to the optimal solution, but the starting point of $x = (2.1,1.4)$ still converges to the global minimum, which lies at $x = (5,1)$, meaning the part of the dataset generated from $\sin(t)$ is effectively ignored. 
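The robustness of $\arctan$ loss comes from saturation: $\chi(r) = \arctan(r^2)$ can never exceed $\pi/2$, so once a residual is large the objective barely notices whether it grows further. A quick standalone check (plain NumPy, separate from the solver run):

```python
import numpy as np

# arctan loss saturates: arctan(r^2) is bounded above by pi/2 (~1.5708),
# so a residual's contribution caps out once its point is "written off"
for r in (0.5, 1.0, 5.0, 50.0):
    print(f"r={r:5}: atan loss={np.arctan(r**2):.4f}  (bound pi/2={np.pi/2:.4f})")
```

This bounded contribution is also what creates the many local minima: any parameter setting that fits some subset of the data well pays only a capped, nearly flat penalty for everything else.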
From the plots of the loss functions, we can see that $l_1$-norm loss is more robust than $l_2$-norm loss but less so than $\\arctan$ loss.<\/p>\n<p>So, what has happened in each case is: using $l_2$-norm loss, we move to the global minimum which is affected by the whole dataset. Using $l_1$-norm loss, we move to the global minimum which fits most of the data very well and ignores a small portion, treating them as outliers. Using $\\arctan$ loss we move to a local minimum which ignores a large portion of the data (treating them as outliers) and fits a small amount of data very well.<\/p>\n<h3>Conclusion<\/h3>\n<p>The lesson here is that the same thing that makes a loss function robust \u2013 ignoring data that lies far from the current model to some degree \u2013 can populate the search space with local minima where the model predicts some of the data well and ignores most of it. In extreme cases like arctan loss, if the starting point fits some of the data very well, the model will likely just be optimized for that portion of the data, even if it is a small portion of the whole dataset. 
It is therefore important to try a variety of loss functions and starting points when setting up a data-fitting problem, since they affect both the optimal solution and how easily an optimal solution is found.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n\n<div class=\"gbc-title-banner tac tac-lg tac-xl\" style='border-radius: 0px; '>\n    <div class=\"container\" style='border-radius: 0px; '>\n        <div class=\"row justify-content--center\" >\n            <div class=\"col-12\"  >\n                <div class=\"wrap pv-4 \" style=\"0pxbackground-color: \">\n                                <div class=\"col-12 col-md-12 col-lg-12 col-xl-12  banner-content\"  >\n    \n                    \n                    <div class=\"mt-1 mb-1 content\"><\/div>\n\n                    \n                    <a href='' style='background-color: #ff7d21ff; color: #ffffffff; border-radius: 30px; font-weight: 600; ' class='btn mr-1  ' >Learn more about the <span class=\"nag-n-override\" style=\"margin-left: 0 !important;\"><i>n<\/i><\/span>AG Library <i class='fas fa-angle-right'><\/i><\/a>                <\/div>\n                <\/div>\n            <\/div>\n        <\/div>\n    <\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Fitting a non-linear model to data is typically modelled as a minimisation problem, where the objective function serves as a measurement of the quality of the model\u2019s fit to data, depending on our parameters. 
A general model involves summing over our data points.<\/p>\n","protected":false},"author":11,"featured_media":1188,"parent":0,"menu_order":0,"template":"","meta":{"content-type":"","footnotes":""},"post-tag":[22,27,18],"class_list":["post-1440","insights","type-insights","status-publish","has-post-thumbnail","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.8 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Loss Function and Robustness in Data-Fitting - nAG<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Loss Function and Robustness in Data-Fitting - nAG\" \/>\n<meta property=\"og:description\" content=\"Fitting a non-linear model to data is typically modelled as a minimisation problem, where the objective function serves as a measurement of the quality of the model\u2019s fit to data, depending on our parameters. A general model involves summing over our data points.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/\" \/>\n<meta property=\"og:site_name\" content=\"nAG\" \/>\n<meta property=\"article:modified_time\" content=\"2023-08-03T16:45:39+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/finance-four_0.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"2000\" \/>\n\t<meta property=\"og:image:height\" content=\"1000\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@NAGTalk\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/\",\"url\":\"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/\",\"name\":\"Loss Function and Robustness in Data-Fitting - nAG\",\"isPartOf\":{\"@id\":\"https:\/\/nag.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/finance-four_0.jpeg\",\"datePublished\":\"2021-09-21T12:26:00+00:00\",\"dateModified\":\"2023-08-03T16:45:39+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/#primaryimage\",\"url\":\"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/finance-four_0.jpeg\",\"contentUrl\":\"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/finance-four_0.jpeg\",\"width\":2000,\"height\":1000,\"caption\":\"Data 
Fitting\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/nag.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Insights\",\"item\":\"https:\/\/nag.com\/insights\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Loss Function and Robustness in Data-Fitting\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/nag.com\/#website\",\"url\":\"https:\/\/nag.com\/\",\"name\":\"NAG\",\"description\":\"Robust, trusted numerical software and computational expertise.\",\"publisher\":{\"@id\":\"https:\/\/nag.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/nag.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/nag.com\/#organization\",\"name\":\"Numerical Algorithms Group\",\"alternateName\":\"NAG\",\"url\":\"https:\/\/nag.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/nag.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/nag.com\/wp-content\/uploads\/2023\/11\/NAG-Logo.png\",\"contentUrl\":\"https:\/\/nag.com\/wp-content\/uploads\/2023\/11\/NAG-Logo.png\",\"width\":1244,\"height\":397,\"caption\":\"Numerical Algorithms Group\"},\"image\":{\"@id\":\"https:\/\/nag.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/NAGTalk\",\"https:\/\/www.linkedin.com\/company\/nag\/\",\"https:\/\/www.youtube.com\/user\/NumericalAlgorithms\",\"https:\/\/github.com\/numericalalgorithmsgroup\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Loss Function and Robustness in Data-Fitting - nAG","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/","og_locale":"en_US","og_type":"article","og_title":"Loss Function and Robustness in Data-Fitting - nAG","og_description":"Fitting a non-linear model to data is typically modelled as a minimisation problem, where the objective function serves as a measurement of the quality of the model\u2019s fit to data, depending on our parameters. A general model involves summing over our data points.","og_url":"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/","og_site_name":"nAG","article_modified_time":"2023-08-03T16:45:39+00:00","og_image":[{"width":2000,"height":1000,"url":"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/finance-four_0.jpeg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_site":"@NAGTalk","twitter_misc":{"Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/","url":"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/","name":"Loss Function and Robustness in Data-Fitting - nAG","isPartOf":{"@id":"https:\/\/nag.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/#primaryimage"},"image":{"@id":"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/#primaryimage"},"thumbnailUrl":"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/finance-four_0.jpeg","datePublished":"2021-09-21T12:26:00+00:00","dateModified":"2023-08-03T16:45:39+00:00","breadcrumb":{"@id":"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/#primaryimage","url":"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/finance-four_0.jpeg","contentUrl":"https:\/\/nag.com\/wp-content\/uploads\/2022\/09\/finance-four_0.jpeg","width":2000,"height":1000,"caption":"Data Fitting"},{"@type":"BreadcrumbList","@id":"https:\/\/nag.com\/insights\/loss-function-and-robustness-in-data-fitting\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/nag.com\/"},{"@type":"ListItem","position":2,"name":"Insights","item":"https:\/\/nag.com\/insights\/"},{"@type":"ListItem","position":3,"name":"Loss Function and Robustness in Data-Fitting"}]},{"@type":"WebSite","@id":"https:\/\/nag.com\/#website","url":"https:\/\/nag.com\/","name":"NAG","description":"Robust, trusted numerical software and computational 
expertise.","publisher":{"@id":"https:\/\/nag.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/nag.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/nag.com\/#organization","name":"Numerical Algorithms Group","alternateName":"NAG","url":"https:\/\/nag.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/nag.com\/#\/schema\/logo\/image\/","url":"https:\/\/nag.com\/wp-content\/uploads\/2023\/11\/NAG-Logo.png","contentUrl":"https:\/\/nag.com\/wp-content\/uploads\/2023\/11\/NAG-Logo.png","width":1244,"height":397,"caption":"Numerical Algorithms Group"},"image":{"@id":"https:\/\/nag.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/NAGTalk","https:\/\/www.linkedin.com\/company\/nag\/","https:\/\/www.youtube.com\/user\/NumericalAlgorithms","https:\/\/github.com\/numericalalgorithmsgroup"]}]}},"_links":{"self":[{"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/insights\/1440","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/insights"}],"about":[{"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/types\/insights"}],"author":[{"embeddable":true,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/users\/11"}],"version-history":[{"count":1,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/insights\/1440\/revisions"}],"predecessor-version":[{"id":3363,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/insights\/1440\/revisions\/3363"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/media\/1188"}],"wp:attachment":[{"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/media?parent=1440"}],"wp:term":[{"taxonomy":"post-tag","embeddable":true,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/post-tag?post=1440"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}