Exercises with the symbol \(^+\) are to be done at home before class. Exercises with the symbol \(^*\) will be tackled in class. The remaining exercises are left for self-training after the exercise class. Some exercises are taken from the textbook, in which case their number is reported; their solutions can be found at the end of the book.
using both the transform \(x = t_{ a,b} ( \hat{x} )\) and a sigmoid transform for constraint bounds \(x \in [ a, b]\): \[x=s(\hat{x})=a+\frac{(b-a)}{1+\exp{(-\hat{x})}}.\]
Why is the \(t_{a,b}\) transform better in this case than the sigmoid transform?
The problem is minimized at \(x^*=1\), which is at the constraint boundary. Solving with the \(t\)-transform yields the unconstrained objective function: \[f_t(\hat{x})= \sin \left(\frac{4}{5.5 +4.5\frac{ 2\hat{x}}{1+\hat{x}^2}}\right)\] which has a single global minimum at \(\hat{x}=-1\), correctly corresponding to \(x^*\).
The sigmoid transform has an unconstrained objective function: \[f_s(\hat{x})=\sin \left(\frac{4}{1 + \frac{9}{1+\exp(-\hat{x})}}\right)\] Unfortunately, the lower-bound \(a\), in this case \(x = 1\), is reached only as \(\hat{x}\) approaches minus infinity. The unconstrained optimization problem obtained using the sigmoid transform does not have a solution.
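A quick grid check confirms this behaviour (a sketch; the bounds \(a=1\), \(b=10\) come from the transforms above, while the \(\hat{x}\) range and resolution are arbitrary choices): the \(t\)-transform objective attains its minimum at \(\hat{x}=-1\), i.e. \(x=1\), whereas the grid minimum of the sigmoid objective sits at the left edge of whatever range we sample, since the bound is only approached as \(\hat{x}\to-\infty\).

```python
import numpy as np

xhat = np.linspace(-50, 50, 200001)

# t-transform: x = (a+b)/2 + (b-a)/2 * 2*xhat/(1 + xhat^2), with a = 1, b = 10
x_t = 5.5 + 4.5 * (2 * xhat / (1 + xhat**2))
f_t = np.sin(4 / x_t)

# sigmoid transform: x = a + (b-a)/(1 + exp(-xhat))
x_s = 1 + 9 / (1 + np.exp(-xhat))
f_s = np.sin(4 / x_s)

i_t = np.argmin(f_t)
i_s = np.argmin(f_s)
print(xhat[i_t], x_t[i_t])  # grid minimizer of f_t: xhat = -1, i.e. x = 1
print(xhat[i_s])            # grid minimizer of f_s: the left edge of the grid
```

Widening the grid only moves the sigmoid's "minimizer" further left; it never settles on a finite \(\hat{x}\).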
using the quadratic penalty method with \(\rho > 0\). Solve the problem in closed form.
First reformulate the problem as \(f (x) = x +\rho \max(-x, 0)^2\) for which the derivative is \[f'(x) =\begin{cases}
1 +2\rho x &\text{if $x < 0$}\\
1 &\text{otherwise}
\end{cases}\] The unconstrained problem is solved by setting \(f'(x) = 0\), which yields the minimizer \(x^*=-\frac{1}{2\rho}\). Thus, as \(\rho \to \infty\) we have \(x^*\to 0\).
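This limit can also be verified numerically; a minimal sketch (the search bracket \([-10, 10]\) and the \(\rho\) values are arbitrary choices):

```python
from scipy.optimize import minimize_scalar

def f_penalized(x, rho):
    # transformed objective: f(x) = x + rho * max(-x, 0)^2
    return x + rho * max(-x, 0.0)**2

minimizers = {}
for rho in [1.0, 10.0, 100.0]:
    res = minimize_scalar(lambda x: f_penalized(x, rho),
                          bounds=(-10.0, 10.0), method='bounded')
    minimizers[rho] = res.x
    print(rho, res.x)  # x* = -1/(2*rho), approaching 0 from below
```

Each computed minimizer matches \(-1/(2\rho)\), and the sequence approaches the constrained solution \(x^*=0\) as \(\rho\) grows.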
3\(^*\) (10.11)
Suppose we want to minimize \(x_1^3 + x_2^2 + x_3\) subject to the constraint that \(x_1 + 2x_2 + 3x_3 = 6\). How might we transform this into an unconstrained problem with the same minimizer?
Solve the transformed problem numerically.
We can rearrange the constraint in terms of \(x_1\): \[x_1 = 6-2x_2-3x_3\] and substitute the relation into the objective: \[\underset{x_2,x_3}\min x_2^2 +x_3-(2x_2 +3x_3-6)^3\]
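Note that the substituted objective is unbounded below (the cubic term dominates once \(2x_2+3x_3\) grows past 6), so a numerical solver can only be asked for a local minimizer. Setting the gradient to zero gives \(1-9u^2=0\) and \(2x_2-6u^2=0\) with \(u = 2x_2+3x_3-6\), hence \(u=\pm 1/3\) and \(x_2=1/3\); the branch \(u=-1/3\), where the Hessian is positive definite, yields \((x_2,x_3)=(1/3,5/3)\) and thus \(x_1=1/3\). A sketch locating this stationary point with a root finder (the starting point is an arbitrary choice near the solution):

```python
import numpy as np
from scipy.optimize import root

def grad(v):
    # gradient of f(x2, x3) = x2^2 + x3 - (2*x2 + 3*x3 - 6)^3
    x2, x3 = v
    u = 2*x2 + 3*x3 - 6
    return [2*x2 - 6*u**2, 1 - 9*u**2]

sol = root(grad, x0=[0.3, 1.6])  # start near the local minimizer
print(sol.x)  # close to (1/3, 5/3); then x1 = 6 - 2*x2 - 3*x3 = 1/3
```

A plain descent method started far from this point may instead run off into the unbounded region, which is why the stationarity equations are the safer route here.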
Optimization with the penalty method typically involves running several optimizations with increasing penalty weights. Impatient engineers may wish to optimize once using a very large penalty weight. Explain what issues are encountered for both the count penalty method and the quadratic penalty method.
Implement the method and solve the problem numerically.
The transformed objective function is \(f (x) = 1-x^2 +\rho p(x)\), where \(p\) is either a count penalty or a quadratic penalty: \[p_{count}(x) = (|x|> 2) \quad p_{quadratic}(x) = \max(|x|-2, 0)^2\] The count penalty method does not provide any gradient information to the optimization process. An optimization algorithm initialized outside of the feasible set will be drawn away from the feasible region because \(1-x^2\) is minimized by moving infinitely far to the left or right from the origin. The large magnitude of the count penalty is not the primary issue; small penalties can lead to similar problems.
The quadratic penalty method does provide gradient information to the optimization process, guiding searches toward the feasible region. For very large penalties, the quadratic penalty method will produce large gradient values in the infeasible region. In this problem, the derivative is: \[f'(x)=-2x +\rho
\begin{cases}
2(x-2) &\text{if $x > 2$}\\
2(x +2) &\text{if $x <-2$}\\
0 &\text{otherwise}
\end{cases}\] For very large values of \(\rho\), the derivative in the infeasible region is also large, which can cause problems for optimization methods. If \(\rho\) is not large, infeasible points may not be sufficiently penalized, resulting in infeasible solutions.
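The contrast can be demonstrated numerically; a sketch, assuming a gradient-based solver (BFGS with finite-difference gradients), an infeasible starting point \(x_0=3\), and \(\rho=100\), all of which are arbitrary choices:

```python
from scipy.optimize import minimize

rho = 100.0

def f_count(x):
    # count penalty: flat in the infeasible region, no gradient information
    return 1 - x[0]**2 + rho * (abs(x[0]) > 2)

def f_quad(x):
    # quadratic penalty: gradient points back toward the feasible region
    return 1 - x[0]**2 + rho * max(abs(x[0]) - 2, 0.0)**2

res_count = minimize(f_count, x0=[3.0], method='BFGS')
res_quad = minimize(f_quad, x0=[3.0], method='BFGS')
print(res_count.x)  # runs away from the feasible set
print(res_quad.x)   # near the boundary x = 2 (stationary point 400/198 for rho = 100)
```

With the count penalty the solver only sees the unbounded \(1-x^2\) and diverges; with the quadratic penalty it settles at \(x = 400/198 \approx 2.02\), slightly infeasible but approaching \(x=2\) as \(\rho\) grows.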
5\(^*\)
Verify the numerical results of Examples 10.3 and 10.4 (pages 172-173) of the textbook.
Classifying problems and deciding which of the methods studied in this course can address them is a skill that will be tested at the oral exam. Hence, questions like this are to be expected.
Example 10.3

It can be approached in two ways:
as a univariate optimization problem with the given smooth derivative
as a root finding problem on the derivative function
In Python:
```python
import numpy
import scipy as sp
import matplotlib.pyplot as plt

def func(x: numpy.ndarray):
    return -numpy.exp(-(x[0]*x[1] - 3/2)**2 - (x[1] - 3/2)**2)

def func1(x: float):
    _x = numpy.array([x**2, x])
    return func(_x)

def f1prime(x: float):
    return -6 * func1(x) * (x**5 - 3/2*x**2 + 1/3*x - 1/2)

opt = sp.optimize.minimize_scalar(func1, method='golden')
print(opt)

root = sp.optimize.root_scalar(f1prime, x0=0, method='secant')
print(root)

# Generate x values
x = numpy.linspace(-2, 2, 100)
y = func1(x)

# Create the plot
plt.figure(figsize=(6, 4))
plt.plot(x, y, label=r"$f(x)$", color="blue")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.title("Plot of $f(x)$")
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.legend()
plt.grid(True)

# Show the plot
plt.show()
# plt.savefig("ex10_3.png")
```
```
message: Optimization terminated successfully;
         The returned value satisfies the termination criteria
         (using xtol = 1.4901161193847656e-08 )
success: True
fun: -0.8879747422664449
x: 1.1652486100083423
nit: 38
nfev: 43
```
The secant root finding on \(f'\) converges to the same stationary point, \(x \approx 1.1652\), in agreement with the golden-section result.
Example 10.4
This problem corresponds to solving a system of nonlinear equations with as many equations as variables (if there are more variables than equations, we need to fix some variables). There is no need to optimize. In general, the system may have no solution, a unique solution, or many solutions.
Newton's method
One way to solve this problem is to reduce it to the solution of a sequence of systems of linear equations.
Let \(r_1({\vec x})=0, \ldots, r_m({\vec x})=0\) be the nonlinear equations and \(\vec r({\vec x})= (r_1({\vec x}), \ldots, r_m({\vec x}))\) the vector of their left-hand sides. We aim at finding a direction \(\vec p\) that moves \({\vec x}_k\) towards a solution.
Starting from the Taylor expansion: \[\vec r({\vec x}+ \vec p) = \vec r({\vec x}) + \int_0^1 J({\vec x}+t\vec p)\, \vec p \,dt\] we can define a linear model \(M_k (\vec p)\) of \(\vec r({\vec x}_k +\vec p)\) by approximating \(J({\vec x}+t\vec p)\) with \(J({\vec x}_k)\): \[M_k(\vec p)\stackrel{def}{=} \vec r({\vec x}_k ) + J({\vec x}_k )\, \vec p\]
```
#| label: lst-newton
#| lst-cap: "Newton's Method for Nonlinear Equations"
Algorithm: Newton's Method
1. Choose x_0
2. For k = 0, 1, 2, ...
   a. Solve J(x_k) p_k = -r(x_k)
   b. x_{k+1} = x_k + p_k
```
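The algorithm translates directly into code; a minimal sketch, tried on an illustrative system (the unit circle intersected with the line \(x_1=x_2\), not the textbook's example):

```python
import numpy as np

def newton(r, J, x0, tol=1e-12, maxit=50):
    """Newton's method for r(x) = 0: solve J(x_k) p_k = -r(x_k), then x_{k+1} = x_k + p_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        rx = r(x)
        if np.linalg.norm(rx) < tol:
            break
        x = x + np.linalg.solve(J(x), -rx)
    return x

# illustrative system: unit circle intersected with the line x0 = x1
r = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])

x_star = newton(r, J, [1.0, 0.5])
print(x_star)  # converges to (1/sqrt(2), 1/sqrt(2))
```

Starting reasonably close to the root, only a handful of iterations are needed, consistent with the fast local convergence discussed below.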
When the iterate \({\vec x}_k\) is close to a nondegenerate root \({\vec x}^*\), Newton's method converges quadratically.
Potential shortcomings of the method include the following:
When the starting point is remote from a solution, the algorithm can behave erratically; when \(J({\vec x}_k )\) is singular, the Newton step may not even be defined.
First-derivative information (the Jacobian matrix \(J\)) may be difficult to obtain.
It may be too expensive to compute the Newton step \(\vec p_k\) exactly when \(n\) is large.
The root \({\vec x}^*\) in question may be degenerate, that is, \(J({\vec x}^*)\) may be singular.
Broyden’s method
Secant methods, also known as quasi-Newton methods, do not require calculation of the Jacobian \(J({\vec x})\). Instead, they construct their own approximation to this matrix, updating it at each iteration so that it mimics the behavior of the true Jacobian \(J\) over the step just taken. The approximate Jacobian, which we denote at iteration \(k\) by \(B_k\), is then used to construct a linear model analogous to the one above, namely
```
#| label: lst-broyden
#| lst-cap: "Broyden's Method"
Algorithm: Broyden's Method
1. Choose x_0 and a nonsingular initial Jacobian approximation B_0
2. For k = 0, 1, 2, ...
   a. Solve B_k p_k = -r(x_k) for p_k
   b. Choose alpha_k by performing a line search along p_k
   c. x_{k+1} = x_k + alpha_k * p_k
   d. s_k = x_{k+1} - x_k
   e. y_k = r(x_{k+1}) - r(x_k)
   f. Obtain B_{k+1} from the update formula
      B_{k+1} = B_k + (y_k - B_k s_k) s_k^T / (s_k^T s_k)
```
Under certain assumptions, Broyden's method converges superlinearly, that is, \[\|{\vec x}_{k+1}-{\vec x}^*\|=o(\|{\vec x}_k-{\vec x}^*\|)\]
A common initial guess for the Jacobian approximation is a scaled identity \(B_0 = (-1/\alpha)\, I\) (as in scipy's `broyden1`, via the `alpha` parameter) or the Jacobian computed at \({\vec x}_0\).
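A minimal sketch of the update (with full steps, i.e. \(\alpha_k = 1\), no line search, and \(B_0\) set to the true Jacobian at \(x_0\); the test system is an illustrative one, the unit circle intersected with the line \(x_1=x_2\), not the textbook's):

```python
import numpy as np

def broyden(r, x0, B0, tol=1e-10, maxit=100):
    """Broyden's method with full steps (alpha_k = 1) and rank-1 Jacobian updates."""
    x = np.asarray(x0, dtype=float)
    B = np.asarray(B0, dtype=float)
    rx = r(x)
    for _ in range(maxit):
        if np.linalg.norm(rx) < tol:
            break
        s = np.linalg.solve(B, -rx)   # solve B_k s_k = -r(x_k)
        x_new = x + s
        r_new = r(x_new)
        y = r_new - rx
        # Broyden update: B_{k+1} = B_k + (y_k - B_k s_k) s_k^T / (s_k^T s_k)
        B = B + np.outer(y - B @ s, s) / (s @ s)
        x, rx = x_new, r_new
    return x

# illustrative system: unit circle intersected with the line x0 = x1
r = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J0 = np.array([[2.0, 1.0], [1.0, -1.0]])  # true Jacobian at x0 = (1, 0.5)

x_star = broyden(r, [1.0, 0.5], J0)
print(x_star)  # converges to (1/sqrt(2), 1/sqrt(2))
```

No further Jacobian evaluations are needed after \(B_0\): each iteration only requires one residual evaluation and a rank-1 update.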
Implementations in scipy: root(method='broyden1'):
```
message: The maximum number of iterations allowed has been reached.
success: False
status: 2
fun: [-1.956e-03 -3.863e-03  3.962e-03]
x: [ 8.307e-01 -9.136e-01  2.030e-03]
nit: 8000
method: broyden1
nfev: 16010
[ 0.83067483 -0.91358452  0.00202981]
[-0.00195573 -0.0038628   0.00396184]
```
```
message: xtol=0.000000 is too small, no further improvement in the approximate
         solution is possible.
success: False
status: 3
fun: [-3.053e-16  5.551e-17  0.000e+00]
x: [ 1.358e+00  1.165e+00  1.701e-01]
method: hybr
nfev: 43
fjac: [[-6.240e-01 -6.789e-01  3.870e-01]
       [-7.383e-02 -4.419e-01 -8.940e-01]
       [-7.779e-01  5.864e-01 -2.256e-01]]
r: [-2.584e+00 -3.006e+00 -2.467e+00 -3.908e+00 -1.652e+00
    2.316e+00]
qtf: [ 1.528e-16 -1.988e-18  2.701e-16]
[1.35780432 1.16524861 0.1700643 ]
[-0.  0.  0.]
```
The Broyden method was not successful. The method 'hybr', which uses a modification of the Powell hybrid method as implemented in MINPACK, performed better. However, results were highly dependent on the initial point.
We should now proceed to test the KKT conditions.
6\(^*\)
Derive the Karush-Kuhn-Tucker conditions (FONCs) for constrained maximization problems.
At an optimal solution \({\vec x}^*\), the gradient must be expressible as a combination of the gradients of the active constraints. In the minimization case, imposing \(\vec \mu\geq \vec 0\), we derived: \[\nabla f({\vec x}^*)=-\vec \mu \cdot \nabla \vec g({\vec x}^*) - \vec \lambda \cdot \nabla \vec h({\vec x}^*)\] In the maximization case, maintaining \(\vec \mu\geq \vec 0\), we derive: \[\nabla f({\vec x}^*)=\vec \mu \cdot \nabla \vec g({\vec x}^*) - \vec \lambda \cdot \nabla \vec h({\vec x}^*)\] The \(\vec\lambda\) multipliers are free (\(\vec \lambda \in \Re\)), so their sign does not need to change: they adapt. Hence, for maximization there is a change of sign only in front of the multipliers \(\vec \mu\) in the gradient condition: \[\begin{cases}
\nabla f({\vec x}^*)-\vec \mu \cdot \nabla \vec g({\vec x}^*) + \vec \lambda \cdot \nabla \vec h({\vec x}^*)=\vec 0\\
\vec \mu \cdot \vec g({\vec x}^*)=0\\
\vec g({\vec x}^*)\leq \vec 0, \; \vec h({\vec x}^*)=\vec 0\\
\vec \mu\geq \vec 0
\end{cases}\]
Consider the following constrained optimization problem: \[\min\;\left(x_1-\frac{3}{2}\right)^2+\left(x_2-\frac 12\right)^4 \quad \text{s.t.}\quad
\begin{bmatrix}
1-x_1-x_2\\
1-x_1+x_2\\
1+x_1-x_2\\
1+x_1+x_2\\
\end{bmatrix}\geq 0\] Plot the problem and determine the optimal solution reasoning on the plot. Then, show that in the optimal solution the KKT conditions hold.
```python
import numpy as np
import matplotlib.pyplot as plt

# objective function
f = lambda x1, x2: (x1 - 1.5)**2 + (x2 - 0.5)**4

# grid for contour plot
x1 = np.linspace(-1.5, 2.5, 400)
x2 = np.linspace(-1.5, 2.0, 400)
X1, X2 = np.meshgrid(x1, x2)
Z = f(X1, X2)

# constraints as lines: each line is equality for one half-space
lines = {
    "c1: $1-x_1-x_2=0$": lambda x: 1 - x,
    "c2: $1-x_1+x_2=0$": lambda x: x - 1,
    "c3: $1+x_1-x_2=0$": lambda x: x + 1,
    "c4: $1+x_1+x_2=0$": lambda x: -1 - x,
}

plt.figure(figsize=(6, 6))
contours = plt.contour(X1, X2, Z, levels=25, cmap="viridis", alpha=0.8)
plt.clabel(contours, inline=True, fontsize=8, fmt="%.2f")

# feasible polygon vertices
feasible = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, 0.0],
    [0.0, -1.0],
])
plt.fill(feasible[:, 0], feasible[:, 1], facecolor="tab:blue", alpha=0.12,
         label="feasible region")

# plot constraint lines
x_plot = np.linspace(-1.5, 2.5, 200)
for label, line_fun in lines.items():
    plt.plot(x_plot, line_fun(x_plot), label=label)

# optimal point
x_opt = np.array([1.0, 0.0])
plt.scatter(*x_opt, color="red", zorder=5)
plt.text(x_opt[0] + 0.05, x_opt[1] - 0.15, r"$(1,0)$", color="red")

plt.xlim(-1.5, 2.5)
plt.ylim(-1.5, 2.0)
plt.xlabel(r"$x_1$")
plt.ylabel(r"$x_2$")
plt.title(r"Contour plot of $f(x_1,x_2)$ and feasible region")
plt.legend(loc="upper right", fontsize=8)
plt.grid(alpha=0.3)
plt.show()
```
The solution is \({\vec x}^*= (1,0)^T\). The first and second constraints are active at this point. Denoting them by \(c_1\) and \(c_2\) (and the inactive constraints by \(c_3\) and \(c_4\)), we have \[\nabla f({\vec x}^*)=\begin{bmatrix}-1\\ -\frac 12
\end{bmatrix},
\nabla c_1=\begin{bmatrix}
-1\\
-1
\end{bmatrix},
\nabla c_2=\begin{bmatrix}
-1\\
1
\end{bmatrix}.\] Therefore, the KKT conditions are satisfied when we set \[\lambda^*= \begin{bmatrix}\frac 34, \frac 14, 0, 0\end{bmatrix}^T.\]
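The two nonzero multipliers can also be recovered numerically by solving the \(2\times 2\) linear system \(\nabla f({\vec x}^*) = \lambda_1 \nabla c_1 + \lambda_2 \nabla c_2\) for the active constraints (a quick check of this specific point, not a general KKT solver):

```python
import numpy as np

# gradients at the candidate solution x* = (1, 0)
grad_f = np.array([-1.0, -0.5])
A = np.column_stack(([-1.0, -1.0], [-1.0, 1.0]))  # columns: grad c1, grad c2

lam = np.linalg.solve(A, grad_f)
print(lam)  # [0.75, 0.25]: both nonnegative, so the KKT conditions hold at (1, 0)
```

Both multipliers are nonnegative, and the multipliers of the inactive constraints are zero, so complementary slackness holds as well.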
If we hold \(\lambda_1\) fixed, this is a convex function of \([x_1, x_2]\). Therefore, the infimum with respect to \([x_1, x_2]\) is achieved when the partial derivatives with respect to \(x_1\) and \(x_2\) are zero, that is,
\[x_1-\lambda_1= 0, \quad x_2= 0.\]
By substituting these infimal values into \({\cal L}(x_1, x_2,\lambda_1)\) we obtain the dual objective: