
A Limited-Memory BFGS Algorithm Based on a Trust-Region Quadratic Model for Large-Scale Nonlinear Equations

  • Yong Li,

    Affiliation Department of Mathematics, Baise University, Baise, Guangxi, P. R. China

  • Gonglin Yuan ,

    glyuan@gxu.edu.cn

Affiliation College of Mathematics and Information Science, Guangxi University, Nanning, Guangxi, P. R. China

  • Zengxin Wei

Affiliation College of Mathematics and Information Science, Guangxi University, Nanning, Guangxi, P. R. China

Abstract

In this paper, a trust-region algorithm is proposed for large-scale nonlinear equations, where the limited-memory BFGS (L-M-BFGS) update matrix is used in the trust-region subproblem to improve the effectiveness of the algorithm for large-scale problems. The global convergence of the presented method is established under suitable conditions. The numerical results on the test problems show that the method is competitive with the normal trust-region method.

Introduction

Consider the system of nonlinear equations

b(x) = 0, x ∈ ℜn, (1)

where b: ℜn → ℜn and n denotes the large-scale dimension. Problems involving nonlinear equations arise in numerous engineering applications, such as nonlinear fitting, function approximation, parameter estimation and learning models [1–5]. Large-scale nonlinear equations are believed to be difficult to solve because the relations with x are complex and because the dimension is high. Currently, many effective algorithms exist for addressing (1), such as the trust-region method [6–11], the Levenberg-Marquardt method [12–14], the quasi-Newton method [15–19], and the Gauss-Newton method [20–25]. Let θ be the norm function defined by θ(x) = (1/2)‖b(x)‖², and suppose that b(x) has a zero point. Then, (1) is equivalent to the following global optimization problem:

min θ(x), x ∈ ℜn. (2)
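To make the reformulation (2) concrete, here is a minimal sketch (Python; the two-dimensional system b is a hypothetical example of ours, not from the paper):

```python
import numpy as np

def theta(b, x):
    """Norm merit function theta(x) = 0.5 * ||b(x)||^2 from Eq. (2)."""
    r = b(x)
    return 0.5 * float(r @ r)

# A toy system b(x) = 0 with known root x* = (1, 2): theta vanishes
# exactly at the root and is positive elsewhere.
b = lambda x: np.array([x[0] - 1.0, x[1] - 2.0])
```

Minimizing θ to zero is then exactly finding a root of b.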

In this paper, we will focus on the solution of (2) via trust-region (TR) methods. At each iterate xk, the traditional TR methods obtain the trial step dk from the following TR subproblem model:

min ‖b(xk) + ∇b(xk)d‖, s.t. ‖d‖ ≤ △, (3)

where △ > 0 is the so-called TR radius and ‖⋅‖ denotes the Euclidean vector norm or its induced matrix norm. The TR method is very useful when the exact Jacobian or Hessian computation is inexpensive or possible. However, this case is rare in practice. It is not difficult to see that the Jacobian matrix ∇b(x) must be computed at every iteration, which obviously increases the workload and CPU time. Moreover, the global and superlinear convergence of TR methods often requires that a nondegeneracy assumption on ∇b(x*) holds, where x* is a solution of (1) and nondegeneracy means nonsingularity. The nondegeneracy of ∇b(x*) seems too stringent a requirement for the purpose of ensuring global and superlinear convergence. To overcome the above drawbacks, Yuan et al. [9] presented a new TR subproblem model defined by

min ‖b(xk) + Bk d‖, s.t. ‖d‖ ≤ △k, (4)

where the TR radius is △k = c^p‖b(xk)‖, c ∈ (0, 1), p is a nonnegative integer and Bk is generated by the following BFGS formula:

Bk+1 = Bk − (Bk sk skᵀ Bk)/(skᵀ Bk sk) + (yk ykᵀ)/(ykᵀ sk), (5)

where sk = xk+1 − xk, yk = b(xk+1) − b(xk), xk+1 is the next iterate and B0 is an initial symmetric positive definite matrix. They established global and superlinear convergence without nondegeneracy, and their numerical results show that the algorithm is competitive with the normal TR method up to a dimension of 600. Yuan et al. [8] consider the case of a symmetric Jacobian matrix ∇b(xk) and propose a similar TR subproblem model in which Bk is defined by the special BFGS update

Bk+1 = Bk − (Bk sk skᵀ Bk)/(skᵀ Bk sk) + (δk δkᵀ)/(δkᵀ sk), (6)

where δk = b(xk + yk) − b(xk). In the above TR methods, the TR subproblem model must be solved repeatedly whenever the calculated trial step dk does not satisfy the given conditions, which obviously increases the workload. To avoid this drawback, Yuan et al. [10] give a TR model with radius △km = max{‖b(xk)‖, △k}. In this model, the next point is xk+1 = xk + dk if the trial step dk yields a successful iteration; otherwise, the next point is xk+1 = xk + αk dk, where αk > 0 is the so-called steplength: αk is the largest α ∈ {1, δ, δ², …} satisfying

‖b(xk + α dk)‖² ≤ ‖b(xk)‖² + δ α b(xk)ᵀ dk, (7)

where δ ∈ (0, 1); this line-search technique was proposed by Yuan and Lu [16]. However, the above model cannot ensure that the number of searches for αk is small. Many other TR model methods (see [7, 12] etc.) have been developed for solving nonlinear equations. However, these TR methods require matrix information (the Jacobian matrix or the BFGS update matrix), which increases the computational complexity. This observation motivates us to find another approach that generates the matrix information while requiring minimal memory.
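To illustrate the update (5), here is a minimal sketch (Python; the function name is ours) performing one BFGS update. The result satisfies the secant equation Bk+1 sk = yk and stays symmetric positive definite whenever the curvature condition skᵀyk > 0 holds:

```python
import numpy as np

def bfgs_update(B, s, y):
    """One BFGS update as in Eq. (5).  B stays symmetric positive
    definite as long as the curvature condition s^T y > 0 holds."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# The updated matrix satisfies the secant equation B_{k+1} s_k = y_k:
B0 = np.eye(3)
s = np.array([1.0, 0.0, 1.0])
y = np.array([0.5, 0.2, 1.0])   # s @ y = 1.5 > 0
B1 = bfgs_update(B0, s, y)
```

The secant property is what lets Bk mimic the Jacobian information without computing ∇b(x).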

Based on the problem (4), we will use the limited-memory BFGS (L-M-BFGS) method instead of the BFGS update because the former often provides a fast rate of linear convergence while requiring minimal memory. The L-M-BFGS method is an adaptation of the standard BFGS method [26]. The implementation is identical to that of the normal BFGS method, except that the inverse Hessian approximation is not formed explicitly but is defined by a small number of BFGS updates. At every iterate xk, the L-M-BFGS method stores a small number (say, m) of the most recent correction pairs {si, yi} to obtain Hk+1, instead of storing the full matrix Hk, where Hk is the inverse of Bk. The L-M-BFGS update formula is defined as

Hk+1 = (Vkᵀ ⋯ Vk−m+1ᵀ) H0 (Vk−m+1 ⋯ Vk) + ρk−m+1 (Vkᵀ ⋯ Vk−m+2ᵀ) sk−m+1 sk−m+1ᵀ (Vk−m+2 ⋯ Vk) + ⋯ + ρk sk skᵀ, (8)

where ρk = 1/(ykᵀ sk) and Vk = I − ρk yk skᵀ. It has been proved that the L-M-BFGS method is very suitable for large-scale systems of nonlinear equations [17]. Therefore, combining the L-M-BFGS update (8) with (4), we design the L-M-BFGS TR subproblem model as follows:

min ‖b(xk) + Hk⁻¹ d‖, s.t. ‖d‖ ≤ △k, (9)

where △k = c^p‖b(xk)‖.
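In practice the matrix in Eq. (8) is never formed; the product Hk v is obtained with the standard two-loop recursion (a sketch in Python; function and argument names are ours):

```python
import numpy as np

def lbfgs_two_loop(v, pairs, gamma=1.0):
    """Apply the L-M-BFGS matrix of Eq. (8) to a vector v via the
    standard two-loop recursion, without forming H explicitly.
    `pairs` holds the m most recent (s_i, y_i), oldest first; the
    initial matrix is H0 = gamma * I."""
    q = np.asarray(v, dtype=float).copy()
    alphas = []
    for s, y in reversed(pairs):           # first loop: newest to oldest
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        alphas.append((a, rho, s, y))
    r = gamma * q                          # apply H0
    for a, rho, s, y in reversed(alphas):  # second loop: oldest to newest
        beta = rho * (y @ r)
        r += (a - beta) * s
    return r
```

Only the 2m stored vectors are touched, which is what makes the method attractive for large n.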

This paper is organized as follows. In the next section, the algorithm is discussed. In Section 3, the global convergence is established. Finally, the numerical results are reported in Section 4.

The L-M-BFGS Trust-Region Method

The algorithm is given as follows.

  • Algorithm 1.

Initial: Given the constants ρ, c ∈ (0, 1), p = 0, ϵ > 0, x0 ∈ ℜn, and a symmetric positive definite matrix H0 ∈ ℜn×n. Set k := 0;

Step 1: If ‖bk‖ < ϵ, stop;

Step 2: Solve the TR subproblem (9) with △ = △k to obtain dk;

Step 3: Calculate the ratio of the actual reduction over the predicted reduction, rk^p = Aredk(dk)/Predk(dk). If rk^p < ρ, then let p = p + 1 and go to Step 2; otherwise, go to Step 4;

Step 4: Set xk+1 = xk + dk, yk = bk+1 − bk. Use (8) to generate the updated matrix Hk+1 with positive definiteness.

Step 5: Let kk + 1, p = 0, and go to Step 1.
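The steps above can be sketched as the following driver (Python). This is an illustration, not the paper's MATLAB code: the subproblem solver and the H-update are passed in as callables standing in for Steps 2 and 4, and the reduction ratio uses Predk(d) = ‖bk‖² − ‖bk + Hk⁻¹d‖², one plausible reading of Step 3; all helper names are ours.

```python
import numpy as np

def algorithm1(b, x0, solve_subproblem, update_H,
               rho=1e-4, c=0.1, eps=1e-5, max_iter=1000, max_inner=5):
    """Skeleton of Algorithm 1 (a sketch; helper names are ours).
    solve_subproblem(x, H, delta) stands in for Step 2 (e.g. dogleg on
    model (9)) and update_H(H, s, y) for Step 4 (the L-M-BFGS update)."""
    x = np.asarray(x0, dtype=float).copy()
    H = np.eye(x.size)
    for _ in range(max_iter):
        bx = b(x)
        if np.linalg.norm(bx) < eps:                # Step 1
            break
        p = 0
        while True:                                  # inner cycle (Steps 2-3)
            delta = (c ** p) * np.linalg.norm(bx)    # TR radius c^p ||b(x_k)||
            d = solve_subproblem(x, H, delta)        # Step 2
            ared = np.linalg.norm(bx) ** 2 - np.linalg.norm(b(x + d)) ** 2
            pred = np.linalg.norm(bx) ** 2 \
                - np.linalg.norm(bx + np.linalg.solve(H, d)) ** 2
            if (pred > 0 and ared / pred >= rho) or p > max_inner:
                break                                # accept the trial step
            p += 1                                   # shrink radius, retry
        s, y = d, b(x + d) - bx
        x = x + d                                    # Step 4
        H = update_H(H, s, y)                        # then k <- k+1, p = 0
    return x
```

With a simple clipped quasi-Newton step as the subproblem solver and a frozen H, this drives a linear test system to a root.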

Remark 1. (i) In Algorithm 1, the “Step 2-Step 3-Step 2” procedure is called the inner cycle.

(ii) It is well known that the positive definiteness of the update matrix Hk is very important when analyzing the convergence of the algorithm. Byrd et al. [27] showed that the limited-memory BFGS matrix has this property if the curvature condition skᵀ yk > 0 is satisfied. Similarly, Powell [28] proposed replacing yk by

ȳk = θk yk + (1 − θk) Bk sk,

where θk = (0.8 skᵀ Bk sk)/(skᵀ Bk sk − skᵀ yk), Bk is an approximation of ∇b(xk) and Bk = Hk⁻¹. Therefore, Step 4 of Algorithm 1 is reasonable.
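Powell's modification [28] can be sketched as follows (Python; the function name is ours, and the threshold 0.2 corresponds to the factor 0.8 appearing in θk):

```python
import numpy as np

def powell_damped_y(B, s, y, sigma=0.2):
    """Powell's damping: replace y by theta*y + (1-theta)*B s so that
    s^T y_bar >= sigma * s^T B s > 0, which preserves the positive
    definiteness of the (L-M-)BFGS update.  sigma=0.2 gives the
    coefficient 0.8 = 1 - sigma in theta_k."""
    sBs = s @ (B @ s)
    sy = s @ y
    if sy >= sigma * sBs:
        return y                      # curvature already sufficient
    theta = (1.0 - sigma) * sBs / (sBs - sy)
    return theta * y + (1.0 - theta) * (B @ s)
```

When the raw pair already satisfies the curvature condition, y is returned unchanged; otherwise the damped vector satisfies sᵀȳ = 0.2 sᵀBs exactly.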

To compare with the normal TR method, another algorithm is given; we call it Algorithm 2.

  • Algorithm 2 (BFGS Trust-Region Method). Identical to Algorithm 1 except for Steps 2 and 4:

Step 2: Solve the TR subproblem (4) with △ = △k to obtain dk;

Step 4: Set xk+1 = xk + dk, yk = bk+1 − bk. Use (5) to generate the updated matrix Hk+1 = Bk+1⁻¹ with positive definiteness.

Remark 2. It is easy to see that the difficult step of these two algorithms is Step 2. The following dogleg method [29] is used to solve the TR subproblem models (4) and (9) to obtain dk.

  • At the kth iteration:

set ϖk = Hk b(xk);

If ‖ϖk‖ ≤ △k, let dk = −ϖk; otherwise, set

ςk = ‖Hk⁻¹b(xk)‖² / (b(xk)ᵀ Hk⁻¹Hk⁻¹Hk⁻¹ b(xk)), ϱk = −ςk Hk⁻¹b(xk), τk = −ϖk:

If 0 ≤ ςk ≤ 1 holds, set dk = ςk ϱk; if 1 ≤ ςk ≤ 2 holds, set dk = ϱk + (ςk − 1)(τk − ϱk).
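The same geometry can be sketched with the textbook dogleg step (Python; written for a generic quadratic model with gradient g and Hessian approximation B, so it illustrates the kind of solver Step 2 needs rather than the exact routine of [29]):

```python
import numpy as np

def dogleg(g, B, delta):
    """Textbook dogleg step for the model m(d) = g^T d + 0.5 d^T B d
    subject to ||d|| <= delta.  A sketch of the Step 2 subproblem
    solver, not necessarily the exact variant of [29]."""
    pB = -np.linalg.solve(B, g)            # full quasi-Newton step
    if np.linalg.norm(pB) <= delta:
        return pB                          # inside the region: take it
    pU = -(g @ g) / (g @ (B @ g)) * g      # Cauchy (steepest-descent) point
    if np.linalg.norm(pU) >= delta:
        return (delta / np.linalg.norm(pU)) * pU
    # walk from pU toward pB until the trust-region boundary is hit
    w = pB - pU
    a, b_, c = w @ w, 2.0 * (pU @ w), pU @ pU - delta ** 2
    t = (-b_ + np.sqrt(b_ ** 2 - 4.0 * a * c)) / (2.0 * a)
    return pU + t * w
```

For model (9) one would take g as the model gradient built from Hk⁻¹b(xk); the returned step always lies within the radius △k.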

Convergence Analysis

This section will provide some convergence results under the following assumptions.

Assumption A (i) Let the level set

Ω = {x | ‖b(x)‖ ≤ ‖b(x0)‖} (1)

be bounded, and let b(x) be twice continuously differentiable on an open convex set Ω1 containing Ω.

(ii) The relation (2) holds.

Assumption A(i) implies that there exists M1 > 0 such that

‖∇b(x)‖ ≤ M1, ∀ x ∈ Ω. (3)

Similar to Moré [30], Yuan et al. [9], and Yuan and Sun [11], we can obtain the following lemma.

Lemma 0.1 Let the predicted reduction be (4) and let dk be the solution of (9). Then, we have (5)

Proof. For any α ∈ [0, 1], by the property of dk as the solution of (9), the inequality (5) follows. This completes the proof.

Lemma 0.2 Let dk be the solution of (9), and let the actual reduction Aredk(dk) be (6) Suppose that Assumption A holds and that the sequence {xk, dk} is generated by Algorithm 1. Then, the inner cycle of Algorithm 1 will stop in a finite number of steps.

Proof. We will establish this lemma by contradiction. Suppose that the inner cycle of Algorithm 1 cycles infinitely at xk; then p → ∞, rk^p < ρ and c^p → 0 hold. It is easy to see that ‖bk‖ ≥ ϵ, for otherwise the algorithm would have stopped. Thus, we can conclude that ‖dk‖ ≤ △k = c^p‖bk‖ → 0 holds.

By the definitions of dk, Aredk(dk) and Predk(dk), together with Assumption A, we obtain (7), where the second equality follows from Taylor's formula. It follows from Lemma 0.1 and (7) that rk^p → 1 as p → ∞. Thus, if p is sufficiently large, we have rk^p ≥ ρ. This relation contradicts rk^p < ρ. The proof is complete.

The above lemma shows that the given algorithm is well-defined.

Theorem 0.1 Let Assumption A hold, and let {xk} be generated by Algorithm 1. Then, {xk} ⊂ Ω, and {θ(xk)} converges. In particular, Algorithm 1 either stops in a finite number of steps or generates an infinite sequence {xk} such that

lim_{k→∞} ‖b(xk)‖ = 0. (8)

Proof. By the definition of Algorithm 1, we obtain rk^p ≥ ρ > 0. Combining this with Lemma 0.1 yields (9). Thus, {xk} ⊂ Ω holds. Since θ(xk) ≥ 0, we then conclude that {θ(xk)} converges.

If Algorithm 1 does not stop after a finite number of steps, then again by (9) and θ(xk) ≥ 0, we easily conclude that ‖b(xk)‖ → 0 as k → ∞. The proof is complete.

Theorem 0.1 shows that the sequence {xk} generated by Algorithm 1 satisfies ‖b(xk)‖ → 0 without the assumption that ∇b(x*) is nondegenerate, where x* is a cluster point of {xk}.

Numerical Results

This section will report numerical results obtained using Algorithm 1. The test functions with initial points x0 are listed as follows:

Function 1. Exponential function 2. Initial guess: x0 = (1/n, 1/n, …, 1/n)ᵀ.

Function 2. Trigonometric function. Initial guess: x0 = (101/(100n), 101/(100n), …, 101/(100n))ᵀ.

Function 3. Singular function Initial guess: x0 = (1, 1, ⋯, 1)T.

Function 4. Logarithmic function Initial guess: x0 = (1, 1, ⋯, 1)T.

Function 5. Broyden Tridiagonal function [[31], pp. 471–472] Initial guess: x0 = (−1, −1, ⋯, −1)T.

Function 6. Trigexp function Initial guess: x0 = (0, 0, ⋯, 0)T.

Function 7. Strictly convex function 1 [[32], p. 29]

b(x) is the gradient of h(x) = Σᵢ₌₁ⁿ (e^{xi} − xi). Initial guess: x0 = (1/n, 2/n, …, 1)ᵀ.

Function 8. Variable dimensional function. Initial guess: x0 = (1 − 1/n, 1 − 2/n, …, 0)ᵀ.

Function 9. Discrete boundary value problem [33]. Initial guess: x0 = (h(h − 1), h(2h − 1), ⋯, h(nh − 1)).

Function 10. The discretized two-point boundary value problem, similar to the problem in [34], when A is the n × n tridiagonal matrix given by and F(x) = (F1(x), F2(x), …, Fn(x))ᵀ, with Fi(x) = sin xi − 1, i = 1, 2, …, n. Initial guess: x0 = (50, 0, 50, 0, ⋯)ᵀ.

All codes for the experiments were written in MATLAB r2009a and run on a PC with an E5507@2.27 GHz CPU and 2.99 GB of memory under the Windows XP operating system. The parameters were set as ρ = 0.0001, ϵ = 10⁻⁵, c = 0.1, γ = 0.7; B0 and H0 are identity matrices, and m = 6. In the inner cycle of Algorithms 1 and 2, the trial step is accepted when p > 5. We also stop the program if the number of iterations exceeds one thousand or when θ(x) < 10⁻⁵. The columns of the tables have the following meanings:

Dim: the dimension of the problem.
NG: the number of norm function evaluations.
NI: the total number of iterations.
GN: the value of θ(x) when the program stops.
NaN: fails to find the final value of θ(x).

From Table 1, it is clear that the new algorithm performs quite well in terms of solving these problems and that the dimension does not have an obvious influence on the performance of the presented method. Algorithm 2 can also successfully solve some of these problems. However, Algorithm 2 fails to solve problems 3 and 6.

To clearly demonstrate the efficiency of these algorithms, a tool developed by Dolan and Moré [35] is used to analyze them. In this tool, the so-called (cumulative) distribution function ϱs is defined by

ϱs(κ) = (1/np) size{p ∈ P : πp,s ≤ κ},

where p is a problem, s is a solver, np is the number of problems, ns is the number of solvers, P is the test set, κ denotes the factor on computing time or iteration count, and πp,s is the performance ratio given by

πp,s = κp,s / min{κp,s : s ∈ S},

where S is the set of solvers. According to the above formulas and the rules given in [35], the performance profile plots for Table 1 are given in the following two figures.

Figs 1 and 2 show the performance of these methods measured by NI and NG, respectively, from Table 1. It is clear that the given method performs better because it has a higher probability of being the optimal solver. Fig 1 shows that Algorithm 1 is superior to Algorithm 2 for κ ≤ 14, approximately. From Fig 2, it is clear that the given method successfully solves 100% of the test problems by κ ≈ 5, whereas Algorithm 2 does so only by κ ≈ 17. Overall, the numerical results are encouraging.

Supporting Information: Legends of Figs 1 and 2

Dolan and Moré [35] developed a new tool to analyze the efficiency of algorithms. They introduced the notion of a performance profile for evaluating and comparing the performance of the set of solvers S on a test set P. Assuming that ns solvers and np problems exist, for each problem p and solver s, they defined κp,s as the computing time (the number of function evaluations or some other metric) required to solve problem p using solver s.

Requiring a baseline for comparison, they compared the performance on problem p by solver s with the best performance by any solver on this problem, giving the performance ratio

πp,s = κp,s / min{κp,s : s ∈ S}.

Suppose that a parameter πM ≥ πp,s for all p, s is chosen; then πp,s = πM if and only if solver s does not solve problem p.

The performance of solver s on any given problem might be of interest. More importantly, one would like to obtain an overall assessment of the performance of the solver. With this motivation, these authors defined

ϱs(κ) = (1/np) size{p ∈ P : πp,s ≤ κ}.

Specifically, ϱs(κ) is the probability for solver s ∈ S that the performance ratio πp,s is within a factor κ ∈ ℜ of the best possible ratio. The function ϱs is thus the (cumulative) distribution function for the performance ratio. The performance profile ϱs : ℜ ↦ [0, 1] for a solver is a nondecreasing, piecewise constant function and is continuous from the right at each breakpoint. The value of ϱs(1) is the probability that the solver will outperform the remaining solvers.
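As a sketch, the profile ϱs can be computed from a table of solver costs as follows (Python; the array layout is our choice, with failures encoded as infinity):

```python
import numpy as np

def performance_profile(T, kappas):
    """Dolan-More performance profiles.  T[p, s] is the cost (e.g. NI
    or NG) of solver s on problem p, np.inf for a failure; returns
    rho_s(kappa) for each kappa in `kappas` and each solver."""
    best = T.min(axis=1, keepdims=True)   # best solver on each problem
    ratios = T / best                     # performance ratios pi_{p,s}
    return np.array([[np.mean(ratios[:, s] <= k) for s in range(T.shape[1])]
                     for k in kappas])
```

A failed run has an infinite ratio and so never counts as solved, matching the πM convention above.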

According to the above rules, we know that one solver whose performance profile plot is on the top right outperforms the remaining solvers. Similar legends can be found in [10, 3638].

Conclusion

  1. In this paper, a TR model combined with the L-M-BFGS technique is presented for nonlinear equations, and its global convergence is established. In fact, this work is an extension of the paper [9]; the difference between the two papers lies in the update matrix: this paper uses the L-M-BFGS formula, whereas [9] uses the normal BFGS formula.
  2. Since the L-M-BFGS technique requires minimal storage and is suitable for certain classes of large-scale optimization problems, we extend it to large-scale nonlinear equations. The numerical results show that the given algorithm is competitive with similar methods for large-scale nonlinear equations.
  3. It is well known that conjugate gradient methods are effective techniques for large-scale optimization problems because they do not need matrix information, which motivates us to apply the conjugate gradient technique to large-scale nonlinear equations in future work. We expect the corresponding numerical performance to be interesting as well.

Acknowledgments

This work is supported by the QFRC visitor funds of the University of Technology Sydney, study abroad funds of Guangxi talents in China, the Guangxi Colleges and Universities Key Laboratory of Mathematics and Its Applications, the Program for Excellent Talents in Guangxi Higher Education Institutions (Grant No. 201261), Guangxi Higher Education Institutions (No. YB2014389), the Guangxi NSF (Grant No. 2012GXNSFAA053002) and the China NSF (Grant Nos. 11261006 and 11161003). The authors thank the referees for their valuable comments, which greatly improved this paper.

Author Contributions

Conceived and designed the experiments: YL GY ZW. Performed the experiments: YL GY ZW. Analyzed the data: YL GY ZW. Contributed reagents/materials/analysis tools: YL GY ZW. Wrote the paper: GY YL ZW.

References

  1. Gu B and Sheng VS, Feasibility and finite convergence analysis for accurate on-line v-support vector learning, IEEE Transactions on Neural Networks and Learning Systems, 24 (2013), pp. 1304–1315.
  2. Li J, Li XL, Yang B, and Sun XM, Segmentation-based image copy-move forgery detection scheme, IEEE Transactions on Information Forensics and Security, 10 (2015), pp. 507–518.
  3. Wen XZ, Shao L, Fang W, and Xue Y, Efficient feature selection and classification for vehicle detection, IEEE Transactions on Circuits and Systems for Video Technology, 2015.
  4. Zhang H, Wu J, Nguyen TM, and Sun MX, Synthetic aperture radar image segmentation by modified Student's t-mixture model, IEEE Transactions on Geoscience and Remote Sensing, 52 (2014), pp. 4391–4403.
  5. Fu ZJ, Achieving efficient cloud search services: multi-keyword ranked search over encrypted cloud data supporting parallel computing, IEICE Transactions on Communications, E98-B(1), pp. 190–200.
  6. Conn AR, Gould NIM, and Toint PL, Trust-Region Methods, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2000.
  7. Yuan Y, Trust region algorithm for nonlinear equations, Information, 1 (1998), pp. 7–21.
  8. Yuan G, Lu X, and Wei Z, BFGS trust-region method for symmetric nonlinear equations, Journal of Computational and Applied Mathematics, 230 (2009), pp. 44–58.
  9. Yuan G, Lu X, and Wei Z, A BFGS trust-region method for nonlinear equations, Computing, 92 (2011), pp. 317–333.
  10. Yuan G, Lu S, and Wei Z, A new trust-region method with line search for solving symmetric nonlinear equations, International Journal of Computer Mathematics, 88 (2011), pp. 2109–2123.
  11. Yuan Y and Sun W, Optimization Theory and Methods, Science Press, Beijing, 1997.
  12. Fan JY, A modified Levenberg-Marquardt algorithm for singular system of nonlinear equations, Journal of Computational Mathematics, 21 (2003), pp. 625–636.
  13. Yamashita N and Fukushima M, On the rate of convergence of the Levenberg-Marquardt method, Computing, 15 (2001), pp. 239–249.
  14. Zhang J and Wang Y, A new trust region method for nonlinear equations, Mathematical Methods of Operations Research, 58 (2003), pp. 283–298.
  15. Griewank A, The 'global' convergence of Broyden-like methods with a suitable line search, Journal of the Australian Mathematical Society, Series B, 28 (1986), pp. 75–92.
  16. Yuan G and Lu X, A new backtracking inexact BFGS method for symmetric nonlinear equations, Computers and Mathematics with Applications, 55 (2008), pp. 116–129.
  17. Yuan G, Wei Z, and Lu S, Limited memory BFGS method with backtracking for symmetric nonlinear equations, Mathematical and Computer Modelling, 54 (2011), pp. 367–377.
  18. Yuan G and Yao S, A BFGS algorithm for solving symmetric nonlinear equations, Optimization, 62 (2013), pp. 85–99.
  19. Zhu D, Nonmonotone backtracking inexact quasi-Newton algorithms for solving smooth nonlinear equations, Applied Mathematics and Computation, 161 (2005), pp. 875–895.
  20. Bertsekas DP, Nonlinear Programming, Athena Scientific, Belmont, 1995.
  21. Gu G, Li D, Qi L, and Zhou S, Descent directions of quasi-Newton methods for symmetric nonlinear equations, SIAM Journal on Numerical Analysis, 40 (2002), pp. 1763–1774.
  22. Levenberg K, A method for the solution of certain non-linear problems in least squares, Quarterly of Applied Mathematics, 2 (1944), pp. 164–166.
  23. Li D and Fukushima M, A globally and superlinearly convergent Gauss-Newton-based BFGS method for symmetric nonlinear equations, SIAM Journal on Numerical Analysis, 37 (1999), pp. 152–172.
  24. Marquardt DW, An algorithm for least-squares estimation of nonlinear parameters, SIAM Journal on Applied Mathematics, 11 (1963), pp. 431–441.
  25. Nocedal J and Wright SJ, Numerical Optimization, Springer, New York, 1999.
  26. Byrd RH, Nocedal J, and Schnabel RB, Representations of quasi-Newton matrices and their use in limited memory methods, Mathematical Programming, 63 (1994), pp. 129–156.
  27. Byrd RH, Lu P, and Nocedal J, A limited memory algorithm for bound constrained optimization, SIAM Journal on Scientific Computing, 16 (1995), pp. 1190–1208.
  28. Powell MJD, A fast algorithm for nonlinearly constrained optimization calculations, Numerical Analysis, (1978), pp. 155–157.
  29. Wang YJ and Xiu NH, Theory and Algorithm for Nonlinear Programming, Shanxi Science and Technology Press, Xi'an, 2004.
  30. Moré JJ, Recent developments in algorithms and software for trust region methods, in: Bachem A, Grötschel M, Korte B (Eds.), Mathematical Programming: The State of the Art, Springer-Verlag, Berlin, 1983, pp. 258–285.
  31. Gomez-Ruggiero M, Martinez JM, and Moretti A, Comparing algorithms for solving sparse nonlinear systems of equations, SIAM Journal on Scientific and Statistical Computing, 23 (1992), pp. 459–483.
  32. Raydan M, The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem, SIAM Journal on Optimization, 7 (1997), pp. 26–33.
  33. Moré JJ, Garbow BS, and Hillström KE, Testing unconstrained optimization software, ACM Transactions on Mathematical Software, 7 (1981), pp. 17–41.
  34. Ortega JM and Rheinboldt WC, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
  35. Dolan ED and Moré JJ, Benchmarking optimization software with performance profiles, Mathematical Programming, 91 (2002), pp. 201–213.
  36. Yuan G and Lu X, A modified PRP conjugate gradient method, Annals of Operations Research, 166 (2009), pp. 73–90.
  37. Yuan G, Lu X, and Wei Z, A conjugate gradient method with descent direction for unconstrained optimization, Journal of Computational and Applied Mathematics, 233 (2009), pp. 519–530.
  38. Yuan G and Zhang M, A modified Hestenes-Stiefel conjugate gradient algorithm for large-scale optimization, Numerical Functional Analysis and Optimization, 34 (2013), pp. 914–937.