
A Conjugate Gradient Algorithm with Function Value Information and N-Step Quadratic Convergence for Unconstrained Optimization

  • Xiangrong Li,

    Affiliation Guangxi Colleges and Universities Key Laboratory of Mathematics and Its Applications, College of Mathematics and Information Science, Guangxi University, Nanning, Guangxi, P.R. China

  • Xupei Zhao,

    Affiliation Mathematics and Physics Institute of Henan University of Urban Construction, Pingdingshan, Henan, P. R. China

  • Xiabin Duan,

    Affiliation Guangxi Colleges and Universities Key Laboratory of Mathematics and Its Applications, College of Mathematics and Information Science, Guangxi University, Nanning, Guangxi, P.R. China

  • Xiaoliang Wang

    xliangwang@126.com

    Affiliation Guangxi Colleges and Universities Key Laboratory of Mathematics and Its Applications, College of Mathematics and Information Science, Guangxi University, Nanning, Guangxi, P.R. China

Abstract

It is generally acknowledged that the conjugate gradient (CG) method achieves global convergence, with at most a linear convergence rate, because CG formulas are generated by linear approximations of the objective function. Quadratic convergence results for CG methods are therefore very limited. We introduce a new PRP method that also employs a restart strategy. The proposed method not only attains n-step quadratic convergence but also exploits both function value and gradient value information. In this paper, we show that the new PRP method, with either the Armijo line search or the Wolfe line search, is both linearly and quadratically convergent. The numerical experiments demonstrate that the new PRP algorithm is competitive with the normal CG method.

Introduction

Consider the unconstrained optimization problem (1) min{f(x) : x ∈ ℜⁿ}, where f(x) : ℜⁿ → ℜ is a continuously differentiable function. This optimization problem can be used to model many other problems (see [1–5]). The iterative process of conjugate gradient (CG) methods is determined by (2) x_{k+1} = xk + αk dk, where xk is the k-th iterate, αk is the step length determined by a line search method, and dk is the search direction defined by (3) dk = −gk + βk d_{k−1} for k ≥ 1, with d0 = −g0, where gk = g(xk) is the gradient ∇f(xk) of f(x) at the point xk and βk is a scalar. The parameter βk varies between different CG formulas (see [6–17], etc.). This paper focuses on the PRP method designed by [13, 14], (4) βk^{PRP} = gk^T (gk − g_{k−1}) / ‖g_{k−1}‖², where g_{k−1} is the gradient ∇f(x_{k−1}) of f(x) at the point x_{k−1} and ‖·‖ denotes the Euclidean norm of vectors. Recently, there have been several successful research projects involving the PRP algorithm (see [7, 18–29] for details). The linear convergence of the standard PRP method is well known. If a restart strategy is used, the algorithm can be n-step quadratically convergent [30–32]; however, quadratic convergence results remain very limited. Zhang et al. [33] presented a three-term CG algorithm by modifying the PRP method (the MPRP method) and proved its global convergence. Moreover, Li and Tian [34] proved that the MPRP method with a restart strategy exhibits n-step quadratic convergence under certain inexact line searches and suitable assumptions. In this paper, we focus primarily on the MPRP method. On the basis of the MPRP method [33], a new three-term CG formula is proposed that includes not only gradient information but also function value information. By applying the conclusions of [34], we prove that the new CG method achieves global convergence for general functions and n-step quadratic convergence for uniformly convex functions. The numerical results show that the new method performs quite well. The main attributes of this method are stated below; a small MATLAB illustration of the basic PRP iteration (2)–(4) follows the list.

  • The new PRP CG method possesses not only the gradient information but also information about function values.
  • The sufficient descent property is satisfied without any conditions.
  • The global convergence, linear convergent rate, and n-step quadratic convergence are established.
  • The numerical results demonstrate that this method is competitive with the normal CG method for the given problems.
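
As referenced above, the following minimal MATLAB sketch illustrates the basic PRP iteration (2)–(4). The toy quadratic, starting point, and exact step length formula are illustrative assumptions and are not taken from the paper.

```matlab
% Minimal illustration of the PRP iteration (2)-(4) on a toy quadratic
% f(x) = 0.5*x'*A*x; all data here are assumptions for illustration only.
A = diag([1, 10]);
g = @(x) A*x;                         % gradient of the quadratic
x = [1; 1];  gk = g(x);  d = -gk;     % d0 = -g0
for k = 1:10
    alpha = -(gk'*d)/(d'*A*d);        % exact step length for a quadratic
    x  = x + alpha*d;                 % Eq (2)
    gn = g(x);
    beta = (gn'*(gn - gk))/norm(gk)^2;   % PRP scalar, Eq (4)
    d  = -gn + beta*d;                % Eq (3)
    gk = gn;
    if norm(gk) < 1e-10, break; end
end
disp(norm(gk))   % essentially zero after at most two iterations
```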

This paper is arranged as follows. In Section 2, we present our motivation, review the MPRP method of [33], and introduce our new PRP method. The r-linear convergence of the new PRP method with the standard Armijo line search is proven in Section 3. In Section 4, we introduce a strategy for choosing the initial step length using the Armijo or Wolfe line search and provide the steps of the new PRP method. In Section 5, the n-step quadratic convergence of the given CG algorithm is established. In Section 6, various numerical experiments are performed to test the n-step quadratic convergence of the new PRP algorithm, and its performance is compared with that of the normal CG method. The conclusion is presented in the last section.

Motivation and Algorithm

In this section, we will provide our motivation based on the MPRP method [33] and the BFGS formulas.

The MPRP method of [33]

Zhang et al. [33] proposed a modified PRP method (a three-term CG method) called the MPRP method. The search direction of the MPRP method is defined by Eqs (5) and (6), where y_{k−1} = gk − g_{k−1}. This method guarantees that dk is a descent direction of f at xk; namely, dk satisfies gk^T dk = −‖gk‖². Because this method reduces to the PRP method when the exact line search is used, it is called the MPRP method. This method possesses global convergence with the Armijo line search, but its convergence with the Wolfe line search could not be established because the search direction does not belong to a trust region. Nevertheless, the numerical results show that this method is effective.
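
Although the displays for Eqs (5) and (6) are not reproduced above, the three-term MPRP direction of [33] is commonly written as dk = −gk + (gk^T y_{k−1}/‖g_{k−1}‖²) d_{k−1} − (gk^T d_{k−1}/‖g_{k−1}‖²) y_{k−1}, with d0 = −g0. Under that assumption, a minimal MATLAB sketch of the update (with arbitrary sample vectors) is:

```matlab
% Hedged sketch of the three-term MPRP direction of Zhang et al. [33],
% assuming the usual form d_k = -g_k + beta*d_{k-1} - theta*y_{k-1}.
g_old = [1; 2];  g_new = [0.5; -1];  d_old = -g_old;   % sample data
y     = g_new - g_old;
beta  = (g_new' * y)     / norm(g_old)^2;
theta = (g_new' * d_old) / norm(g_old)^2;
d_new = -g_new + beta * d_old - theta * y;
% Sufficient descent holds by construction:
disp([g_new' * d_new, -norm(g_new)^2])   % the two printed values coincide
```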

Motivation based on the BFGS formula and the MPRP method

The BFGS method is one of the most effective methods for unconstrained optimization problems. Many satisfactory results concerning this method can be found in the literature (see [35–43], etc.). In particular, Wei et al. [43] presented a new BFGS update generated by Taylor's formula, in which the usual gradient difference is modified using the quantity ρ_{k−1} = 2[f_{k−1} − fk] + (gk + g_{k−1})^T s_{k−1}; here Bk is an approximation of the Hessian matrix of the objective function f, with B0 = I. Clearly, this formula contains not only gradient value information but also function value information, so it might be expected that the resulting method will outperform the original method in theory. Practical computations demonstrate that this method is better than the normal BFGS method [43, 44]. Based on this method, and in order to obtain global convergence and superlinear convergence for general convex functions, Yuan and Wei [45] proposed a new quasi-Newton equation, defined through Eq (7), which ensures that Bk inherits the positive definiteness of B_{k−1} for general convex functions; the numerical results show that this method is better than the normal BFGS method.

Motivated by the above discussion and the CG direction (5), we replace y_{k−1} in Eq (5) with a modified difference vector built from ρ_{k−1}. Specifically, the new CG formula is defined by Eq (8), with the modified difference vector given by Eq (9). We call this the MPRP* formula. Using Eq (8), we can obtain (10) and (11)
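
Since the displays for Eqs (8) and (9) are not reproduced above, the following MATLAB sketch is only a plausible reading of the MPRP* direction: it takes the MPRP direction and replaces y_{k−1} with y*_{k−1} = y_{k−1} + (max{ρ_{k−1}, 0}/‖s_{k−1}‖²) s_{k−1}, in the spirit of the Yuan–Wei modification [45]. The sample data are arbitrary.

```matlab
% Hedged sketch of an MPRP*-type direction (cf. Eqs (8)-(9)); the exact
% published formula may differ.  Sample data are for illustration only.
f_old = 3.0;   f_new = 2.4;                  % f(x_{k-1}), f(x_k)
x_old = [1; 1];  x_new = [0.8; 0.6];
g_old = [1; 2];  g_new = [0.5; -1];  d_old = -g_old;

s     = x_new - x_old;
y     = g_new - g_old;
rho   = 2*(f_old - f_new) + (g_new + g_old)' * s;   % function value information
ystar = y + max(rho, 0)/norm(s)^2 * s;              % modified difference vector
beta  = (g_new' * ystar) / norm(g_old)^2;
theta = (g_new' * d_old) / norm(g_old)^2;
d_new = -g_new + beta * d_old - theta * ystar;
% As with the MPRP direction, g_new'*d_new = -norm(g_new)^2 (sufficient descent).
```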

Linear convergence of the MPRP* method

The Armijo line search finds a step length αk = max{ρ^i ∣ i = 0, 1, 2, ⋯} that satisfies (12) f(xk + αk dk) ≤ f(xk) + σ1 αk gk^T dk, where ρ ∈ (0,1) and σ1 ∈ (0,1). The Wolfe line search conditions are (13) f(xk + αk dk) ≤ f(xk) + σ1 αk gk^T dk and g(xk + αk dk)^T dk ≥ σ2 gk^T dk, where 0 < σ1 < σ2 < 1.
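
As a concrete illustration of the two rules, the following self-contained MATLAB sketch runs a backtracking Armijo search and then checks the Wolfe conditions at the accepted step. The quadratic test function and the parameter values are assumptions for illustration.

```matlab
% Illustration of the line search rules in Eqs (12)-(13); the test function
% and parameter values are assumed, not taken from the paper.
fun  = @(x) 0.5*x(1)^2 + 5*x(2)^2;
grad = @(x) [x(1); 10*x(2)];
x = [1; 1];  d = -grad(x);
rho = 0.1;  sigma1 = 1e-4;  sigma2 = 0.9;        % 0 < sigma1 < sigma2 < 1

% Armijo backtracking: alpha = max{rho^i | i = 0,1,2,...} satisfying Eq (12)
alpha = 1;  gd = grad(x)' * d;
while fun(x + alpha*d) > fun(x) + sigma1*alpha*gd
    alpha = rho*alpha;
end

% Check of the Wolfe conditions (13) at the accepted step
armijo_ok    = fun(x + alpha*d) <= fun(x) + sigma1*alpha*gd;
curvature_ok = grad(x + alpha*d)' * d >= sigma2*gd;
fprintf('alpha = %g, Armijo: %d, curvature: %d\n', alpha, armijo_ok, curvature_ok)
```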

To establish the r-linear convergence of the MPRP* method, we make the following assumptions.

Assumption (i) f is a twice continuously differentiable, uniformly convex function; that is, there are positive constants M ≥ m > 0 such that m‖d‖² ≤ d^T ∇²f(x) d ≤ M‖d‖² for all x, d ∈ ℜⁿ, where ∇²f(x) denotes the Hessian of f at x.

Under Assumption (i), it is not difficult to obtain (14) and (15), where x* denotes the unique solution of problem (1). Based on Assumption (i), we can easily obtain the following lemma; we therefore omit its proof.
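
For reference, the following standard bounds hold under Assumption (i); the paper's Eqs (14) and (15), which are not reproduced above, are presumably relations of this type.

```latex
% Standard consequences of  m I \preceq \nabla^2 f(x) \preceq M I :
\begin{aligned}
  m\|x-y\|^{2} &\le \big(\nabla f(x)-\nabla f(y)\big)^{T}(x-y) \le M\|x-y\|^{2},\\
  \|\nabla f(x)-\nabla f(y)\| &\le M\|x-y\|,\\
  \tfrac{m}{2}\|x-x^{*}\|^{2} &\le f(x)-f(x^{*}) \le \tfrac{M}{2}\|x-x^{*}\|^{2}.
\end{aligned}
```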

Lemma 0.1 Let Assumption (i) hold, and let the sequence {xk} be generated by the MPRP* method with the Armijo or Wolfe line search. We have where and . Moreover, when the Wolfe line search is used, we obtain

Lemma 0.2 Let Assumption (i) hold, and let the sequence {xk} be generated by the MPRP* method with the Armijo line search or the Wolfe line search. Then, there is a constant c > 0 satisfying (16)

Proof. Let where Using the definition of we obtain

Case 1: If ρk−1 ≤ 0, we have .

Using the mean-value theorem, we have and

Moreover, we obtain Thus, it follows from (8) and the Lipschitz continuity of g that

Case 2: If ρk−1 > 0, we have By and we obtain Then, we have and Therefore, it follows from (8) and the Lipschitz continuity of g that (17) If the Armijo line search is used, and αk ≠ 1, then such that By the mean-value theorem and the above relation, we can deduce that there exists a scalar μk ∈ (0,1) satisfying Thus, the relation holds. By the relation , we can find that Then, we obtain Setting , we have Eq (16). If the Wolfe line search is used, using Eq (13), we have By analyzing the Armijo line search technique in a similar manner, we can find a lower positive bound of the step size αk. The proof is complete.

Similar to [34], we can establish the following theorem. Here, we state the theorem below but omit its proof.

Theorem 0.1 Let Assumption (i) hold, and let the sequence {xk} be generated by the MPRP* method with the Armijo line search technique or Wolfe line search technique. Then, there are constants a > 0 and r ∈ (0,1) that satisfy (18)

The restart MPRP* method and its convergence

As in [34], we define an initial step length γk as follows: (19)

where {ϵk} is a positive sequence with ϵk → 0 as k → ∞. Moreover, we can obtain the following result.

Theorem 0.2 Let the sequence {xk} be generated by the MPRP* method, and let Assumption (i) hold. Then, for all sufficiently large k, the step length ∣γk∣ defined by Eq (19) satisfies the Armijo line search and the Wolfe-Powell line search conditions.

Theorem 0.2 shows that, for sufficiently large k, the initial step length can be defined by Eq (19) such that the Armijo line search and Wolfe-Powell line search conditions are satisfied. In the following, ∣γk∣ is used as the initial step length of the restart MPRP* method.

Algorithm 4.1 (RMPRP*)

Step 0: Given an initial point x0 ∈ ℜⁿ, constants ρ ∈ (0,1) and ϵ ∈ (0,1), and a positive integer γ, set d0 = −g0 and k := 0.

Step 1: If ‖gk‖ ≤ ϵ, stop.

Step 2: If the inequality holds, set αk = ∣γk∣; otherwise, determine αk = max{∣γk∣ρ^j ∣ j = 0, 1, 2, ⋯} satisfying (20)

Step 3: Let x_{k+1} := xk + αk dk, and set k := k+1.

Step 4: If ‖gk‖ ≤ ϵ, stop.

Step 5: If k = γ, we let x0 := xk. Go to step 1.

Step 6: Compute the search direction dk using Eq (8). Go to step 2.
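
The following self-contained MATLAB sketch mirrors the structure of Algorithm 4.1. Because Eqs (8), (19), (20) and the acceptance test in Step 2 are not reproduced above, several stand-ins are assumed: a unit trial step in place of ∣γk∣, the Armijo test as the acceptance/backtracking rule, a steepest descent restart in Step 5, and the MPRP*-type direction sketched earlier for Step 6. It is an illustration of the restart structure, not the authors' code.

```matlab
% Hedged sketch of the restart structure of Algorithm 4.1 (RMPRP*).
% Test function, parameters, and several stand-in formulas are assumptions.
fun  = @(x) 0.5*x(1)^2 + 5*x(2)^2 + 0.1*x(1)^4;
grad = @(x) [x(1) + 0.4*x(1)^3; 10*x(2)];

x = [2; 1];  rho = 0.1;  sigma1 = 1e-4;  eps_tol = 1e-6;  gamma = 10;
g = grad(x);  d = -g;  k = 0;

for iter = 1:500
    if norm(g) <= eps_tol, break; end                  % Steps 1 and 4
    alpha = 1;                                         % trial step (stand-in for |gamma_k|)
    while fun(x + alpha*d) > fun(x) + sigma1*alpha*(g'*d)   % Step 2 (Armijo stand-in)
        alpha = rho*alpha;
    end
    x_new = x + alpha*d;  g_new = grad(x_new);         % Step 3
    k = k + 1;
    if k == gamma                                      % Step 5: restart (simplified)
        k = 0;  d = -g_new;
    else                                               % Step 6: MPRP*-type direction
        s = x_new - x;  y = g_new - g;
        r = 2*(fun(x) - fun(x_new)) + (g_new + g)'*s;
        ystar = y + max(r, 0)/norm(s)^2 * s;
        d = -g_new + (g_new'*ystar)/norm(g)^2 * d - (g_new'*d)/norm(g)^2 * ystar;
    end
    x = x_new;  g = g_new;
end
fprintf('final ||g|| = %g after %d iterations\n', norm(g), iter)
```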

We now establish the global convergence of Algorithm 4.1.

Theorem 0.3 Let the conditions of Theorem 0.2 hold. Then, the following relation (21) holds.

Proof. We prove this theorem by contradiction. Suppose that Eq (21) does not hold; then there exists a constant ɛ1 > 0 such that, for all k, (22) Using Eqs (10) and (20), if f is bounded from below, we obtain (23) In particular, we have (24) If limk → ∞ αk > 0, then by Eqs (10) and (24) we obtain limk → ∞ ‖gk‖ = 0. This contradicts Eq (22), so Eq (21) holds. Otherwise, if limk → ∞ αk = 0, then there is an infinite index set K0 satisfying (25) According to Step 2 of Algorithm 4.1, when k is sufficiently large, αk ρ^{−1} does not satisfy Eq (20), which implies that (26) By Eq (22), similar to the proof of Lemma 2.1 in [33], we can deduce that there is a constant ϱ > 0 such that (27) Using Eqs (27) and (10) and the mean-value theorem, we have where ξ0 ∈ (0,1) and the last inequality follows from Eq (15). Combining this result with Eq (26), for all sufficiently large k ∈ K0, we obtain By Eq (27) and limk → ∞ αk = 0, the above inequality implies that limk ∈ K0, k → ∞ ‖gk‖ = 0. This is also a contradiction. The proof is complete.

Lemma 0.3 Let Assumption (i) hold, and let the sequence {xk} be generated by the RMPRP* method. Then, there are four positive numbers ci, i = 1, 2, 3, 4 that satisfy (28)

Proof. Considering the first inequality of Eq (28), we have (29) where . Using we will discuss the three other inequalities of Eq (28). For , we have

Case 1: If ρk ≤ 0, then . Therefore, we have where and the second inequality follows from the mean-value theorem and Eq (29).

Case 2: If ρk > 0, then we have

Thus, we have

Let . We obtain and

Using the definition of , we obtain

Case 1: If ρk ≤ 0, we have

Case 2: If ρk > 0, we have and Letting , we obtain . The proof is complete.

Assumption (ii) In some neighborhood N of x*, ∇2 f is Lipschitz continuous.

Similar to [34], we can also establish the n-step quadratic convergence of the RMPRP* method. Here, we state the theorem but omit the proof.

Theorem 0.4 Let Assumptions (i) and (ii) hold. Then, there exists a constant c′ > 0 such that (30) That is, the RMPRP* method is n-step quadratically convergent.
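
For orientation, a standard way to express n-step quadratic convergence at the restart points, which is presumably the content of Eq (30), is the following; c′ > 0 is the constant from Theorem 0.4.

```latex
% n-step quadratic convergence at the restart iterates (assumed form of Eq (30)):
\|x_{k+n} - x^{*}\| \;\le\; c' \,\|x_{k} - x^{*}\|^{2}
\qquad \text{for all sufficiently large restart indices } k.
```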

Numerical Results

This section reports on numerical experiments with Algorithm 4.1 and the normal algorithm to demonstrate the effectiveness of the proposed algorithm. The normal algorithm (called Algorithm N) does not use the restart technique, but its other steps are the same as those of Algorithm 4.1. We test the following benchmark problems with these two algorithms; the problems are listed in Table 1. These benchmark problems and discussions concerning the choice of test problems for an algorithm can be found at

Table 1. Definition of the benchmark problems and their features.

https://doi.org/10.1371/journal.pone.0137166.t001

We use these benchmark problems to perform experiments with the two algorithms. The code was written in MATLAB 7.6.0 and run on a PC with a Core 2 Duo E7500 CPU @ 2.93 GHz, 2.00 GB of memory, and the Windows XP operating system. The parameters of the algorithms are chosen as ρ = 0.1, γ = 10, σ1 = 0.0001, and γk = e1 = ϵ = 10−4. The problem dimensions can be found in Tables 2–3. Because the line search cannot always ensure that the descent condition holds, uphill searches may occur in the experiments. To avoid this situation, the step size αk is accepted if the number of search steps exceeds ten in every line search technique. The Himmelblau stopping rule is used:

If ∣f(xk)∣ > e1, set stop1 = ∣f(xk) − f(xk+1)∣/∣f(xk)∣; otherwise, set stop1 = ∣f(xk) − f(xk+1)∣.

For each problem, the program will stop if ‖g(x)‖ < e3 or stop1 < e2 is satisfied, where e1 = e2 = e3 = 10−4.
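
A short MATLAB rendering of this stopping test (with sample values for the current function values and gradient; the relative-decrease form of the Himmelblau rule is assumed) is:

```matlab
% Sketch of the stopping test: Himmelblau rule on function decrease plus a
% gradient norm check.  fk, fk1, gk1 are sample values for f(x_k), f(x_{k+1}),
% and g(x_{k+1}); the relative-decrease branch is an assumed standard form.
e1 = 1e-4;  e2 = 1e-4;  e3 = 1e-4;
fk = 1.2;  fk1 = 1.199999;  gk1 = [1e-5; -2e-5];
if abs(fk) > e1
    stop1 = abs(fk - fk1)/abs(fk);    % relative decrease
else
    stop1 = abs(fk - fk1);            % absolute decrease
end
stop_now = (norm(gk1) < e3) || (stop1 < e2);
disp(stop_now)   % logical flag: stop the iteration or not
```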

The program also stops if more than five thousand iterations are performed; in this case, the corresponding method is regarded as having failed on the problem. The meanings of the columns in Tables 2–3 are as follows:

x0: the initial point. Dim: the dimension of the problem.

NI: the total number of iterations. NFG: the total number of function and gradient evaluations, i.e., NFG = NF + NG.

The remaining column reports the function value at the point where the program stops.

The results in Tables 2–3 show that both algorithms effectively solve the benchmark problems, except for the fourth problem. In the experiments, the results for problems 2 and 3 are not satisfactory when the dimensions are large; thus, we use lower dimensions for these problems. The dimensions of problems 8 and 9 are kept below 1,000 for a similar reason. However, the dimensions of all the problems are larger than 30 rather than being fixed at n = 30. For most problems the two algorithms give similar results, the main exception being problem 6. Clearly, the restart algorithm is competitive with the normal algorithm that does not use the restart technique.

To clearly show the performance of Algorithm 4.1 and Algorithm N with respect to NFG, we use the tool (S1 File) in [46] to analyze the algorithms. The results are listed in Table 4. It is easy to see that Algorithm 4.1 outperforms Algorithm N by approximately 1%, so we conclude that the proposed method performs better than the normal method for these problems. We therefore hope that the given algorithm will be utilized in the future.

Table 4. The performance of Algorithm 4.1 and Algorithm N on NFG.

https://doi.org/10.1371/journal.pone.0137166.t004

Conclusion

  1. This paper presented a new conjugate gradient method that exhibits global convergence, linear convergence, and n-step quadratic convergence under suitable assumptions. The proposed CG formula includes not only gradient value information but also function value information. For the benchmark test problems, the numerical results showed that the given algorithm is more effective than the normal method without the restart technique.
  2. We tested the benchmark problems using the presented algorithm and the normal algorithm without the restart technique. Both methods were shown to be very effective for solving the given problems. Moreover, we did not fix the dimension at n = 30, and the largest dimension exceeded 100,000 (103,000). Additional numerical problems (such as the problems in [47]) should be investigated in the future to further examine this algorithm.
  3. Recently, we have solved nonsmooth optimization problems using related gradient methods and obtained various results; therefore, in the future, we will apply the RMPRP* method to nonsmooth optimization problems and hope to obtain interesting results. Moreover, we will study the convergence of the CG method with other line search rules.

Supporting Information

Acknowledgments

The authors would like to thank the editor and the referees for their useful suggestions and comments, which greatly improved the paper.

Author Contributions

Conceived and designed the experiments: XZ XW XL. Performed the experiments: XZ XW. Analyzed the data: XW XD. Contributed reagents/materials/analysis tools: XL XZ XW. Wrote the paper: XW XZ.

References

  1. Gu B, Sheng VS. Feasibility and finite convergence analysis for accurate on-line ν-support vector learning. IEEE Transactions on Neural Networks and Learning Systems. 2013;24: 1304–1315.
  2. Li J, Li XL, Yang B, Sun XM. Segmentation-based image copy-move forgery detection scheme. IEEE Transactions on Information Forensics and Security. 2015;10: 507–518.
  3. Wen XZ, Shao L, Fang W, Xue Y. Efficient feature selection and classification for vehicle detection. IEEE Transactions on Circuits and Systems for Video Technology. 2015.
  4. Zhang H, Wu J, Nguyen TM, Sun MX. Synthetic aperture radar image segmentation by modified Student's t-mixture model. IEEE Transactions on Geoscience and Remote Sensing. 2014;52: 4391–4403.
  5. Fu ZJ. Achieving efficient cloud search services: multi-keyword ranked search over encrypted cloud data supporting parallel computing. IEICE Transactions on Communications. 2015;E98-B: 190–200.
  6. Dai Y, Yuan Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 2000;10: 177–182.
  7. Dai Y, Yuan Y. Nonlinear conjugate gradient methods. Shanghai Scientific and Technical Publishers, 1998.
  8. Fletcher R. Practical methods of optimization, Vol. I: Unconstrained optimization. 2nd edition, Wiley, New York, 1997.
  9. Fletcher R, Reeves C. Function minimization by conjugate gradients. Comput. J. 1964;7: 149–154.
  10. Hager WW, Zhang H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005;16: 170–192.
  11. Hestenes MR, Stiefel E. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stand. 1952;49: 409–436.
  12. Liu Y, Storey C. Efficient generalized conjugate gradient algorithms, part 1: theory. J. Appl. Math. Comput. 1992;69: 17–41.
  13. Polyak BT. The conjugate gradient method in extremal problems. Comput. Math. Math. Phys. 1969;9: 94–112.
  14. Polak E, Ribière G. Note sur la convergence de méthodes de directions conjuguées. Rev. Française Inform. Rech. Opér. 1969;3: 35–43.
  15. Wei Z, Yao S, Liu L. The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 2006;183: 1341–1350.
  16. Yuan GL, Lu XW. A modified PRP conjugate gradient method. Ann. Oper. Res. 2009;166: 73–90.
  17. Yuan GL, Lu XW, Wei ZX. A conjugate gradient method with descent direction for unconstrained optimization. J. Comput. Appl. Math. 2009;233: 519–530.
  18. Dai Y. Analysis of conjugate gradient methods. Ph.D. thesis, Institute of Computational Mathematics and Scientific/Engineering Computing, Chinese Academy of Sciences, 1997.
  19. Dai ZF, Tian BS. Global convergence of some modified PRP nonlinear conjugate gradient methods. Optim. Lett.
  20. Gilbert JC, Nocedal J. Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 1992;2: 21–42.
  21. Hager WW, Zhang H. Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Trans. Math. Softw. 2006;32: 113–137.
  22. Powell MJD. Nonconvex minimization calculations and the conjugate gradient method. Lecture Notes in Mathematics, Vol. 1066, Springer-Verlag, Berlin, 1984, pp. 122–141.
  23. Powell MJD. Convergence properties of algorithms for nonlinear optimization. SIAM Rev. 1986;28: 487–500.
  24. Yu GH. Nonlinear self-scaling conjugate gradient methods for large-scale optimization problems. Doctoral thesis, Sun Yat-Sen University, 2007.
  25. Yuan GL. Modified nonlinear conjugate gradient methods with sufficient descent property for large-scale optimization problems. Optim. Lett. 2009;3: 11–21.
  26. Yuan YX. Analysis on the conjugate gradient method. Optim. Methods Softw. 1993;2: 19–29.
  27. Yuan GL, Wei ZX, Li GY. A modified Polak-Ribière-Polyak conjugate gradient algorithm for nonsmooth convex programs. J. Comput. Appl. Math. 2014;255: 86–96.
  28. Yuan GL, Wei ZX, Zhao QM. A modified Polak-Ribière-Polyak conjugate gradient algorithm for large-scale optimization problems. IIE Transactions. 2014;46: 397–413.
  29. Yuan GL, Zhang MJ. A modified Hestenes-Stiefel conjugate gradient algorithm for large-scale optimization. Numer. Funct. Anal. Optim. 2013;34: 914–937.
  30. Burmeister W. Die Konvergenzordnung des Fletcher-Powell-Algorithmus. Z. Angew. Math. Mech. 1973;53: 693–699.
  31. Cohen A. Rate of convergence of several conjugate gradient algorithms. SIAM J. Numer. Anal. 1972;9: 248–259.
  32. Ritter K. On the rate of superlinear convergence of a class of variable metric methods. Numer. Math. 1980;35: 293–313.
  33. Zhang L, Zhou W, Li DH. A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 2006;26: 629–640.
  34. Li DH, Tian BS. N-step quadratic convergence of the MPRP method with a restart strategy. J. Comput. Appl. Math. 2011;235: 4978–4990.
  35. Broyden CG, Dennis JE, Moré JJ. On the local and superlinear convergence of quasi-Newton methods. J. Inst. Math. Appl. 1973;12: 223–246.
  36. Byrd R, Nocedal J. A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J. Numer. Anal. 1989;26: 727–739.
  37. Byrd R, Nocedal J, Yuan Y. Global convergence of a class of quasi-Newton methods on convex problems. SIAM J. Numer. Anal. 1987;24: 1171–1189.
  38. Dai Y. Convergence properties of the BFGS algorithm. SIAM J. Optim. 2003;13: 693–701.
  39. Dennis JE, Moré JJ. A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comput. 1974;28: 549–560.
  40. Li D, Fukushima M. A modified BFGS method and its global convergence in nonconvex minimization. J. Comput. Appl. Math. 2001;129: 15–35.
  41. Li D, Fukushima M. On the global convergence of the BFGS method for nonconvex unconstrained optimization problems. SIAM J. Optim. 2001;11: 1054–1064.
  42. Liu GH, Han JY, Sun DF. Global convergence analysis of the BFGS algorithm with nonmonotone line search. Optimization. 1995;34: 147–159.
  43. Wei Z, Yu G, Yuan G, Lian Z. The superlinear convergence of a modified BFGS-type method for unconstrained optimization. Comput. Optim. Appl. 2004;29: 315–332.
  44. Wei Z, Li G, Qi L. New quasi-Newton methods for unconstrained optimization problems. Appl. Math. Comput. 2006;175: 1156–1188.
  45. Yuan GL, Wei ZX. Convergence analysis of a modified BFGS method on convex minimizations. Comput. Optim. Appl. 2010;47: 237–255.
  46. Yuan GL, Lu XW. A modified PRP conjugate gradient method. Ann. Oper. Res. 2009;166: 73–90.
  47. Gould NIM, Orban D, Toint PL. CUTEr (and SifDec): a constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw. 2003;29: 373–394.