Optimization routine for a function that is not smooth and has many local minima
Hi, can anyone give me some suggestions about an optimization routine for a function that is not smooth and also has many local minima?
1 comment
Answers (5)
Alan Weiss
19 Feb 2013
If you want to minimize a nonsmooth function, the general recommendation is to try patternsearch. If there are many local minima, the general recommendation is to run it from a variety of start points. If you have finite bounds on all components, lb and ub, then you can obtain random start points as follows:
x0 = lb + rand(size(lb)).*(ub - lb);
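Putting the two suggestions together, here is a minimal multistart sketch, assuming a hypothetical objective `myObj` and finite bounds `lb` and `ub` (patternsearch is in the Global Optimization Toolbox):

```matlab
% Multistart patternsearch: keep the best result over several random starts.
% myObj, lb, ub are placeholders for your own objective and bounds.
bestF = Inf;
bestX = [];
for k = 1:20                                  % number of random starts
    x0 = lb + rand(size(lb)).*(ub - lb);      % random point within [lb, ub]
    [x, f] = patternsearch(@myObj, x0, [], [], [], [], lb, ub);
    if f < bestF                              % remember the best minimum found
        bestF = f;
        bestX = x;
    end
end
```

The number of starts (20 here) is arbitrary; more starts raise the chance of escaping poor local minima at the cost of run time.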
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation
4 comments
xueqi
20 Feb 2013
Alan Weiss
20 Feb 2013
It seems that your objective function returns its value whenever it is evaluated. The reason there are so many function evaluations between each line of the iterative display is that fmincon is taking small perturbations of the current point and evaluating the objective at these new points. You can see that there are many cases where a small perturbation leads to an Inf or a big departure from the current function value.
I conclude that your objective function is not finite at points very close to the current point after the first iteration.
Can you put some constraints on your problem so that the objective doesn't get so large? Or can you check to ensure that the objective function is defined correctly?
And one more thing. I see that you have tried to use GA to optimize your function. I strongly recommend that you use patternsearch instead--it is much faster and more reliable.
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation
xueqi
20 Feb 2013
xueqi
1 Mar 2013
Thorsten
18 Feb 2013
http://en.wikipedia.org/wiki/Simulated_annealing
Mohsen Davarynejad
18 Feb 2013
0 votes
All the algorithms suitable for black-box optimization can be used. These algorithms, including Genetic algorithms (GA) and Particle swarm optimization (PSO), do not make any assumptions regarding the problem at hand, and thus they neither require the function to be convex, nor do they require the availability of the gradient of the function.
5 comments
xueqi
18 Feb 2013
Mohsen Davarynejad
18 Feb 2013
The combination is useful when you want to improve the exploitation of GA (when you are close to a good solution and want to get as close as possible to that possibly local optimal solution). In your case, it looks like you do not have enough exploration, so the combination is of no use. You can improve the exploration of GA by, for instance, increasing the mutation rate.
Mohsen Davarynejad
18 Feb 2013
Share some information about the problem, its dimensions, and the way you code it in GA. That may help us give you better suggestions.
xueqi
20 Feb 2013
xueqi
20 Feb 2013
Juan Camilo Medina
2 Mar 2013
Edited: Juan Camilo Medina
2 Mar 2013
Based on my experience, the best way is to use simulated annealing, which is less heuristic than GA and is driven by a Markov chain. fmincon() or any other gradient-based algorithm won't work well, since the function is non-smooth.
[x,fval,exitFlag,output] = simulannealbnd(ObjectiveFunction,X0)
you can also try
x = fminsearch(fun,x0)
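Expanding the simulannealbnd call above into a fuller sketch with bounds and options (the option values here are illustrative, not tuned; saoptimset was the options interface in R2013a-era releases):

```matlab
% Simulated annealing with finite bounds. ObjectiveFunction, X0 and the
% bounds lb, ub are placeholders for your own problem.
options = saoptimset('InitialTemperature', 100, ...  % starting temperature
                     'ReannealInterval', 50, ...     % restart cooling periodically
                     'Display', 'iter');             % show progress per iteration
[x, fval, exitFlag, output] = ...
    simulannealbnd(ObjectiveFunction, X0, lb, ub, options);
```

Supplying lb and ub keeps the search inside the region where the objective is defined, which matters for the Inf problems discussed elsewhere in this thread.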
1 comment
xueqi
3 Mar 2013
I mentioned this to you in another post, but I'll document it here as well. I think the best way to deal with an ill-behaved objective function is to replace it with a better one. I.e., question your initial reasoning behind this function and see if there are alternative formulations that you haven't considered.
If nothing else, it is suspicious that you think you absolutely must accept the non-smoothness of the function. If you've thought about the problem formulation carefully enough, there are almost always smooth approximations that you can find. The local minima are another matter.
If you can't see an alternative yourself, it may be time to walk us through the development of the function you're thinking of so we can see where you might have taken a wrong turn.
41 comments
xueqi
3 Mar 2013
Matt J
3 Mar 2013
I think you need to write down the mathematical formula for the objective function (and constraints if any), so we can see what we're talking about.
xueqi
3 Mar 2013
xueqi
3 Mar 2013
Matt J
3 Mar 2013
I'm still waiting for the formula for the objective function, as opposed to the code. Regardless, you haven't formatted the for-loop, so your objective function code is hard to read.
In any case, I have two remarks:
- It looks like you are minimizing a loglikelihood when (if maximum likelihood estimation is the goal) you should be minimizing the negative of the loglikelihood.
- The Infs are probably coming from taking the log of the likelihood in regions of low probability. One way to eliminate them would be to minimize minus the likelihood, instead of minus the loglikelihood.
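To see why the log produces Infs in low-probability regions, note that a likelihood term that is (or underflows to) zero gives log(0) = -Inf in MATLAB, whereas the likelihood itself stays finite. A toy illustration:

```matlab
% The beta(2,5) density is exactly zero at x = 0, so the loglikelihood blows up:
p = betapdf(0, 2, 5);   % p is 0
lp = log(p);            % lp is -Inf: this is where the Infs come from
% Minimizing -p instead of -log(p) keeps the objective finite at such points.
```

This is exactly the trade-off in the second remark above: the likelihood is bounded but can be numerically flat, while the loglikelihood is better scaled but undefined where the probability vanishes.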
xueqi
3 Mar 2013
I didn't think you were still using FMINCON. Everyone's been telling you to use simulated annealing and pattern search.
In any case, you should be constraining hatmu2>= -0.5, because
- That's the region where probabilities are >0 and so where the solution lies.
- When working with loglikelihoods, instead of likelihoods, that's where the objective function is defined.
Matt J
3 Mar 2013
BTW, you've shown llh1 commented out:
% llh1=betacdf(hatmu1+0.5)-betacdf(hatmu1-0.5);
So if this is not the formula for llh1 that you're using, what formula are you using instead?
Keep up with my posts! I told you that you need to be using constraints that keep the algorithm away from this region.
Incidentally, is there any reason why you're using a finite difference of betacdf
llh=betacdf(hatmu+0.5)-betacdf(hatmu-0.5);
to approximate the beta distribution, when the density function is given exactly and trivially by
hatmu.^(a-1).*(1-hatmu).^(b-1)/beta(a,b)
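As a quick check of this point (parameter values here are arbitrary), MATLAB's betapdf gives the density in closed form, and the narrow betacdf difference only approximates h times that density:

```matlab
% Compare the exact beta density with a finite difference of the CDF.
a = 2; b = 5; x = 0.3; h = 1e-4;
exact  = betapdf(x, a, b);                                   % closed-form density
approx = (betacdf(x + h/2, a, b) - betacdf(x - h/2, a, b)) / h;  % central difference
% exact and approx agree to roughly O(h^2) relative error
```

So if the goal is the density itself, betapdf is both exact and cheaper than two betacdf calls.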
Reprinting your function with more readable formatting
for i = 1:rn
    alpha1(i) = (b1*edw(i)/2 + (1-b1)*mu(i,1))*(s1-1)/edw(i);
    beta1(i)  = (edw(i) - b1*edw(i)/2 - (1-b1)*mu(i,1))*(s1-1)/edw(i);
    alpha2(i) = (b2*edw(i)/2 + (1-b2)*mu(i,2))*(s2-1)/edw(i);
    beta2(i)  = (edw(i) - b2*edw(i)/2 - (1-b2)*mu(i,2))*(s2-1)/edw(i);
    llh1(i,:) = betacdf((hatmu(i,1)+0.5)/edw(i,:), alpha1(i), beta1(i)) - ...
                betacdf((hatmu(i,1)-0.5)/edw(i,:), alpha1(i), beta1(i));
    llh2(i,:) = betacdf((hatmu(i,2)+0.5)/edw(i,:), alpha2(i), beta2(i)) - ...
                betacdf((hatmu(i,2)-0.5)/edw(i,:), alpha2(i), beta2(i));
end
Now that I can read your function, I can make more complete recommendations.
First, I think you should go back to loglikelihoods.
Second, you also have to make sure that you have no hatmu that would cause betacdf to be zero. Remember, the beta distribution only has non-zero probability for data values between 0 and 1.
Third, it appears that the dependence on your unknown parameters is entirely through alpha1(i), alpha2(i), beta1(i), beta2(i), and this dependence is linear. You need to ensure that the alphas and betas are constrained to be non-negative, which is an assumption of the beta distribution. Since their dependence on the unknowns is linear, you can do this by specifying linear constraints.
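In fmincon terms, this looks roughly as follows. This is a hypothetical sketch: suppose the unknowns are collected in a vector x and each alpha/beta is an affine function of x, e.g. alpha1 = c1'*x + d1 for a known vector c1 and scalar d1 (c1, c2, d1, d2, myObj are all placeholders). Then alpha1 >= 0 is the linear inequality -c1'*x <= d1, which goes into the A, b arguments:

```matlab
% Each row of A encodes one constraint of the form A*x <= b.
% Row 1: -c1'*x <= d1 enforces alpha1 = c1'*x + d1 >= 0 (and likewise row 2).
A = [-c1'; -c2'];
b = [ d1 ;  d2 ];
x = fmincon(@myObj, x0, A, b, [], [], lb, ub);
```

In practice you would add one row per alpha and beta, and tighten the bound slightly (e.g. alpha1 >= epsilon) to keep the iterates away from the degenerate boundary.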
hatmu2 is >=0.5 in my data set. Both hatmu1 and hatmu2 are >=0 and <= edw (a specific number). Is this what you mean here?
But that means that hatmu can be equal to or close to edw. When this happens
(hatmu+0.5)/edw >1
and the beta distribution is zero at such values for all parameter choices.
What is the purpose of the 0.5? Are you trying to compute finite difference derivatives of betacdf? As I said before, this is unnecessary and also might be creating problems.
Patternsearch seems a little bit slow. Is there any way to speed it up?
Forget patternsearch. Now that I can read your code and understand your problem, I think that FMINCON should be fine, once you apply the constraints that I've described.
xueqi
3 Mar 2013
Matt J
3 Mar 2013
OK, well then it seems like your Inf values are more due to letting alpha(i) and beta(i) run too close to zero (or less). Applying constraints to alpha(i) and beta(i) and maybe also using sqp, as I mentioned before, will probably help.
xueqi
5 Mar 2013
It doesn't sound like a precision problem. If you have a likelihood term that is equal to 0, it means that you are operating in a flat region of BETACDF, which means one of the following is true:
- That (hatmu+0.5)/edw <=0 or equivalently that hatmu<=-0.5,
- That (hatmu-0.5)/edw >= 1 or equivalently that hatmu>=edw+.5
You must ensure that all the hatmu lie in the "legal" open interval (-.5, edw+.5). Otherwise, it's bad data, plain and simple.
xueqi
5 Mar 2013
Matt J
5 Mar 2013
The hatmu numbers you've shown clearly satisfy the lower bound of -.5, but you haven't shown edw, so we don't know about the upper bound of edw+.5.
xueqi
5 Mar 2013
But there is nothing abnormal about alpha1(18), beta1(18), alpha2(18), beta2(18).
They look pretty abnormal to me. With alpha and beta on the order of 1000, you're working with a beta distribution that is a polynomial of degree 1000+. That's a pretty ill-behaved thing. As I said before, I think you need to be constraining your alphas and betas in some way.
Also, I now see that your earlier claim, that the objective function is smooth, is not true.
mu = maxmin(sub,DD)
is not a differentiable function of x, assuming
maxmin(v)=[max(v), min(v)]
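If maxmin really does take an elementwise max and min, one standard workaround (a common trick, not something suggested in this thread) is to replace them with a smooth log-sum-exp surrogate, which approaches the true max/min as the sharpness parameter t grows:

```matlab
% Smooth surrogates for max(v) and min(v); larger t = closer but stiffer.
% Subtracting max(v) before exponentiating avoids overflow for large t.
t = 50;
smoothmax = @(v) max(v) + log(sum(exp(t*(v - max(v)))))/t;   % -> max(v) as t -> Inf
smoothmin = @(v) -smoothmax(-v);                             % -> min(v) as t -> Inf
```

Swapping these into maxmin would make the whole likelihood differentiable, at the cost of a small, controllable bias of order log(length(v))/t.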
xueqi
5 Mar 2013
Matt J
5 Mar 2013
Here maxmin is just a function name given by me.
What does it do?
xueqi
5 Mar 2013
Matt J
5 Mar 2013
it seems quite smooth around the optimum.
It needs to be smooth throughout the feasible region, not just at the optimum. Also, determining smoothness by inspection of plots can be tricky. For example, discontinuities can be hidden by undersampling. Also, FMINCON algorithms generally require functions to be twice continuously differentiable, and smoothness of 1st and higher derivatives is generally invisible graphically.
xueqi
7 Mar 2013
Matt J
7 Mar 2013
Are you saying everything's working now?
Matt J
9 Mar 2013
No, the only thing that would help is to see maxmin itself. Also, how did you end up getting rid of the Infs?
Do you suggest that if maxmin is smooth then the likelihood function will also be smooth?
Yes. All other operations in your likelihood function look smooth. maxmin is the only question mark.
just assign a very small number (1e-10) if one of the likelihoods is equal to 0
I doubt that's really solving anything. Sure, it replaces the Infs with some large number, but it won't change the fact that the likelihood will be locally flat to within numerical precision for certain alpha, beta. I suspect that's why you're still getting stuck so often.
One idea would be, for large alpha and beta, to write the loglikelihood in terms of Stirling's approximation of Beta(alpha,beta):
llh(x, alpha, beta) = (alpha-1)*log(x) + (beta-1)*log(1-x) - 0.5*log(2*pi) + ...
    (alpha+beta-0.5)*log(alpha+beta) - ...
    (alpha-0.5)*log(alpha) - ...
    (beta-0.5)*log(beta)
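A quick way to sanity-check a Stirling-style approximation of the beta loglikelihood is to compare it against the exact log density computed with betaln, which evaluates log(beta(alpha,beta)) stably (the parameter values below are arbitrary; beta_ is used as a variable name to avoid shadowing the built-in beta function):

```matlab
% Exact vs. Stirling-approximated log of the beta density at a point x.
alpha = 1500; beta_ = 2000; x = 0.43;
exact  = (alpha-1)*log(x) + (beta_-1)*log(1-x) - betaln(alpha, beta_);
approx = (alpha-1)*log(x) + (beta_-1)*log(1-x) - 0.5*log(2*pi) + ...
         (alpha+beta_-0.5)*log(alpha+beta_) - ...
         (alpha-0.5)*log(alpha) - ...
         (beta_-0.5)*log(beta_);
% exact and approx agree to several significant digits for large alpha, beta
```

In fact, since betaln already works in log space, using it directly in the loglikelihood may be simpler than Stirling's formula and avoids the overflow that beta(alpha,beta) itself would hit at these parameter sizes.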
xueqi
12 Mar 2013