
Deep Genetic Learning

Background

This topic contains 0 replies, has 1 voice, and was last updated by  Anders Modén 6 years, 7 months ago.


    There is no doubt that AI is a revolutionary technique for solving numerous difficult problems that were considered “impossible” only a couple of years ago. The technique has evolved, and today it is common knowledge how to train a feed-forward network using back propagation.

    The evolution has been driven by new inventions in network topology, such as the various CNN topologies and constructions like the LSTM, and by new training concepts like reinforcement learning. Another major factor is, of course, the evolution of HW, which has been a fundamental contributor to the fast progress of AI techniques.

    However, we must ask whether the evolution of AI is being steered in a specific direction by the HW rather than the other way around. The current way to solve deep NN problems is back propagation, which is mainly implemented as tensor matrix algebra and only for feed-forward constructions. These specific network topologies are well suited to acceleration with parallel matrix math, so you can create very deep networks that can be trained in hours instead of weeks.

    But what about recursive constructions? Of course you can expand a recursive construction into a deep network, but that will always give you a fixed number of iterations. What if deep networks are a waste of space and just a special case of true recursive iteration? It is easy to see that a recursive network can represent an infinitely deep network with infinitely many layers.
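    A toy illustration of that point (my own sketch, not from the original post, assuming a simple tanh recurrent cell with small random weights): unrolling the recursion a fixed number of times gives a fixed-depth feed-forward network, while iterating the same cell until its state stops changing behaves like an arbitrarily deep network with tied weights.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 4))  # hypothetical recurrent weight matrix
b = rng.normal(scale=0.1, size=4)
x = rng.normal(size=4)                  # input, fed back at every step

def step(h):
    """One recurrent update: the same weights reused at every 'layer'."""
    return np.tanh(W @ h + b + x)

# Fixed unroll: a 3-layer feed-forward expansion of the recursion.
h = np.zeros(4)
for _ in range(3):
    h = step(h)

# True recursion: iterate until the state reaches a fixed point,
# i.e. an "infinitely deep" network with tied weights.
h_rec, prev = np.zeros(4), None
while prev is None or np.linalg.norm(h_rec - prev) > 1e-9:
    prev, h_rec = h_rec, step(h_rec)
```

    With small weights the update is a contraction, so the iteration converges; the fixed unroll simply stops after its prescribed depth.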

    So is there a problem with today’s state-of-the-art networks and HW, in that we only evolve in one direction, or at least much faster in one direction?

    My answer is yes. The problem is mainly our inability to train, and to implement acceleration for, fully recursive constructions. It actually boils down to the back propagation method itself.

    If we take a moment of contemplation, we can see that there are a number of problems with the current training methodology. It is not only back propagation; there are a bunch of other problems related to training as well:

    How can we create non supervised training? Self learning?
    How can we parallelize training beyond iterative closed contexts?
    How can we speed up training?
    How can we improve reinforcement learning?
    Is there a natural random process that will evolve AI like we have been evolved by nature?
    How can we create methods to find “optimal topologies”?

    Many of these questions can, in my opinion, be answered with genetic evolution using Genetic Algorithms.

    This is quite a strong statement, and the full explanation is too large to fit in this margin 🙂 but I will start with one example.

    Training Speed

    Let’s start a bit closer to today’s technology, with a traditional feed-forward deep NN. The first thing you need to be aware of is the input data. Since we usually use a gradient search method to iterate toward a solution, we need a good starting point and good data, so that the iterations move from a well-defined start down a smooth passage to the best solution.

    The NN is actually a large multi-dimensional nonlinear equation. You can think of it as a large rocky landscape. The only thing you know about this landscape is that the best solution is located at the lowest point. You can see some low points from where you are standing, but you cannot know what is behind the mountains. The only “method” you have is to walk around the mountains, following the steepest descent, and see if this leads you around the mountain into a valley; then hopefully you can continue even further into that valley, and so on. When you reach the lowest point in your surroundings, you have reached your best solution.

    You immediately understand that this might be just a local minimum and not the global minimum, and this problem is typical in deep learning. As long as the landscape is smooth the method works well, but in really nonlinear situations, with lots of multi-dimensional saddle points, you end up walking up and down a lot of hills trying to find the passage to the lowest spot.
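    The landscape metaphor can be made concrete with a toy 1-D example (my own illustration, not from the post): plain gradient descent on a bumpy function ends up in whichever valley it starts above.

```python
def f(x):           # a bumpy 1-D "landscape" with two valleys
    return x**4 - 3*x**2 + x

def grad(x):        # its derivative, used for steepest descent
    return 4*x**3 - 6*x + 1

def descend(x, lr=0.01, steps=2000):
    """Walk downhill from x with a fixed step size."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_right = descend(2.0)    # starts right of the hill -> local valley near x ~ 1.13
x_left  = descend(-2.0)   # starts on the left -> global valley near x ~ -1.30
```

    Both runs terminate at a valley floor, but only the left start finds the deeper one; the right start has no way to “peek over the mountain”.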

    But what if we could peek over the mountain? What if we could just pick a random spot somewhere and look at the height there?

    The problem with that approach is that there are infinitely many random spots, and with pure random sampling we would spend ages testing spots without gaining any knowledge. We could get lucky, of course, but we would not be using any strategy to find better spots.

    However, nature invented such a strategy. It is the basis of all our evolution, and we can call it genetic evolution. A genetic algorithm uses random solutions as one ingredient in finding new, better solutions, but it has a more evolved strategy. Genetic algorithms are very good at finding new solutions near old ones, so the probability of finding a solution near an old solution is much higher than with pure random search. The mechanisms of crossover and mutation, together with a mechanism I have invented that I call breeding, define a very specific strategy. Let’s see why…

    It will be able to find solutions anywhere in the search space, as the random search pattern is based on a normal distribution.
    It will be able to find solutions around already good search places, as the probability is concentrated around “known solutions”.
    It will be able to find solutions between good search places, which handles saddle points.
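    To make those three properties concrete, here is a minimal GA sketch (my own toy code, assuming averaging crossover and normal-distributed mutation; the author’s “breeding” mechanism is not specified in the post, so it is not sketched here), minimizing a bumpy 1-D landscape:

```python
import random

def f(x):
    return x**4 - 3*x**2 + x   # bumpy landscape; global minimum near x ~ -1.30

def evolve(pop_size=30, generations=200, sigma=0.5, seed=1):
    rng = random.Random(seed)
    pop = [rng.uniform(-3, 3) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f)                      # fittest (lowest f) first
        parents = pop[:pop_size // 2]        # elitism: good spots survive
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2              # crossover: a point *between* good spots
            child += rng.gauss(0, sigma)     # mutation: normal-distributed jump
            children.append(child)
        pop = parents + children
    return min(pop, key=f)

best = evolve()
```

    The Gaussian mutation can reach anywhere (property one), is concentrated around known solutions (property two), and the averaging crossover probes between good spots (property three).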

    Now, a genetic solution also has a beautiful property: it has a non-closed iterative context. In gradient descent you need to evaluate a number of solutions and then “collect” the results to take the gradient step. There is only one walk. That means you always need to gather the data at a single point, even if you have processed the NN forward evaluations in parallel for multiple choices. A genetic solver doesn’t need that. It can cut loose a number of “individuals” (candidate solutions to the equation) and let them continue to evolve on another computer, and another, and so on, which gives a fully distributed mechanism for finding better solutions. Only when a new best solution is found does it need to be broadcast to the other computers so they can update their search.
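    A toy sketch of that distributed idea (my own illustration; the islands run sequentially here to stand in for separate computers, with an occasional broadcast of the best individual):

```python
import random

def f(x):
    return x**4 - 3*x**2 + x   # toy 1-D fitness landscape (lower is better)

def evolve_island(pop, rng, sigma=0.5):
    """One generation on one 'computer': local selection plus mutation only."""
    pop.sort(key=f)
    parents = pop[:len(pop) // 2]
    children = [p + rng.gauss(0, sigma)
                for p in rng.choices(parents, k=len(pop) - len(parents))]
    return parents + children

rng = random.Random(2)
islands = [[rng.uniform(-3, 3) for _ in range(10)] for _ in range(4)]

for gen in range(100):
    islands = [evolve_island(pop, rng) for pop in islands]
    if gen % 20 == 19:                       # occasional broadcast step
        best = min((x for pop in islands for x in pop), key=f)
        for pop in islands:
            pop[-1] = best                   # share the global best individual

best = min((x for pop in islands for x in pop), key=f)
```

    Between broadcasts the islands need no coordination at all, which is exactly the non-closed iterative context described above; a gradient walk, by contrast, must synchronize at every step.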

    You get it? A fully distributed mechanism that solves issues the normal back propagation suffers from. But is there a downside?

    Yes. The genetic solver is good at finding solutions that cannot be reached or iterated to by back propagation, but it is not as efficient at fine-tuning the solution we are currently standing at. The genetic algorithm keeps looking at distant positions in the equation landscape even when we are close to the very best point. The gradient method would just walk down the hill we are standing on and find the best local solution.

    Collaboration is great! What if we could use the best of both worlds? Start our walk with descent. Take a pause and let the genetic solver search, in a distributed fashion, for other good spots to jump to. If the GA finds one, jump to the new spot and iterate from there using descent.
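    A toy sketch of that hybrid loop (my own illustration on a bumpy 1-D landscape; the real state machine and its switching criteria are not described in the post):

```python
import random

def f(x):    # bumpy landscape with a local and a global valley
    return x**4 - 3*x**2 + x

def grad(x):
    return 4*x**3 - 6*x + 1

def descend(x, lr=0.01, steps=500):
    """Gradient phase: walk down into the nearest valley."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def ga_propose(best, rng, n=20, sigma=1.0):
    """GA-style exploration: normal-distributed jumps around the current best."""
    return min((best + rng.gauss(0, sigma) for _ in range(n)), key=f)

rng = random.Random(0)
x = descend(2.0)                 # descent first: ends in the nearest valley
for _ in range(5):               # state machine: explore, then refine
    candidate = ga_propose(x, rng)
    if f(candidate) < f(x):      # jump only if the GA found a better spot
        x = descend(candidate)   # gradient phase again from the new spot
```

    The descent phase fine-tunes, while the GA phase peeks over the mountains; with enough exploration rounds the loop usually escapes the shallow valley and settles in the deeper one.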

    My research shows that saddle points and local minima, as well as good starting points, are easily found using a state machine that jumps between GAs and back propagation. The power of GAs is really interesting for AI technologies.

    There are, of course, other exciting results as well: GAs allow recursive networks to be trained, they allow a random evolution process without supervision, and they allow network topology evolution as part of training, so there is a bunch of exciting areas for future research.

    My aim with this article was to give my point of view and see if we can balance the technology evolution in AI a bit. You can reach me at anders@tooltech-software.com
