Wednesday, April 13, 2011

Eureqa and P vs NP .

Eureqa and P vs NP .
Andre Willers
13 Apr 2011

Synopsis :
Eureqa (pronounced "eureka") is a software tool for detecting equations and hidden mathematical relationships in your data. Its goal is to identify the simplest mathematical formulas which could describe the underlying mechanisms that produced the data. Eureqa is free to download and use. Below you will find the program download, video tutorial, user forum, and other and reference materials.
· Read about Eureqa at , in the article "Download Your Own Robot Scientist" by Brandon Keim
· Read about Eureqa at , in the article "Eureqa, the Robot Scientist" by Lin Edwards
· Read about Eureqa at , in the article "Eureqa - Software to Replace Scientists" by Aaron Saenz
· Read about Eureqa at , in the article "Move over, Einstein: Machines will take it from here" by Justin Mullins
If you publish work based on results generated by this program, please cite Schmidt M., Lipson H. (2009) "Distilling Free-Form Natural Laws from Experimental Data," Science, Vol. 324, no. 5923, pp. 81 - 85.
Downloads from http://ccsl.mae.cornell.edu/eureqa

Discussion :
An algorithm that can find the "shortcuts" in P-time that we know must exist .
See http://andreswhy.blogspot.com "P vs NP" 15 Feb 2011 .

I only read about today (Apr 13 , 2011 19h30 SA time) and it blew my mind away .
All credit to M .Schmidt and H . Lipson .

The implications are staggering .
A finite algorithm that can find general principles applicable to infinite data-sets from a finite data-set .
Expected from "P vs NP" , but still impressive .

And it is free Download it and play around ..
It restores my somewhat battered faith in human nature .

Paradigm Shift .
It is acting as the seed-kernel of a crystallizing new way of doing Science .
Not just new theories , but ways of formulating theories .
Good , real scientists will be in greater demand than ever as the flood of ways to apply it start pouring in .
Interpretation will be a big problem .

How it works :
General Information about Genetic Programming and Symbolic Regression
Read about Symbolic Regression on Wikipedia
Read an example Symbolic Regression problem by John Koza
Read an overview of Symbolic Regression by Zelinka Ivan

Analytic programming: www.ft.utb.cz/people/zelinka/soma .
It is two generations on from the Genetic Algorithms I worked on in the day

Why does it work ?
A more important question .

The Trick :
1.Chopping (form schemata ie similarity templates)
2.Sorting . (Sort schemata per similarity) Very fast .
3.Symbolic regression (includes Genetic Programming)
This gives a search of the Genetic spaces not larger than the third order ie Order(n^3)

Proofs :
See David E Goldberg "Genetic Algorithms" ISBN 0-201-15767-5 (1989)

1. P19: "A schema is a similarity template describing a subset of strings with similarities at certain string positions " That's chopping up the problem into delineable pieces .

2. Sorting them (the order does not matter , as long as there is an order) puts similar schemata "next"to each other . This speeds up the Symbolic Regression .

3. P40 : "How many schemata are processed usefully ?"
The proof shows that "the number of schemata is proportional to the cube of the population size "

A NP problem is suddenly reduced to P size of cubic order of elements .( O(n^3) )
The alert reader will notice the 1/3 Reserve argument for infinite systems , despite it's heavy disguise .

Nice . But what does it mean ?
The Genetic algorithm reduces Hard Problems to soluble problems .
It only needs to search the similar schemata once per element .This drastically reduces searches to (n*(n-1)*(n-2) ) ~ n^3 for infinite search-spaces for unique items .
See http://andreswhy.blogspot.com : "New Tools:Infinite Probes" Nov 2008

It searches the prepared data-spaces in polynomial time of order O(3) .
Which is why evolutionary systems use it .

Travelling Salesman Problem :
This is now a snap .
After sorting distances , make distances unique by random (+ , 0 , -) per city , giving
n^3 . And there you are .

I mention this because most of the Symbolic Regression arguments are our old friend
the Travelling salesman in disguise . The iteration means that every time he gets back
to the starting point , he gets a new route . This has to be recalculated .

Singularity Significance :
This is the last , irrevocable step .
Humans will have to change their nature , or become irrelevant .

Once again , my compliments to M .Schmidt and H . Lipson .

I wish I had done it .

Andre .

No comments: