Ecosystems are often studied by writing mathematical equations that describe how their different parts interact with each other. For instance, scientists might want to study how fast herds of caribou eat grass in the Arctic tundra, or how much oxygen a field of corn produced in the last month. Once the equations are written, scientists can turn to computers to solve them quickly enough to answer specific questions about weeks, years, or even centuries in the future. The problem is that biology, unlike physics, does not have a simple set of equations that can be reused from one study to the next. As a result, there are many different ecosystem models, which makes it hard to interpret their results or to make predictions with any reasonable amount of certainty. This project creates thousands of different ecosystem models, each with a randomly generated set of equations. The models are then allowed to reproduce both asexually and sexually, and some of the equations in the models are allowed to mutate. Asexual reproduction means that a model is carried over unchanged into the next generation. Sexual reproduction is where some powerful things happen. First, two models are chosen and called 'males'. These two models 'compete' by comparing how well each one reproduces a set of observations that the models are trying to simulate. The better of the two is selected, and the competition is repeated with a pair of 'female' models. The two winners then randomly recombine their model equations to create an entirely new model, which goes into the next generation. Over successive generations the models improve until a reasonable solution is reached: a computer-generated set of model equations.
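The selection, recombination, and mutation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: each model equation is assumed to be an expression tree stored as nested tuples `(op, left, right)`, and the placeholder variable names `P` and `Z`, the functions `subtrees`, `replace`, `tournament`, `crossover`, `mutate`, and `next_generation`, and the rates `p_copy` and `p_mutate` are all hypothetical. Choosing the asexual copies by tournament is also an assumption.

```python
import random

# Hypothetical representation: a model equation is an expression tree of nested
# tuples (op, left, right); leaves are variable names or constants.


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path lists child indices from the root."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from subtrees(tree[i], path + (i,))


def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(node)


def tournament(population, sse, rng):
    """Draw two models at random and keep the one with the lower SSE."""
    a, b = rng.sample(population, 2)
    return a if sse(a) <= sse(b) else b


def crossover(male, female, rng):
    """Graft a random subtree of the female into a random point of the male."""
    cut, _ = rng.choice(list(subtrees(male)))
    _, donor = rng.choice(list(subtrees(female)))
    return replace(male, cut, donor)


def mutate(tree, rng, leaves=("P", "Z", 0.1)):
    """Replace a random node with a random leaf (hypothetical leaf set)."""
    cut, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, cut, rng.choice(leaves))


def next_generation(population, sse, rng, p_copy=0.1, p_mutate=0.05):
    """Fill a new population with asexual copies and crossover offspring."""
    new = []
    while len(new) < len(population):
        if rng.random() < p_copy:
            child = tournament(population, sse, rng)  # asexual: carried over unchanged
        else:
            male = tournament(population, sse, rng)  # 'male' competition
            female = tournament(population, sse, rng)  # 'female' competition
            child = crossover(male, female, rng)  # sexual: recombined equations
        if rng.random() < p_mutate:
            child = mutate(child, rng)
        new.append(child)
    return new


rng = random.Random(42)
population = [("*", "P", "Z"), ("+", "P", 0.1), ("-", "Z", "P"), ("/", "P", "Z")]
scores = {m: rng.random() for m in population}  # placeholder SSE values, not real fits
print(next_generation(population, scores.get, rng))
```

In a real run the `sse` callable would integrate each candidate model and compare it with the observations; here a dictionary lookup stands in for that step.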
Present-day modeling efforts to resolve upper-ocean biogeochemical processes use coupled sets of ordinary or partial differential equations. These sets of equations, or models, which at the most basic level represent both ecosystem function and diversity, are developed subjectively and more or less independently, using in situ observations and conclusions drawn from the scientific literature. In the past 15 years, data assimilation techniques have become a common means of optimizing the set of free parameters in these models in order to improve the model solutions. Because the systems of equations are themselves likely not optimal representations of the actual ocean system under study, it has been argued that data assimilation techniques aimed only at parameter optimization can improve the model solutions only partially. Apart from changing the values of the parameter set, there are essentially no objective methods for improving model equations. This project seeks to implement a programming technique called “Genetic Programming” (GP) to optimize not only the set of free parameters within the model but also the coupled set of model equations themselves. The project has initially focused on simple solutions, using previously developed simple ecosystem models to carry out “twin experiments.” The effort will progress to more complicated solutions by distributing the GP code across a massively parallel machine, allowing each processor or set of processors to be assigned to a specific geographic location and forced by local conditions. This “Island Approach” will focus on creating unique sets of model solutions and equation/parameter sets for various ocean regions. Results from the global set of model solutions will be analyzed to demonstrate the varying levels of ecosystem complexity (a direct analog of ecosystem diversity) observed within various ocean biogeochemical provinces.
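The “Island Approach” can be illustrated with a toy sketch in which each region evolves its own population against its own local observations. To stay short, it swaps GP trees for a simpler genetic algorithm: a model is a bitmask choosing which terms from a fixed library are summed. The term library, the sampling points `XS`, the region names, and all function names and rates are hypothetical, and `ThreadPoolExecutor` only stands in for the processors of a massively parallel machine (in CPython a `ProcessPoolExecutor` would be needed for real CPU parallelism).

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Simplified stand-in for GP: a model is a bitmask over a fixed, hypothetical
# library of terms, and its prediction is the sum of the switched-on terms.
TERMS = (lambda x: x, lambda x: x * x, math.sin, math.exp, lambda x: 1.0)
XS = [i / 10 for i in range(20)]  # hypothetical sampling points


def predict(mask, x):
    return sum(f(x) for f, on in zip(TERMS, mask) if on)


def sse(mask, obs):
    return sum((predict(mask, x) - y) ** 2 for x, y in zip(XS, obs))


def tournament(pop, obs, rng):
    a, b = rng.sample(pop, 2)
    return a if sse(a, obs) <= sse(b, obs) else b


def evolve_island(obs, seed, pop_size=20, generations=40):
    """Evolve one regional population against that region's own observations."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in TERMS) for _ in range(pop_size)]
    for _ in range(generations):
        new = [min(pop, key=lambda m: sse(m, obs))]  # elitism: keep the current best
        while len(new) < pop_size:
            mom, dad = tournament(pop, obs, rng), tournament(pop, obs, rng)
            child = [rng.choice(pair) for pair in zip(mom, dad)]  # uniform crossover
            if rng.random() < 0.2:
                i = rng.randrange(len(child))
                child[i] = 1 - child[i]  # bit-flip mutation
            new.append(tuple(child))
        pop = new
    return min(pop, key=lambda m: sse(m, obs))


def run_islands(regional_obs):
    """One island per region; threads stand in for processors on a parallel machine."""
    with ThreadPoolExecutor() as pool:
        futures = {
            region: pool.submit(evolve_island, obs, seed)
            for seed, (region, obs) in enumerate(regional_obs.items())
        }
        return {region: f.result() for region, f in futures.items()}


# Hypothetical twin-experiment setup: each region's data come from a known mask.
truth = {"region_A": (1, 0, 1, 0, 0), "region_B": (1, 1, 1, 0, 1)}
local_obs = {r: [predict(m, x) for x in XS] for r, m in truth.items()}
for region, mask in run_islands(local_obs).items():
    print(region, mask, "terms used:", sum(mask))
```

Counting the switched-on terms in each region's best mask is one crude way to compare model complexity across regions; the islands here never exchange individuals, since the text above does not describe migration between regions.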
This is a movie that shows how a population of ecosystem model configurations can use Genetic Programming techniques to optimize a model's set of equations, or tree structure. What is shown is a histogram of the normalized sum of squared errors (SSE) of each model configuration in the population. The first generation shows an initial population of models with a roughly Gaussian-shaped distribution of SSE values. The SSE values are determined using a 'twin experiment': a known model configuration is used to generate a synthetic data set, and the population evolves toward reproducing that data set. With each successive generation, the entire population behaves like a wave propagating toward lower and lower SSE values. This continues until a solution is obtained (a green line at an SSE value of 0.0 marks how many of the models have obtained the correct equation set). We have noted that the optimization works best when diversity is maintained within the population. The optimization is also very rapid and robust over repeated testing. For this case a solution appeared after only ~180 generations.
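A twin experiment and the normalized SSE can be sketched as follows. The text does not say how the SSE is normalized, so dividing by the data's sum of squared deviations from its mean is an assumption; the logistic-growth "true" model, the forward-Euler integrator, the candidate equations, and the histogram bin range are likewise hypothetical placeholders for the project's ecosystem models.

```python
# Hypothetical "true" model for the twin experiment: logistic growth.
def true_rhs(p, r=0.5, k=10.0):
    return r * p * (1 - p / k)


def integrate(rhs, p0=1.0, dt=0.1, steps=100):
    """Forward-Euler integration of dp/dt = rhs(p); returns the whole trajectory."""
    traj = [p0]
    for _ in range(steps):
        traj.append(traj[-1] + dt * rhs(traj[-1]))
    return traj


def normalized_sse(model, data):
    """SSE divided by the data's squared deviations from its mean (assumed normalization)."""
    mean = sum(data) / len(data)
    sse = sum((m - d) ** 2 for m, d in zip(model, data))
    return sse / sum((d - mean) ** 2 for d in data)


def histogram(values, bins=10, lo=0.0, hi=2.0):
    """Equal-width bin counts on [lo, hi]; out-of-range values go to the end bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        i = int((v - lo) / width)
        counts[min(max(i, 0), bins - 1)] += 1
    return counts


data = integrate(true_rhs)  # synthetic "observations" from the known model
candidates = [true_rhs, lambda p: 0.5 * p, lambda p: 0.4 * p * (1 - p / 10.0)]
scores = [normalized_sse(integrate(f), data) for f in candidates]
solved = sum(s == 0.0 for s in scores)  # models that recovered the known equations
print(scores[0], solved, histogram(scores))
```

Because the synthetic data come from a known model, a candidate that recovers the exact equations reproduces the data with an SSE of exactly zero, which is what makes the 0.0 line in the movie a meaningful "solved" marker.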
This movie is essentially a bitmap image of Movie #1 above: the histogram values are plotted as colors for each generation (x-axis) and SSE value (y-axis). These results were obtained from a simulation aimed at recovering the model equations of a more complex ecosystem model (Franks et al. 1986), so more generations had to be completed before the correct model equation set was obtained (denoted by the red vertical line) than the ~180 generations required in the first movie/test. The different images within this movie sequence come from rerunning the model code repeatedly to validate that the application is robust, which it is. For this model, ~500 generations were necessary for the code to converge on the correct model equation set. But, as in the first movie, the initial SSE distribution was Gaussian and quickly moved as a population toward lower SSE values, becoming one-tailed in all cases. The striping in the colors occurs because the granularity of the model function choices (+, -, /, *, etc.) causes the SSE values to cluster into specific bands.
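The bitmap can be sketched by stacking one SSE histogram per generation as a column, so that rows are SSE bins and columns are generations, and by finding the first generation that contains an SSE of zero (the red line). The function names, bin settings, and ASCII shading are hypothetical, and the three-generation `history` is made-up input chosen only to exercise the code; it is not data from these runs.

```python
def histogram(values, bins, lo, hi):
    """Equal-width bin counts on [lo, hi]; out-of-range values go to the end bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        counts[min(max(int((v - lo) / width), 0), bins - 1)] += 1
    return counts


def bitmap(history, bins=20, lo=0.0, hi=2.0):
    """One histogram per generation as a column: image[sse_bin][generation]."""
    columns = [histogram(gen, bins, lo, hi) for gen in history]
    return [list(row) for row in zip(*columns)]  # transpose columns into rows


def first_solved(history, tol=0.0):
    """Index of the first generation containing a model with SSE <= tol, else None."""
    for g, gen in enumerate(history):
        if any(s <= tol for s in gen):
            return g
    return None


def render(image, shades=" .:-=+*#%@"):
    """Print the bitmap with low SSE at the bottom; denser characters mean higher counts."""
    peak = max((max(row) for row in image), default=0) or 1
    for row in reversed(image):
        print("".join(shades[c * (len(shades) - 1) // peak] for c in row))


# Made-up SSE values for three generations, only to exercise the functions.
history = [[0.9, 1.0, 1.1], [0.5, 0.6, 0.8], [0.0, 0.2, 0.3]]
image = bitmap(history, bins=4)
render(image)
print("first solved generation:", first_solved(history))
```

A plotting library would replace `render` in practice; the transpose in `bitmap` is the only step needed to turn a sequence of per-generation histograms into an image with generations on the x-axis.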