Data-Driven Discovery

Data Science Environments

In November 2013, we announced a bold new partnership to harness the potential of data scientists and big data for basic research and scientific discovery. With our partners, we launched three Data Science Environments at New York University, the University of California, Berkeley, and the University of Washington with funding from the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation. This is a five-year, $37.8 million cross-institutional effort to bring data science to the forefront of cross-disciplinary academic research.

The Data Science Environments are working to bring about institutional change via campus-wide experimentation to catalyze a new era of research: cross-disciplinary efforts working towards new approaches to data-intensive discovery. At a time when the life, physical, mathematical, and computational sciences are all producing data with relentlessly increasing volume, variety and velocity, capturing the full potential of a progressively data-rich world has become a daunting hurdle for researchers. At the intersection of natural science, computation and mathematics, data science is already contributing to scientific discovery, yet substantial systemic challenges need to be overcome to maximize its impact on academic research. This ambitious partnership will spur collaborations within and across the three campuses and with other partners pursuing similar data-intensive science goals.



This project seeks to achieve three core goals:

  • Develop meaningful and sustained interactions and collaborations between researchers with backgrounds in specific subjects (such as astrophysics, genetics, economics), and in the methodology fields (such as computer science, statistics and applied mathematics), with the specific aim of recognizing what it takes to move each of the sciences forward;
  • Establish career paths that are long-term and sustainable, using alternative metrics and reward structures to retain a new generation of scientists whose research focuses on the multi-disciplinary analysis of massive, noisy, and complex scientific data and the development of the tools and techniques that enable this analysis; and
  • Build on current academic and industrial efforts to work towards an ecosystem of analytical tools and research practices that is sustainable, reusable, extensible, easy to translate across research areas, and enables researchers to spend more time focusing on their science.

These partner universities have pioneered new approaches to discovery in fields as diverse as astronomy, biology, oceanography, and sociology through deep collaborations between researchers in these fields and researchers in data science methodology fields such as computer science, statistics and applied mathematics. This new collaboration – a coordinated, distributed experiment involving researchers at these leading universities – will work with other leaders to develop effective models that dramatically accelerate this data science revolution.


Cross-university teams organize their efforts around six focal areas:

  • strengthening an ecosystem of tools and software environments,
  • establishing academic careers for data scientists,
  • championing education and training in data science at all levels,
  • promoting and facilitating accessible and reproducible research,
  • creating physical and intellectual spaces for data science activities, and
  • identifying the scientists’ data-science bottlenecks and needs through directed ethnography.