By Veronica Marian
Policymakers across the world face a consistent hurdle when designing anti-poverty programs: a lack of information. While we know that roughly 700 million people live in extreme poverty, we are missing vital data that could inform more effective policies. Details like rainfall levels, harvest yields, and household income are expensive and time-consuming to collect with traditional methods, which typically involve sending surveyors door to door or village to village.
However, recent innovations in satellite imagery, cloud storage, and machine learning are making data gathering easier than ever.
With machine learning, computer programs can use data to make reasonably accurate predictions, cutting out the cost and time required by physical surveying.
The new Stanford course Data for Sustainable Development introduces Stanford students to these new methods. In the class, first offered in Fall quarter 2017, a mix of graduate and undergraduate students developed machine learning applications that use satellite imagery or geospatial data to measure outcomes relevant to sustainable development issues like poverty, health, or governance.
Beyond teaching students these methods, the class was structured so that students worked on projects with real-world applicability in sustainable development. In the course’s first iteration, final projects used satellite imagery to predict poverty in Bangladesh and India, to forecast soy, corn, and wheat crop yields in Argentina, and to create improved cropland classification models.
Applying technical skills to sustainability problems
The class is taught collaboratively by earth system science professors David Lobell and Marshall Burke and computer science Professor Stefano Ermon. Students also benefit from the involvement of economics Professor Pascaline Dupas and political science Professor Jeremy Weinstein, who pose research questions and challenges that push students to think about the real-world applicability of their projects.
“Each project sets a goal that uses an inexpensive dataset, for example satellite images, to predict something that traditionally is quite expensive to measure, like the poverty level of a village,” explained Lobell, the deputy director of the Center on Food Security and the Environment and a faculty affiliate of the Stanford Center on Global Poverty and Development where he leads the Data for Development Initiative.
If students can use the data to predict that one village is in need of more aid than another, this could help inform the way aid gets distributed. Ultimately, “the goal is to develop a good prediction model and write it up in a publishable form,” he added.
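As a rough illustration of the idea Lobell describes, the sketch below fits a simple model that predicts an expensive-to-measure outcome from inexpensive inputs. It is not the method used in the class: the data are synthetic, the names (`image_features`, `consumption`, `fit_ridge`) are hypothetical, and ridge regression stands in for whatever models the student teams actually built.

```python
import numpy as np


def fit_ridge(features, targets, alpha=1.0):
    """Closed-form ridge regression; returns (weights, intercept)."""
    feature_means = features.mean(axis=0)
    target_mean = targets.mean()
    centered_x = features - feature_means
    centered_y = targets - target_mean
    n_features = features.shape[1]
    # Solve (X^T X + alpha * I) w = X^T y on the centered data.
    weights = np.linalg.solve(
        centered_x.T @ centered_x + alpha * np.eye(n_features),
        centered_x.T @ centered_y,
    )
    intercept = target_mean - feature_means @ weights
    return weights, intercept


def predict(features, weights, intercept):
    return features @ weights + intercept


def r_squared(actual, predicted):
    residual = np.sum((actual - predicted) ** 2)
    total = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - residual / total


# Synthetic stand-in data (an assumption for illustration only): each row is
# a village, each column a numeric feature summarizing its satellite image.
rng = np.random.default_rng(0)
n_villages, n_features = 200, 5
image_features = rng.normal(size=(n_villages, n_features))
true_weights = np.array([1.5, -0.8, 0.0, 0.4, 2.0])
# Simulated survey outcome, the "expensive" label a field team would collect.
consumption = (
    image_features @ true_weights
    + 10.0
    + rng.normal(scale=0.3, size=n_villages)
)

# Train on villages with survey data, then check held-out villages.
train, test = slice(0, 150), slice(150, None)
weights, intercept = fit_ridge(image_features[train], consumption[train])
held_out_predictions = predict(image_features[test], weights, intercept)
held_out_r2 = r_squared(consumption[test], held_out_predictions)
print(f"held-out R^2: {held_out_r2:.3f}")
```

In practice, the surveyed villages would supply the training labels, and the fitted model would then be applied to villages that have imagery but no survey data.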
The faculty encourage the students to view their projects as real research that uses nonconventional data approaches to possibly influence policy. Faculty members provided in-depth feedback each week as students worked on their quarter-long projects.
“This tight feedback loop let us catch mistakes early and gave us useful context on the data,” said Tony Duan, a first-year graduate student in computer science who took the class in the Fall. “In contrast, many other classes with final project components only provide feedback a few times per quarter,” Duan added.
“I deeply care about sustainable development, especially addressing issues related to the environment. The class was a great opportunity to practice using computer science skills for this purpose,” said Caelin Tran, a graduate student in the computer science department who worked on a final project predicting crop yields in Argentina using remote sensing data.
His group’s results showed that satellite imagery is a much cheaper and more effective alternative to existing techniques for predicting crop yields in developing countries. Current methods rely on locally sensed data like soil samples and rainfall measurements, which are difficult to scale and very expensive to collect.
“It's impossible to not make something ‘real’ in this class, and it was a pleasure to have designed, built, and tested something with genuine potential,” Tran said.
Haque Ishfaq, a Master’s student in statistics, has taken many machine learning and data science related classes. He has also interned at various Silicon Valley companies. While “it was cool to think about machine learning problems in the context of autonomous driving, optimized ad targeting, or virtual reality,” he was missing a sense of purpose in the work he was doing. “This class was offering projects that address real problems from the parts of the world that are in dire need of our attention but are often neglected. I was eager to see how I could use my machine learning and data science background to tackle these problems and this class was a perfect opportunity for that,” he said.
For Ishfaq, his group’s final project, which used cheap open-source satellite imagery to map poverty, hit close to home.
“The best part was working on a project that involves my own country, Bangladesh,” he said. “My team was working on understanding poverty in India and Bangladesh and how we can infer this by looking at satellite images. It was exciting to see our models being able to do a good job in predicting local poverty levels and validating my own prior understanding about certain areas in Bangladesh. At the same time, my exposure to Bangladesh's socio-economic structure allowed me to pinpoint if there was any anomaly or counterintuitive result in our model. I think this whole experience was intellectually very stimulating.”
The project succeeded in creating a machine learning program that outperformed existing models for predicting economic consumption. These results point to the real-world applicability of these methods, which can help policymakers improve the way they design poverty alleviation programs.
Part of a bigger picture
The class originated as part of the Data for Development Initiative at the Center on Global Poverty and Development. All the faculty members teaching or otherwise involved with the course are attached to the Initiative, which aims to explore the relevance of often low-cost, unconventional data streams like satellite imagery or call data records in the fight against global poverty.
Data analysts working with the Initiative have spent long hours developing data sets that eventually ended up being used for the class, and Stanford professors affiliated with the Initiative weigh in on ideas for final projects. In this way, the class is designed to truly give students the experience of working with world-class researchers on finding solutions to pressing global sustainability problems.