Free Doses of Data Science
Mobilize Center MOOCs by Professors Who Wrote the Book
Want to dip a toe in data science? Why not take a MOOC (massive open online course) from someone who literally wrote the book on the topic at hand?
Several MOOCs offered by Stanford professors who are part of the Mobilize Center fit the bill. Trevor Hastie, PhD, professor of statistics, co-wrote Introduction to Statistical Learning; Jure Leskovec, PhD, assistant professor of computer science, co-wrote Mining Massive Datasets; and Stephen Boyd, PhD, professor of electrical engineering, co-wrote Convex Optimization. And each of them teaches a MOOC by the same name.
In Hastie’s case, the book inspired the MOOC. “We had a book that was at the right level for a MOOC so we decided we’d do it.” He and Robert Tibshirani, PhD, co-author of the book and co-teacher of the MOOC, also made a deal with the publisher: The book became free online just six months after publication. It’s an extra draw for students—not only is the course free, but the text is as well. The same is true for the Mining Massive Datasets MOOC.
The statistical learning MOOC, offered on Stanford’s OpenEdX platform, has proven popular with people looking to broaden their horizons. “They get a free dose of what the field is like, especially now that data science is so popular,” Hastie says. “And they can decide whether to make a career move.”
Hastie’s MOOC follows the structure of the Introduction to Statistical Learning text. Typically, it’s appropriate for people who did a little bit of statistics in college, he says. “It gets them into more modern-day applied statistical modeling and how to implement with software.” The MOOC has been taught twice, with nearly 40,000 people signing up each time, 20,000 showing up on day one, and about 3,000 to 4,000 completing each course. This is typical of MOOCs, Hastie says: “There’s a kind of exponential decay [in the number of students].” But the MOOC still reaches more people than is possible in a traditional in-person class.
Leskovec’s MOOC, offered through Coursera, introduces fundamental algorithms and techniques for dealing with very big data as well as how to apply these techniques efficiently at large scales. The course covers algorithms for extracting models and information from large datasets, including locality-sensitive hashing, clustering, decision trees, and dimensionality reduction. It also introduces students to MapReduce, a software framework for easily writing applications that process vast amounts of data. More than 54,000 people have visited the course, of whom over 9,800 submitted at least one exercise.
Boyd’s Convex Optimization MOOC, on the Stanford OpenEdX platform, is for more advanced and mathematically oriented students who want to get into the optimization game. It includes about 20 hours of lecture and some challenging problem sets with an applied focus. “You’ll learn just enough math, which by the way is not a small amount, to be able to do convex optimization in practical settings,” Boyd says in the online intro to the course.
While none of these MOOCs has a biomedical focus, their applicability is quite wide, Hastie says. “The kinds of methods we teach are used in biomedical computations all the time.” At the Mobilize Center, for example, statistical learning is used to analyze data from clinical databases to predict the outcomes of surgeries. And Leskovec is helping the Center mine massive datasets from mobile sensors to better understand patterns in physical activity.
Statistical Learning: https://statlearning.class.stanford.edu/
Mining Massive Datasets: https://www.coursera.org/course/mmds
Convex Optimization: https://www.class-central.com/mooc/1577/stanford-openedx-cvx101-convex-optimization
The Mobilize Center web site provides a list of other training resources, including videos from the 2015 Big Data in Medicine conference at Stanford. Go to http://mobilize.stanford.edu/training/