In just under a week, we will officially launch a new major, in Data Science, at Luther College. I've spent a pretty good part of my summer preparing the first two classes of that major, and as I look back on it, it's been an interesting experience.
Data Science is a new field, and as such there are no "standard textbooks". Finding a Data Science 101 text is impossible. There are hundreds of books on Data Science, and all of them take their own approach. Many are oriented towards working professionals with years of statistics and/or programming in their background. Others are oriented towards a specific field, either statistics or computer science. We are aiming for a middle ground, territory that is not well charted: a data science program for undergraduate students at a liberal arts college.
In order to understand our program, I want to make our learning goals clear.
- Analytical Thinking: Students can think about, examine, and explain the relationships that may exist between and within sets of data. These relationships may include such things as patterns, cause and effect, similarities and differences, and trends.
- Communication: Students can communicate the results of their work using appropriate techniques to tell the most compelling story. These include oral presentation, visualization, and the written word.
- Problem Solving: Students can apply appropriate strategies to find, conceptualize, and implement a step-by-step solution. In addition, they can acquire and learn new tools and techniques in order to adapt to changes in technology and scale.
- Collaboration: Students can participate in shared responsibility and appreciation for understanding how problems can have an impact within and across disciplines. They can work with others in a cooperative manner to produce meaningful results.
These goals translate into the two courses I'm teaching this fall in the following ways. First, in our DS 120 Introduction to Data Science course:
- Introduce some of the key analysis techniques used in Data Science
- Develop the practical skill of working with Excel
- Learn to think about a data set and generate a list of questions for further analysis
- Practice the art of presenting your analysis and findings in writing, verbally, and graphically
- Read and discuss case studies of other people's analysis
- Learn to consider the ethical dilemmas practitioners of data science face
Data Science 120 will be very much a breadth-first class. We'll look at techniques for summarizing and presenting data graphically, mostly with Excel and Tableau. We'll also look at some machine learning techniques that we can implement in Excel.
Then, in DS 320 Data Analysis and Visualization:
- To become a skeptical consumer of data, statistics, and visualizations
- To develop an understanding of what makes a good visualization
- To practice making and presenting several different kinds of data visualizations
- To become a competent "data munger"
- To explore several of the tools commonly used in exploratory data analysis:
  - Jupyter Notebooks and friends such as Pandas, Bokeh, Altair, and others
  - RStudio
- To put into practice some "statistics for hackers" techniques for validating claims that people make about their data.
Data Science 320 will go into a lot more depth on the analysis and visualization side. We have another class for the spring that will go in depth on applied machine learning.
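To give a flavor of the "statistics for hackers" goal in the DS 320 list, here is a minimal sketch of one simulation-based technique often filed under that label: a shuffle (permutation) test. This is my own illustration, not course material. The function name `permutation_test`, its parameters, and the scores are all made up for the example. The idea is to ask how often randomly relabeling the data produces a difference in means at least as large as the one actually observed.

```python
import random


def average(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothetical helper for illustration: it pools both groups, shuffles
    the pool repeatedly, and counts how often a random split is at least
    as extreme as the observed split.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(average(group_a) - average(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(average(pooled[:n_a]) - average(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


# Made-up scores, for illustration only.
section_1 = [84, 72, 57, 46, 63, 76, 99, 91]
section_2 = [81, 69, 74, 61, 56, 87, 69, 65, 66, 44, 62, 69]

p_value = permutation_test(section_1, section_2)
print(f"Estimated p-value: {p_value:.3f}")
```

A small estimated p-value suggests the observed gap would be unusual if the group labels did not matter. A large one suggests the claim of a difference deserves some skepticism.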
So, how do you prepare to teach this stuff? With no existing textbooks that map directly onto our curriculum, it has been quite a summer! At times I feel like I have veered wildly across the curriculum map. At other times I have felt at peace. In the end, I know that it will be fine, that I'll learn a lot from this first time around, and that, as in all software projects, I'll iterate and make it better the next time.
Thankful for Helpful Alumni
First, I want to thank the many alumni who have taken the time to reach out to me to discuss data science. It's great to see the excitement with which our alumni are embracing this new major. I've had many great discussions with people about data science in practice, and there has been lots of interest in future internships for our new majors. Many of the ideas I have about this fall come directly from these conversations.
It's all about the Data
Second, it's all about the data! Duh! It's data science, what else would it be about? But really, one of the most interesting parts of the summer has been finding and exploring new data sets. When you start to look, you find out there are some really good ones out there. And I'm sure I've missed a lot of other great sources of data.
In my ideal classroom for these courses, much of the learning would happen in the following way: I found this cool dataset on X, where X changes frequently enough to engage everyone. Let's look at it and see if we can munge the data into shape to read with our tool of the day. OK, let's brainstorm some questions about this data. Then, how can we answer these questions? How do we present the answers in an ethical and compelling story?
Here are a few sources that I've run across.
Strange how Lifelong Learning Works
One of the most interesting things, looking back on the summer, is the question "How have I learned enough to start the semester?" I did find a couple of books especially compelling. The work of Tufte was a real inspiration in preparing for the data visualization course. But largely it has been online learning. I've got a nice list of podcasts that I enjoy listening to in the morning and on my bike rides. When I can get in a workout and count that time as class prep, I feel like a productivity superhero.
I'm a big fan of the Flipboard app on my iPad, and I've got a list of Flipboard channels on data science that I've been reading and clipping. Many of these articles have already been incorporated into the reading lists for both courses.
I have also dipped my toe into a few Coursera courses on machine learning, but those are less urgent since ML isn't until second semester.
As for the podcasts, here is my list:
- Data Skeptic
- Not So Standard Deviations
- Data Stories
- Partially Derivative
- What's the Point?
- 99% Invisible -- not so much about data science but really interesting stories
I feel like these have all really helped me get a better feel for the field, and the notes from many of them have led me to really interesting academic conference papers or journal articles. None of these are ways that I would have used to prepare for class 13 years ago when I started as a new professor. And as a professor at a college that preaches lifelong learning, I think it is interesting that we often recoil at embracing the online world, yet I would guess I am not alone in embracing this technology for my own betterment.