General Framework for Topics

Students are free (and encouraged) to select a topic of their choosing. Possible topics include, but are not limited to:

  • Something New: Finding an interesting new data set and doing end-to-end analysis; possible parts are getting the data, cleaning data, modeling, visualization, analysis, writeup, … and more; this can take inspiration from another class: maybe ‘learning’ from STAT 432, or some C++ from CS 225, or anything else you find challenging.
  • Transcription: Converting an existing implementation into another one: e.g. something currently done in Python Pandas translated to R, or something done in SQL brought to R using a combination of R and SQL, or something done in R made faster by using R and C++, or …

  • Replication: A key scientific approach is to validate (or, as the case may be, invalidate) other findings. Taking something published that is replicable (i.e. runs with posted code and data) and reproducing it (i.e. with new code, maybe using a different paradigm or framework) is a valid topic.

  • Improvements: Taking an existing package and addressing one or more non-trivial open bug reports or ‘issues’ or ‘tickets’: many packages and projects have open issue tickets (though it takes some experience to identify actually addressable ones).

  • (Rigorous) Benchmarking: New methods and alternate approaches appear all the time. Can you shed light on an existing application and demonstrate that a competing implementation is in fact faster on a meaningful selection of inputs? This could be extended to a review of implementation details, aspects of maintainability, or more.

  • Packaging: Turning an interesting application (in, say, R) that is not currently a package into a proper package (with documentation, tests, etc.) that satisfies proper packaging standards (e.g. as defined by CRAN).

  • Contribute to e.g. LOST: The Library of Statistical Techniques is still growing; maybe a team can work on filling an existing gap here (with a tip of the hat to Grant McDermott and his EC607 at Oregon for this idea).

  • Your proposal here: This is meant to be student-led in a 400-level course, so propose something you are excited about!

Do not take this list as prescriptive or exhaustive: you are free to set your own agenda. Feel free to bring in other tools provided you can drive them from R. Machine Learning packages are fine provided they install / run in RStudio Server on our machine (so be mindful of its resources: truly huge datasets may not be a great idea).

The reference platform is RStudio Server, which we are using throughout the course.


  • Do not be boring and do not just go to Kaggle and copy an existing story.

  • Do not cheat (see Syllabus).