5-day course on Data Science with R – University of Copenhagen

Data Science with R

NB: The course is closed for registration.

Data is everywhere, but it doesn’t generate value by itself. So, would you like to get more value out of your data? Would you like to become an efficient and well-structured data analyst?

Then you need to learn R, a programming language, and RStudio, a modern data science environment. With its vast catalogue of packages, R is extremely versatile and offers statistical methods that can be applied to almost any task. R is also well suited for automating laborious, repetitive tasks, for making sure that your analyses are correct and reproducible, and for customizing your analyses to your needs.

What you will learn

You will learn to build a complete data analysis pipeline in R. This includes learning R programming techniques for:

  • Data import from multiple sources
  • Data manipulation and visualization
  • Modeling
  • Automatic and interactive report generation

In addition to the technical programming skills, you will also be introduced to a conceptual framework for data analysis, in which all steps of the analysis are automated via a programmatic pipeline.
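As a rough illustration of what such a pipeline can look like, here is a minimal sketch in R. It is not taken from the course material: the built-in mtcars data set stands in for imported data, the column choices are arbitrary, and mgcv is an assumed choice of package for fitting a generalized additive model. In practice, code like this would typically live in an R Markdown document so that the report is regenerated whenever the data change.

```r
# A sketch only: mtcars (shipped with R) stands in for imported data,
# and mgcv is an assumed choice of GAM package.
library(dplyr)
library(ggplot2)
library(mgcv)

# 1. Import: in practice this could be e.g. readr::read_csv() on a file;
#    here a built-in data set is converted to a tibble instead.
cars <- as_tibble(mtcars, rownames = "model")

# 2. Manipulation: keep the needed columns and convert weight
#    (recorded in units of 1000 lbs) to kilograms.
cars_clean <- cars %>%
  select(model, mpg, wt, hp) %>%
  mutate(wt_kg = wt * 1000 * 0.4536)

# 3. Visualization: scatter plot with a smooth trend (built, not printed).
p <- ggplot(cars_clean, aes(x = wt_kg, y = mpg)) +
  geom_point() +
  geom_smooth(method = "gam", formula = y ~ s(x))

# 4. Modeling: fuel economy as a smooth function of weight and horsepower.
fit <- gam(mpg ~ s(wt_kg) + s(hp), data = cars_clean)
summary(fit)
```

Each step takes the output of the previous one as its input, so rerunning the script from the top reproduces the whole analysis.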

Course Content

The course is based on RStudio and a collection of modern R packages. The main focus will be on learning to exploit the full potential of these tools, which can serve as an infrastructure for almost any conceivable data analysis in R. Generalized additive models will serve as a non-trivial example of how to build a predictive regression model in R.

Core elements:

  • RStudio: An integrated development environment for R, which supports interactive data analysis, building of data analysis pipelines, and R software development
  • Tidyverse: A framework and collection of R packages centered on the concept of tidy data
  • Generalized additive models: A flexible but interpretable and easy-to-use prediction model
  • Visualization: High-quality figures created from structured specifications using the R package ggplot2
  • Reproducible analysis: Automatic and reproducible reports are written and generated using R Markdown
  • Interactive communication: Reactive web applications for interactive presentations of data and analyses written using Shiny

Other topics:

  • R as a programming language
  • Organisation of R code
  • Predictive modeling and model assessment with R

Participants

R is a programming language, and the course takes a programmatic approach to data analysis. To get the full benefit from the course, participants should be interested in and willing to program. The statistical and mathematical prerequisites are modest, but participants should know about mean, variance, and simple linear regression.

The course is for:

  • People with some experience in SAS, Matlab or Python programming for data analysis, but with no or limited experience with R
  • People with general programming and/or database experience, but limited or no experience with data analysis and data modeling
  • People with some R experience and an interest in learning RStudio, R Markdown, and Tidyverse

R (www.r-project.org) and RStudio (www.rstudio.com) are open source and available free of charge. Participants are expected to bring a laptop with these programs installed.

Course dates

5 days, August 20 – 24, 2018, 9:00 – 16:30 at the University of Copenhagen, South Campus.

Course directors

Niels Richard Hansen, Professor, Department of Mathematical Sciences, University of Copenhagen

Anders Tolver, Associate Professor, Department of Mathematical Sciences, University of Copenhagen

Teaching material 

Participants will receive a copy of the book R for Data Science (2016) by Garrett Grolemund and Hadley Wickham.

Course fee

EUR 2,680 / DKK 19,900 excl. Danish VAT. The fee includes teaching, course materials, and all meals during the course.