BIST Bio-Statistics Courses

April 15, 2016 | June 21st, 2016 | BIST, Media

The Barcelona Institute of Science and Technology (BIST) is organizing the “BIST Bio-Statistics Courses” with the goal of giving an introduction to selected important topics in biostatistical concepts and reasoning to the BIST scientific community. The courses will be organized by the different BIST centers throughout the year.

The “BIST Bio-Statistics Courses” will be offered free of charge to the scientific community of the six BIST centers: the Centre for Genomic Regulation (CRG), the Institute of Photonic Sciences (ICFO), the Institute of Chemical Research of Catalonia (ICIQ), the Catalan Institute for Nanoscience and Nanotechnology (ICN2), the High Energy Physics Institute (IFAE) and the Institute for Research in Biomedicine (IRB Barcelona).

Introduction to BIST Bio-Statistics Course

Course Description: This introductory course in statistics and probability theory is modeled after the traditional university course Statistics 101 and will be given by CRG staff and PhD students. The material is offered in 5 consecutive modules (please see Course Syllabus below), each containing a morning lecture and an afternoon practicum in a computer class. You can sign up for all modules in sequence or only for selected ones. For practical exercises we will use the R programming language and RStudio. However, this course is focused on statistics rather than R; therefore, each practicum is designed to demonstrate and reinforce understanding of concepts introduced in the lecture rather than to provide training in R. You can sign up for either or both the lecture and the practicum. The practicums (R scripts) will be available for download via the course website before the course start date.

Course Objectives: To introduce the basic concepts of statistics and probability and to demonstrate how they can be applied to real-life biological problems using R. Knowledge of statistics or R is not required for taking this course. However, familiarity with the material in the previous modules is recommended if the modules are not taken in sequence.

Course Instructors: Dmitri Pervouchine (lectures), German Demidov, Ande Gohr, Enrique Vidal Ocabo, Sarah Bonnin, and Julia Ponomarenko.

Registration deadline: 22nd April 2016


Course Syllabus and Schedule
May 2016

Monday 2 May 2016
Title: Introduction to R
Times: 14:00 – 18:30
Venue: Blue Lecture Room, ICFO

Course Description: R is a programming language initially designed for statistical inference. Focused on the integration and handling of different types of data, it was developed under a free software initiative that allows individuals to design and share software packages (libraries) with a specific functionality. As developers have the end user in mind, packages are fully documented with working examples so their results are fully reproducible. In addition, the source code is available for other developers to modify or adopt for their own needs. R has been established as an online community where the newest algorithms and research methods for analysis, data handling and display are tried by a growing user base. R communities develop packages in mathematical statistics, computer science, text mining, geographical information systems, econometrics, bio-statistics and market research, amongst many others. More recently, R has been used for the development and deployment of mobile apps. Packages are developed so that users and developers with no prior knowledge of, for instance, web design, parallel computing or web scraping can operate fully within R.

R is a powerful tool and should be promoted in physics. The benefits are clear: access to the newest inference methods, up-to-date analysis of big data, and encouragement of reproducible research and code sharing. In addition, encouraging students to learn R as part of their training can stimulate their creativity to design marketable analytical products or open the door to new career paths.

Workshop Format: This is a half-day workshop (2 sessions of 2 hours each). Students will need to bring their own laptop, and will have access to the course materials online before and during the session.

Friday 6 May

Module I. Descriptive statistics.
Theoretical part: AULA room 09:30-13:30
Practical part: BIOINFORMATIC room 14:30-17:00

Lecture I. Exploratory data analysis: bar-plot, histogram, CDF, box-plot, scatter-plot, pie charts etc. Samples, measures of center and spread, percentiles, odds ratio. Outliers and robustness. Experiment versus observational study, confounding factors, simple random sample, other types of sampling and biases in sampling techniques.
Lecture II. Introduction to the R programming language and RStudio: Data types, variables, packages, functions, handling files/scripts/projects.
Practicum: Basic plots in R using the ggplot2 package.

Monday 9 May 

Module II. Introduction to Probability.
Theoretical part: AULA room 09:30-13:30

Practical part: BIOINFORMATIC room 14:30-17:00

Lecture. Independence, conditional probability, Bayes formula. Distributions, population mean and population variance, Binomial, Poisson, and Normal distribution. Central Limit theorem and the Law of large numbers. Continuity correction. Sampling with and without replacement. Correction for finite population size.

Practicum. Elementary probability problems in R, PDF and CDF functions, simulation illustrating the law of large numbers.
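
To give a flavor of the kind of simulation mentioned above, here is a minimal sketch of the law of large numbers. It is not taken from the course materials: the practicum itself uses R, and Python is used here only for illustration. The function name running_means and its parameters are hypothetical. The sketch flips a fair coin many times and tracks how the running proportion of heads settles towards the true probability 0.5.

```python
import random


def running_means(n_flips, p=0.5, seed=0):
    """Return the running proportion of successes after each of n_flips Bernoulli(p) trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    successes = 0
    means = []
    for i in range(1, n_flips + 1):
        if rng.random() < p:   # one Bernoulli(p) trial
            successes += 1
        means.append(successes / i)
    return means


means = running_means(100_000)
# The running proportion should drift towards p = 0.5 as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(means[n - 1], 4))
```

The printed proportions fluctuate noticeably for small n and stay close to 0.5 for large n, which is what the law of large numbers predicts.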

Friday 13 May 
Module III. Statistical Inference, part I.
Theoretical part: AULA room 09:30-13:30
Practical part: BIOINFORMATIC room 14:30-17:00

Lecture. Statistical Inference, part I. The concept of hypothesis testing, type I and type II error, false discovery rate. Significance and confidence level, p-value. Confidence intervals. One-sided and two-sided tests and confidence intervals. Sampling distribution, estimators, standard error. Normal probabilities in application to p-value. One-sample and two-sample tests for independent and matched samples with known variance. The case of unknown variance and Student t-distribution, assumption of normality. Pooled variance and equal variances assumption.
Practicum. One- and two-sample tests with known and unknown variance, test for proportions, simulation involving confidence intervals and t-distribution.
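
As an illustration of what a simulation involving confidence intervals can look like, the sketch below estimates how often a 95% confidence interval for the mean covers the true value. It is an assumption-laden example rather than course material: it covers only the known-variance (z-interval) case, uses Python instead of the R used in the practicum, and the function name ci_coverage and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def ci_coverage(n_samples=4000, n=25, mu=10.0, sigma=2.0, level=0.95, seed=1):
    """Fraction of simulated z-intervals (known sigma) that contain the true mean mu."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5  # z times the standard error of the mean
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / n_samples


coverage = ci_coverage()
print(coverage)  # should be close to the nominal level 0.95
```

With an unknown variance, the critical value would come from the Student t-distribution instead of the normal distribution, and the sample standard deviation would replace sigma.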

Wednesday 18 May 
Module IV. Statistical Inference, part II. 
Theoretical part: AULA room 09:30-13:30
Practical part: BIOINFORMATIC room 14:30-17:00

Lecture. Statistical Inference, part II. Estimation of variance. Fisher test for variance equality. Non-parametric tests. Sign test, Wilcoxon sum of ranks test (Mann-Whitney U-test), Wilcoxon signed rank test. Chi-square test for goodness of fit, chi-square test for independence. Kolmogorov-Smirnov (KS) test. Shapiro test for normality. Sample size estimation. Correction for multiple testing, family-wise error rate.
Practicum. Tests with unknown variance, non-parametric tests, simulations illustrating non-parametric tests, FDR.

Friday 20 May
Module V. Statistical modeling, Regression.
Theoretical part: AULA room 09:30-13:30
Practical part: BIOINFORMATIC room 14:30-17:00

Lecture. Simple linear regression model, residuals, degrees of freedom, least squares method, correlation coefficient, variance decomposition, determination coefficient. Interpretation of the slope, correlation, and determination coefficients. Standard error and statistical inference in simple linear regression model. Analysis of variance (ANOVA). One-way and two-way ANOVA.
Practicum. Problems on linear regression, ANOVA, data transformation.
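
For readers who want a concrete picture of the least squares method before the module, the following is a minimal sketch of simple linear regression computed from first principles. It is not part of the course materials: the practicum uses R, Python is used here only for illustration, and the function name simple_linear_regression and the toy data are hypothetical.

```python
from statistics import fmean


def simple_linear_regression(x, y):
    """Least squares fit of y = intercept + slope * x; also returns r and r squared."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)                      # spread of x
    syy = sum((yi - my) ** 2 for yi in y)                      # total variation of y
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # co-variation of x and y
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5   # correlation coefficient
    return slope, intercept, r, r ** 2


# Toy data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
slope, intercept, r, r_squared = simple_linear_regression(x, y)
print(slope, intercept, r, r_squared)
```

In simple linear regression the determination coefficient equals the squared correlation coefficient, which is why the sketch returns r squared directly.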

October 2016 – TBC
Advanced Topics in Statistical Modeling T.B.C

Date: TBC

Get started on Statistical Concepts, Experimental Design and Exploratory Tools T.B.C