BWSI Course - Medlytics

Data mining and machine learning have become ubiquitous in the age of “big data,” and for good reason: advanced learning algorithms take advantage of ever-growing compute capacity and vast amounts of data to solve complex problems, often matching or exceeding human performance.  These techniques are being embraced in nearly every sector, including financial trading, cybersecurity, entertainment, advertising, autonomous vehicles, and, of course, health and medicine.  The increasing adoption of electronic health records, mobile health apps, and wearable technologies continues to generate troves of rich, real-time, high-resolution data.  These data are now being used to train algorithms that help physicians build prognostic models, analyze medical images, and improve diagnostic accuracy.

The BWSI Medlytics program will offer students the opportunity to explore the exciting intersection of data science and medicine.  The program consists of two components: (1) an online course from January to May, open to all interested and committed students, and (2) a four-week summer program for a group of 20–25 students from July 6 to July 31.  The online course will help students build a solid foundation in the fundamentals of probability and statistics, and will introduce coding and machine learning techniques through a series of online teaching modules.  During the summer, students will work in groups alongside Cambridge-area clinicians and data scientists to gain hands-on experience applying advanced machine learning and data mining to real-world medical challenges.

Online Course

The online component of the BWSI Medlytics course covers the introductory material and background that students need to successfully complete the four-week summer course.  Beyond this introductory material, the online course will expose students to real-world data and machine learning techniques, and introduce some of the challenges and opportunities of combining the two.

Introduction and Background

  • Perspectives on the challenges of working with medical data
  • Probability and statistics
  • Introduction to coding: Python, Git, Jupyter

Data Science for Health and Medicine

  • Defining a patient cohort
  • Correlation and regression; noise vs. outliers
  • Beginner machine learning: supervised and unsupervised algorithms
  • Introduction to time-series data analysis

Summer Course

The four-week summer component of Medlytics will take a deep dive into the application of data analytics to physiological signals and time-series data.  Daily course material, case studies, guest lectures, and small-group projects will expose students to challenges in signal analysis and to some state-of-the-art machine learning solutions.  Boston-area clinicians and data scientists will mentor students as they compete in weekly challenges and carry a final capstone project from concept proposal to live demonstration.

The following is a rough outline for the summer course:

Week 1: “First do no harm” (Introduction to Diagnostic Research and Machine Learning)

  • Research questions, hypotheses, and objectives: the FINER criteria
  • Structured data processing and plotting in Python
  • Classification evaluation and metrics
  • Supervised machine learning
  • Clinical Data Challenge 1: Diagnosing Hypothyroidism

Week 2: “There is art to medicine as well as science” (Introduction to Signal Processing and Deep Learning)

  • Introduction to signal processing
  • Fourier transforms
  • Machine learning for time-series data
  • Artificial neural networks
  • Clinical Data Challenge 2: Classifying Sleep Stages

Week 3: “I will not be ashamed to say ‘I know not’” (Image Processing and Advanced Data Analytics)

  • Computer vision applications in medicine
  • Texture classification using convolutional neural networks
  • Transfer learning
  • Clinical Data Challenge 3: Analyzing Mammograms

Week 4: “Look for a path to a cure” (Capstone Project)

In the final week of the course, students will work in teams to propose, design, and demonstrate a health application prototype, leveraging the lessons learned in Weeks 1–3.