- Type of Document: Book
- Publisher: Springer International Publishing AG, Switzerland, 2018
- Keywords:
- Big data; Mathematical statistics; Medical records -- Data processing; R (Computer program language)
- Foreword
- Preface
- DSPA Application and Use Disclaimer
- Notations
- Contents
- Chapter 1: Motivation
- 1.1 DSPA Mission and Objectives
- 1.2 Examples of Driving Motivational Problems and Challenges
- 1.3 Common Characteristics of Big (Biomedical and Health) Data
- 1.4 Data Science
- 1.5 Predictive Analytics
- 1.6 High-Throughput Big Data Analytics
- 1.7 Examples of Data Repositories, Archives, and Services
- 1.8 DSPA Expectations
- Chapter 2: Foundations of R
- 2.1 Why Use R?
- 2.2 Getting Started
- 2.3 Help
- 2.4 Simple Wide-to-Long Data Format Translation
- 2.5 Data Generation
- 2.6 Input/Output (I/O)
- 2.7 Slicing and Extracting Data
- 2.8 Variable Conversion
- 2.9 Variable Information
- 2.10 Data Selection and Manipulation
- 2.11 Math Functions
- 2.12 Matrix Operations
- 2.13 Advanced Data Processing
- 2.14 Strings
- 2.15 Plotting
- 2.16 QQ Normal Probability Plot
- 2.17 Low-Level Plotting Commands
- 2.18 Graphics Parameters
- 2.19 Optimization and Model Fitting
- 2.20 Statistics
- 2.21 Distributions
- 2.22 Data Simulation Primer
- 2.23 Appendix
- 2.24 Assignments: 2. R Foundations
- References
- Chapter 3: Managing Data in R
- 3.1 Saving and Loading R Data Structures
- 3.2 Importing and Saving Data from CSV Files
- 3.3 Exploring the Structure of Data
- 3.4 Exploring Numeric Variables
- 3.5 Measuring the Central Tendency: Mean, Median, Mode
- 3.6 Measuring Spread: Quartiles and the Five-Number Summary
- 3.7 Visualizing Numeric Variables: Boxplots
- 3.8 Visualizing Numeric Variables: Histograms
- 3.9 Understanding Numeric Data: Uniform and Normal Distributions
- 3.10 Measuring Spread: Variance and Standard Deviation
- 3.11 Exploring Categorical Variables
- 3.12 Exploring Relationships Between Variables
- 3.13 Missing Data
- 3.14 Parsing Webpages and Visualizing Tabular HTML Data
- 3.15 Cohort-Rebalancing (for Imbalanced Groups)
- 3.16 Appendix
- 3.17 Assignments: 3. Managing Data in R
- References
- Chapter 4: Data Visualization
- Chapter 5: Linear Algebra and Matrix Computing
- 5.1 Matrices (Second Order Tensors)
- 5.2 Matrix Subscripts
- 5.3 Matrix Operations
- 5.4 Matrix Algebra Notation
- 5.5 Scalars, Vectors and Matrices
- 5.6 Eigenvalues and Eigenvectors
- 5.7 Other Important Functions
- 5.8 Matrix Notation (Another View)
- 5.9 Multivariate Linear Regression
- 5.10 Sample Covariance Matrix
- 5.11 Assignments: 5. Linear Algebra and Matrix Computing
- References
- Chapter 6: Dimensionality Reduction
- 6.1 Example: Reducing 2D to 1D
- 6.2 Matrix Rotations
- 6.3 Notation
- 6.4 Summary (PCA vs. ICA vs. FA)
- 6.5 Principal Component Analysis (PCA)
- 6.6 Independent Component Analysis (ICA)
- 6.7 Factor Analysis (FA)
- 6.8 Singular Value Decomposition (SVD)
- 6.9 SVD Summary
- 6.10 Case Study for Dimension Reduction (Parkinson's Disease)
- 6.11 Assignments: 6. Dimensionality Reduction
- References
- Chapter 7: Lazy Learning: Classification Using Nearest Neighbors
- 7.1 Motivation
- 7.2 The kNN Algorithm Overview
- 7.3 Case Study
- 7.3.1 Step 1: Collecting Data
- 7.3.2 Step 2: Exploring and Preparing the Data
- 7.3.3 Normalizing Data
- 7.3.4 Data Preparation: Creating Training and Testing Datasets
- 7.3.5 Step 3: Training a Model on the Data
- 7.3.6 Step 4: Evaluating Model Performance
- 7.3.7 Step 5: Improving Model Performance
- 7.3.8 Testing Alternative Values of k
- 7.3.9 Quantitative Assessment (Tables 7.2 and 7.3)
- 7.4 Assignments: 7. Lazy Learning: Classification Using Nearest Neighbors
- References
- Chapter 8: Probabilistic Learning: Classification Using Naive Bayes
- 8.1 Overview of the Naive Bayes Algorithm
- 8.2 Assumptions
- 8.3 Bayes Formula
- 8.4 The Laplace Estimator
- 8.5 Case Study: Head and Neck Cancer Medication
- 8.6 Practice Problem
- 8.7 Assignments 8: Probabilistic Learning: Classification Using Naive Bayes
- References
- Chapter 9: Decision Tree Divide and Conquer Classification
- 9.1 Motivation
- 9.2 Hands-on Example: Iris Data
- 9.3 Decision Tree Overview
- 9.4 Case Study 1: Quality of Life and Chronic Disease
- 9.5 Compare Different Impurity Indices
- 9.6 Classification Rules
- 9.7 Case Study 2: QoL in Chronic Disease (Take 2)
- 9.8 Practice Problem
- 9.9 Assignments 9: Decision Tree Divide and Conquer Classification
- References
- Chapter 10: Forecasting Numeric Data Using Regression Models
- 10.1 Understanding Regression
- 10.2 Ordinary Least Squares Estimation
- 10.3 Case Study 1: Baseball Players
- 10.3.1 Step 1: Collecting Data
- 10.3.2 Step 2: Exploring and Preparing the Data
- 10.3.3 Exploring Relationships Among Features: The Correlation Matrix
- 10.3.4 Visualizing Relationships Among Features: The Scatterplot Matrix
- 10.3.5 Step 3: Training a Model on the Data
- 10.3.6 Step 4: Evaluating Model Performance
- 10.4 Step 5: Improving Model Performance
- 10.5 Understanding Regression Trees and Model Trees
- 10.6 Case Study 2: Baseball Players (Take 2)
- 10.7 Practice Problem: Heart Attack Data
- 10.8 Assignments: 10. Forecasting Numeric Data Using Regression Models
- References
- Chapter 11: Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines
- 11.1 Understanding Neural Networks
- 11.2 Case Study 1: Google Trends and the Stock Market: Regression
- 11.3 Simple NN Demo: Learning to Compute
- 11.4 Case Study 2: Google Trends and the Stock Market: Classification
- 11.5 Support Vector Machines (SVM)
- 11.6 Case Study 3: Optical Character Recognition (OCR)
- 11.7 Case Study 4: Iris Flowers
- 11.8 Practice
- 11.9 Appendix
- 11.10 Assignments: 11. Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines
- References
- Chapter 12: Apriori Association Rules Learning
- 12.1 Association Rules
- 12.2 The Apriori Algorithm for Association Rule Learning
- 12.3 Measuring Rule Importance by Using Support and Confidence
- 12.4 Building a Set of Rules with the Apriori Principle
- 12.5 A Toy Example
- 12.6 Case Study 1: Head and Neck Cancer Medications
- 12.7 Practice Problems: Groceries
- 12.8 Summary
- 12.9 Assignments: 12. Apriori Association Rules Learning
- References
- Chapter 13: k-Means Clustering
- 13.1 Clustering as a Machine Learning Task
- 13.2 Silhouette Plots
- 13.3 The k-Means Clustering Algorithm
- 13.4 Case Study 1: Divorce and Consequences on Young Adults
- 13.5 Model Improvement
- 13.6 Case Study 2: Pediatric Trauma
- 13.7 Hierarchical Clustering
- 13.8 Gaussian Mixture Models
- 13.9 Summary
- 13.10 Assignments: 13. k-Means Clustering
- References
- Chapter 14: Model Performance Assessment
- 14.1 Measuring the Performance of Classification Methods
- 14.2 Evaluation Strategies
- 14.3 Visualizing Performance Tradeoffs (ROC Curve)
- 14.4 Estimating Future Performance (Internal Statistical Validation)
- 14.5 Assignment: 14. Evaluation of Model Performance
- References
- Chapter 15: Improving Model Performance
- Chapter 16: Specialized Machine Learning Topics
- 16.1 Working with Specialized Data and Databases
- 16.1.1 Data Format Conversion
- 16.1.2 Querying Data in SQL Databases
- 16.1.3 Real Random Number Generation
- 16.1.4 Downloading the Complete Text of Web Pages
- 16.1.5 Reading and Writing XML with the XML Package
- 16.1.6 Web-Page Data Scraping
- 16.1.7 Parsing JSON from Web APIs
- 16.1.8 Reading and Writing Microsoft Excel Spreadsheets Using XLSX
- 16.2 Working with Domain-Specific Data
- 16.3 Data Streaming
- 16.3.1 Definition
- 16.3.2 The stream Package
- 16.3.3 Synthetic Example: Random Gaussian Stream
- 16.3.4 Sources of Data Streams
- 16.3.5 Printing, Plotting and Saving Streams
- 16.3.6 Stream Animation
- 16.3.7 Case-Study: SOCR Knee Pain Data
- 16.3.8 Data Stream Clustering and Classification (DSC)
- 16.3.9 Evaluation of Data Stream Clustering
- 16.4 Optimization and Improving the Computational Performance
- 16.5 Parallel Computing
- 16.6 Deploying Optimized Learning Algorithms
- 16.7 Practice Problem
- 16.8 Assignment: 16. Specialized Machine Learning Topics
- References
- Chapter 17: Variable/Feature Selection
- Chapter 18: Regularized Linear Modeling and Controlled Variable Selection
- 18.1 Questions
- 18.2 Matrix Notation
- 18.3 Regularized Linear Modeling
- 18.4 Linear Regression
- 18.5 Regularization Framework
- 18.6 Implementation of Regularization
- 18.6.1 Example: Neuroimaging-Genetics Study of Parkinson's Disease Dataset
- 18.6.2 Computational Complexity
- 18.6.3 LASSO and Ridge Solution Paths
- 18.6.4 Choice of the Regularization Parameter
- 18.6.5 Cross Validation Motivation
- 18.6.6 n-Fold Cross Validation
- 18.6.7 LASSO 10-Fold Cross Validation
- 18.6.8 Stepwise OLS (Ordinary Least Squares)
- 18.6.9 Final Models
- 18.6.10 Model Performance
- 18.6.11 Comparing Selected Features
- 18.6.12 Summary
- 18.7 Knock-off Filtering: Simulated Example
- 18.8 PD Neuroimaging-Genetics Case-Study
- 18.9 Assignment: 18. Regularized Linear Modeling and Knockoff Filtering
- References
- Chapter 19: Big Longitudinal Data Analysis
- Chapter 20: Natural Language Processing/Text Mining
- Chapter 21: Prediction and Internal Statistical Cross Validation
- 21.1 Forecasting Types and Assessment Approaches
- 21.2 Overfitting
- 21.3 Internal Statistical Cross-Validation is an Iterative Process
- 21.4 Example (Linear Regression)
- 21.5 Case-Studies
- 21.6 Summary of CV Output
- 21.7 Alternative Predictor Functions
- 21.8 Compare the Results
- 21.9 Assignment: 21. Prediction and Internal Statistical Cross-Validation
- References
- Chapter 22: Function Optimization
- Chapter 23: Deep Learning, Neural Networks
- Summary
- Glossary
- Index