Data Science and Predictive Analytics: Biomedical and Health Applications Using R
Dinov, Ivo D.
Springer International Publishing AG, 2018
Foreword (6)
Preface (10)
Genesis (10)
Purpose (11)
Limitations/Prerequisites (11)
Scope of the Book (12)
Acknowledgements (13)
DSPA Application and Use Disclaimer (14)
Biomedical, Biosocial, Environmental, and Health Disclaimer (15)
Notations (16)
Contents (17)
Chapter 1: Motivation (33)
1.1 DSPA Mission and Objectives (33)
1.2 Examples of Driving Motivational Problems and Challenges (34)
1.2.1 Alzheimer's Disease (34)
1.2.2 Parkinson's Disease (34)
1.2.3 Drug and Substance Use (35)
1.2.4 Amyotrophic Lateral Sclerosis (36)
1.2.5 Normal Brain Visualization (36)
1.2.6 Neurodegeneration (36)
1.2.7 Genetic Forensics: 2013-2016 Ebola Outbreak (37)
1.2.8 Next Generation Sequence (NGS) Analysis (38)
1.2.9 Neuroimaging-Genetics (39)
1.3 Common Characteristics of Big (Biomedical and Health) Data (40)
1.4 Data Science (41)
1.5 Predictive Analytics (41)
1.6 High-Throughput Big Data Analytics (42)
1.7 Examples of Data Repositories, Archives, and Services (42)
1.8 DSPA Expectations (43)
Chapter 2: Foundations of R (45)
2.1 Why Use R? (45)
2.2 Getting Started (47)
2.2.1 Install Basic Shell-Based R (47)
2.2.2 GUI-Based R Invocation (RStudio) (47)
2.2.3 RStudio GUI Layout (47)
2.2.4 Some Notes (48)
2.3 Help (48)
2.4 Simple Wide-to-Long Data Format Translation (49)
2.5 Data Generation (50)
2.6 Input/Output (I/O) (54)
2.7 Slicing and Extracting Data (56)
2.8 Variable Conversion (57)
2.9 Variable Information (57)
2.10 Data Selection and Manipulation (59)
2.11 Math Functions (62)
2.12 Matrix Operations (64)
2.13 Advanced Data Processing (64)
2.14 Strings (69)
2.15 Plotting (71)
2.16 QQ Normal Probability Plot (73)
2.17 Low-Level Plotting Commands (77)
2.18 Graphics Parameters (77)
2.19 Optimization and Model Fitting (79)
2.20 Statistics (80)
2.21 Distributions (81)
2.21.1 Programming (81)
2.22 Data Simulation Primer (82)
2.23 Appendix (88)
2.23.1 HTML SOCR Data Import (88)
2.23.2 R Debugging (89)
Example (92)
2.24 Assignments: 2. R Foundations (92)
2.24.1 Confirm that You Have Installed R/RStudio (92)
2.24.2 Long-to-Wide Data Format Translation (93)
2.24.3 Data Frames (93)
2.24.4 Data Stratification (93)
2.24.5 Simulation (93)
2.24.6 Programming (94)
References (94)
Chapter 3: Managing Data in R (95)
3.1 Saving and Loading R Data Structures (95)
3.2 Importing and Saving Data from CSV Files (96)
3.3 Exploring the Structure of Data (98)
3.4 Exploring Numeric Variables (98)
3.5 Measuring the Central Tendency: Mean, Median, Mode (99)
3.6 Measuring Spread: Quartiles and the Five-Number Summary (100)
3.7 Visualizing Numeric Variables: Boxplots (102)
3.8 Visualizing Numeric Variables: Histograms (103)
3.9 Understanding Numeric Data: Uniform and Normal Distributions (104)
3.10 Measuring Spread: Variance and Standard Deviation (105)
3.11 Exploring Categorical Variables (108)
3.12 Exploring Relationships Between Variables (109)
3.13 Missing Data (111)
3.13.1 Simulate Some Real Multivariate Data (116)
3.13.2 TBI Data Example (130)
3.13.3 Imputation via Expectation-Maximization (154)
Types of Missing Data (154)
General Idea of the EM Algorithm (154)
EM-Based Imputation (155)
A Simple Manual Implementation of EM-Based Imputation (156)
Plotting Complete and Imputed Data (159)
Validation of EM-Imputation Using the Amelia R Package (160)
Comparison (160)
Density Plots (162)
3.14 Parsing Webpages and Visualizing Tabular HTML Data (162)
3.15 Cohort-Rebalancing (for Imbalanced Groups) (167)
3.16 Appendix (170)
3.16.1 Importing Data from SQL Databases (170)
3.16.2 R Code Fragments (171)
3.17 Assignments: 3. Managing Data in R (172)
3.17.1 Import, Plot, Summarize and Save Data (172)
3.17.2 Explore Some Bivariate Relations in the Data (172)
3.17.3 Missing Data (173)
3.17.4 Surface Plots (173)
3.17.5 Unbalanced Designs (173)
3.17.6 Aggregate Analysis (173)
References (173)
Chapter 4: Data Visualization (174)
4.1 Common Questions (174)
4.2 Classification of Visualization Methods (175)
4.3 Composition (175)
4.3.1 Histograms and Density Plots (175)
4.3.2 Pie Chart (178)
4.3.3 Heat Map (180)
4.4 Comparison (183)
4.4.1 Paired Scatter Plots (183)
4.4.2 Jitter Plot (188)
4.4.3 Bar Plots (190)
4.4.4 Trees and Graphs (195)
4.4.5 Correlation Plots (198)
4.5 Relationships (202)
4.5.1 Line Plots Using ggplot (202)
4.5.2 Density Plots (204)
4.5.3 Distributions (204)
4.5.4 2D Kernel Density and 3D Surface Plots (205)
4.5.5 Multiple 2D Image Surface Plots (207)
4.5.6 3D and 4D Visualizations (209)
4.6 Appendix (214)
4.6.1 Hands-on Activity (Health Behavior Risks) (214)
4.6.2 Additional ggplot Examples (218)
Housing Price Data (218)
Modeling the Home Price Index Data (Fig. 4.48) (220)
Map of the Neighborhoods of Los Angeles (LA) (222)
Latin Letter Frequency in Different Languages (224)
4.7 Assignments 4: Data Visualization (229)
4.7.1 Common Plots (229)
4.7.2 Trees and Graphs (229)
4.7.3 Exploratory Data Analytics (EDA) (230)
References (230)
Chapter 5: Linear Algebra and Matrix Computing (231)
5.1 Matrices (Second Order Tensors) (232)
5.1.1 Create Matrices (232)
5.1.2 Adding Columns and Rows (233)
5.2 Matrix Subscripts (234)
5.3 Matrix Operations (234)
5.3.1 Addition (234)
5.3.2 Subtraction (235)
5.3.3 Multiplication (235)
Elementwise Multiplication (235)
Matrix Multiplication (235)
5.3.4 Element-wise Division (237)
5.3.5 Transpose (237)
5.3.6 Multiplicative Inverse (237)
5.4 Matrix Algebra Notation (239)
5.4.1 Linear Models (239)
5.4.2 Solving Systems of Equations (240)
5.4.3 The Identity Matrix (242)
5.5 Scalars, Vectors and Matrices (243)
5.5.1 Sample Statistics (Mean, Variance) (245)
Mean (245)
Variance (246)
Applications of Matrix Algebra: Linear Modeling (246)
Finding Function Extrema (Min/Max) Using Calculus (247)
5.5.2 Least Square Estimation (248)
The R lm Function (249)
5.6 Eigenvalues and Eigenvectors (249)
5.7 Other Important Functions (250)
5.8 Matrix Notation (Another View) (250)
5.9 Multivariate Linear Regression (254)
5.10 Sample Covariance Matrix (257)
5.11 Assignments: 5. Linear Algebra and Matrix Computing (259)
5.11.1 How Is Matrix Multiplication Defined? (259)
5.11.2 Scalar Versus Matrix Multiplication (259)
5.11.3 Matrix Equations (259)
5.11.4 Least Square Estimation (260)
5.11.5 Matrix Manipulation (260)
5.11.6 Matrix Transpose (260)
5.11.7 Sample Statistics (260)
5.11.8 Least Square Estimation (260)
5.11.9 Eigenvalues and Eigenvectors (261)
References (261)
Chapter 6: Dimensionality Reduction (262)
6.1 Example: Reducing 2D to 1D (262)
6.2 Matrix Rotations (266)
6.3 Notation (271)
6.4 Summary (PCA vs. ICA vs. FA) (271)
6.5 Principal Component Analysis (PCA) (272)
6.5.1 Principal Components (272)
6.6 Independent Component Analysis (ICA) (279)
6.7 Factor Analysis (FA) (283)
6.8 Singular Value Decomposition (SVD) (285)
6.9 SVD Summary (287)
6.10 Case Study for Dimension Reduction (Parkinson's Disease) (287)
6.11 Assignments: 6. Dimensionality Reduction (294)
6.11.1 Parkinson's Disease Example (294)
6.11.2 Allometric Relations in Plants Example (295)
Load Data (295)
Dimensionality Reduction (295)
References (295)
Chapter 7: Lazy Learning: Classification Using Nearest Neighbors (296)
7.1 Motivation (297)
7.2 The kNN Algorithm Overview (298)
7.2.1 Distance Function and Dummy Coding (298)
7.2.2 Ways to Determine k (299)
7.2.3 Rescaling of the Features (299)
7.2.4 Rescaling Formulas (300)
7.3 Case Study (300)
7.3.1 Step 1: Collecting Data (300)
7.3.2 Step 2: Exploring and Preparing the Data (301)
7.3.3 Normalizing Data (302)
7.3.4 Data Preparation: Creating Training and Testing Datasets (303)
7.3.5 Step 3: Training a Model on the Data (303)
7.3.6 Step 4: Evaluating Model Performance (303)
7.3.7 Step 5: Improving Model Performance (304)
7.3.8 Testing Alternative Values of k (305)
7.3.9 Quantitative Assessment (Tables 7.2 and 7.3) (311)
7.4 Assignments: 7. Lazy Learning: Classification Using Nearest Neighbors (315)
7.4.1 Traumatic Brain Injury (TBI) (315)
7.4.2 Parkinson's Disease (315)
7.4.3 kNN Classification in a High Dimensional Space (316)
7.4.4 kNN Classification in a Lower Dimensional Space (316)
References (316)
Chapter 8: Probabilistic Learning: Classification Using Naive Bayes (317)
8.1 Overview of the Naive Bayes Algorithm (317)
8.2 Assumptions (318)
8.3 Bayes Formula (318)
8.4 The Laplace Estimator (320)
8.5 Case Study: Head and Neck Cancer Medication (321)
8.5.1 Step 1: Collecting Data (321)
8.5.2 Step 2: Exploring and Preparing the Data (321)
Data Preparation: Processing Text Data for Analysis (322)
Data Preparation: Creating Training and Test Datasets (323)
Visualizing Text Data: Word Clouds (325)
Data Preparation: Creating Indicator Features for Frequent Words (326)
8.5.3 Step 3: Training a Model on the Data (327)
8.5.4 Step 4: Evaluating Model Performance (328)
8.5.5 Step 5: Improving Model Performance (329)
8.5.6 Step 6: Compare Naive Bayesian against LDA (330)
8.6 Practice Problem (331)
8.7 Assignments 8: Probabilistic Learning: Classification Using Naive Bayes (332)
8.7.1 Explain These Two Concepts (332)
8.7.2 Analyzing Textual Data (333)
References (333)
Chapter 9: Decision Tree Divide and Conquer Classification (334)
9.1 Motivation (334)
9.2 Hands-on Example: Iris Data (335)
9.3 Decision Tree Overview (337)
9.3.1 Divide and Conquer (338)
9.3.2 Entropy (339)
9.3.3 Misclassification Error and Gini Index (340)
9.3.4 C5.0 Decision Tree Algorithm (340)
9.3.5 Pruning the Decision Tree (342)
9.4 Case Study 1: Quality of Life and Chronic Disease (343)
9.4.1 Step 1: Collecting Data (343)
9.4.2 Step 2: Exploring and Preparing the Data (343)
Data Preparation: Creating Random Training and Test Datasets (345)
9.4.3 Step 3: Training a Model on the Data (346)
9.4.4 Step 4: Evaluating Model Performance (349)
9.4.5 Step 5: Trial Option (350)
9.4.6 Loading the Misclassification Error Matrix (351)
9.4.7 Parameter Tuning (352)
9.5 Compare Different Impurity Indices (358)
9.6 Classification Rules (358)
9.6.1 Separate and Conquer (358)
9.6.2 The One Rule Algorithm (359)
9.6.3 The RIPPER Algorithm (359)
9.7 Case Study 2: QoL in Chronic Disease (Take 2) (359)
9.7.1 Step 3: Training a Model on the Data (359)
9.7.2 Step 4: Evaluating Model Performance (360)
9.7.3 Step 5: Alternative Model 1 (361)
9.7.4 Step 5: Alternative Model 2 (361)
9.8 Practice Problem (364)
9.9 Assignments 9: Decision Tree Divide and Conquer Classification (369)
9.9.1 Explain These Concepts (369)
9.9.2 Decision Tree Partitioning (369)
References (370)
Chapter 10: Forecasting Numeric Data Using Regression Models (371)
10.1 Understanding Regression (371)
10.1.1 Simple Linear Regression (371)
10.2 Ordinary Least Squares Estimation (373)
10.2.1 Model Assumptions (375)
10.2.2 Correlations (375)
10.2.3 Multiple Linear Regression (376)
10.3 Case Study 1: Baseball Players (378)
10.3.1 Step 1: Collecting Data (378)
10.3.2 Step 2: Exploring and Preparing the Data (378)
10.3.3 Exploring Relationships Among Features: The Correlation Matrix (382)
10.3.4 Visualizing Relationships Among Features: The Scatterplot Matrix (382)
10.3.5 Step 3: Training a Model on the Data (384)
10.3.6 Step 4: Evaluating Model Performance (385)
10.4 Step 5: Improving Model Performance (387)
10.4.1 Model Specification: Adding Non-linear Relationships (395)
10.4.2 Transformation: Converting a Numeric Variable to a Binary Indicator (396)
10.4.3 Model Specification: Adding Interaction Effects (397)
10.5 Understanding Regression Trees and Model Trees (399)
10.5.1 Adding Regression to Trees (399)
10.6 Case Study 2: Baseball Players (Take 2) (400)
10.6.1 Step 2: Exploring and Preparing the Data (400)
10.6.2 Step 3: Training a Model on the Data (401)
10.6.3 Visualizing Decision Trees (401)
10.6.4 Step 4: Evaluating Model Performance (403)
10.6.5 Measuring Performance with Mean Absolute Error (404)
10.6.6 Step 5: Improving Model Performance (404)
10.7 Practice Problem: Heart Attack Data (406)
10.8 Assignments: 10. Forecasting Numeric Data Using Regression Models (407)
References (407)
Chapter 11: Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines (408)
11.1 Understanding Neural Networks (408)
11.1.1 From Biological to Artificial Neurons (408)
11.1.2 Activation Functions (409)
11.1.3 Network Topology (411)
11.1.4 The Direction of Information Travel (411)
11.1.5 The Number of Nodes in Each Layer (411)
11.1.6 Training Neural Networks with Backpropagation (412)
11.2 Case Study 1: Google Trends and the Stock Market: Regression (413)
11.2.1 Step 1: Collecting Data (413)
Variables (413)
11.2.2 Step 2: Exploring and Preparing the Data (414)
11.2.3 Step 3: Training a Model on the Data (416)
11.2.4 Step 4: Evaluating Model Performance (417)
11.2.5 Step 5: Improving Model Performance (418)
11.2.6 Step 6: Adding Additional Layers (419)
11.3 Simple NN Demo: Learning to Compute (419)
11.4 Case Study 2: Google Trends and the Stock Market - Classification (421)
11.5 Support Vector Machines (SVM) (423)
11.5.1 Classification with Hyperplanes (424)
Finding the Maximum Margin (424)
Linearly Separable Data (424)
Non-linearly Separable Data (427)
Using Kernels for Non-linear Spaces (428)
11.6 Case Study 3: Optical Character Recognition (OCR) (428)
11.6.1 Step 1: Prepare and Explore the Data (429)
11.6.2 Step 2: Training an SVM Model (430)
11.6.3 Step 3: Evaluating Model Performance (431)
11.6.4 Step 4: Improving Model Performance (433)
11.7 Case Study 4: Iris Flowers (434)
11.7.1 Step 1: Collecting Data (434)
11.7.2 Step 2: Exploring and Preparing the Data (434)
11.7.3 Step 3: Training a Model on the Data (436)
11.7.4 Step 4: Evaluating Model Performance (437)
11.7.5 Step 5: RBF Kernel Function (438)
11.7.6 Parameter Tuning (438)
11.7.7 Improving the Performance of Gaussian Kernels (440)
11.8 Practice (441)
11.8.1 Problem 1: Google Trends and the Stock Market (441)
11.8.2 Problem 2: Quality of Life and Chronic Disease (441)
11.9 Appendix (445)
11.10 Assignments: 11. Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines (446)
11.10.1 Learn and Predict a Power-Function (446)
11.10.2 Pediatric Schizophrenia Study (446)
References (447)
Chapter 12: Apriori Association Rules Learning (448)
12.1 Association Rules (448)
12.2 The Apriori Algorithm for Association Rule Learning (449)
12.3 Measuring Rule Importance by Using Support and Confidence (449)
12.4 Building a Set of Rules with the Apriori Principle (450)
12.5 A Toy Example (451)
12.6 Case Study 1: Head and Neck Cancer Medications (452)
12.6.1 Step 1: Collecting Data (452)
12.6.2 Step 2: Exploring and Preparing the Data (452)
Visualizing Item Support: Item Frequency Plots (454)
Visualizing Transaction Data: Plotting the Sparse Matrix (455)
12.6.3 Step 3: Training a Model on the Data (457)
12.6.4 Step 4: Evaluating Model Performance (458)
12.6.5 Step 5: Improving Model Performance (460)
Sorting the Set of Association Rules (460)
Taking Subsets of Association Rules (461)
Saving Association Rules to a File or Data Frame (463)
12.7 Practice Problems: Groceries (463)
12.8 Summary (466)
12.9 Assignments: 12. Apriori Association Rules Learning (467)
References (467)
Chapter 13: k-Means Clustering (468)
13.1 Clustering as a Machine Learning Task (468)
13.2 Silhouette Plots (471)
13.3 The k-Means Clustering Algorithm (472)
13.3.1 Using Distance to Assign and Update Clusters (472)
13.3.2 Choosing the Appropriate Number of Clusters (473)
13.4 Case Study 1: Divorce and Consequences on Young Adults (473)
13.4.1 Step 1: Collecting Data (473)
Variables (474)
13.4.2 Step 2: Exploring and Preparing the Data (474)
13.4.3 Step 3: Training a Model on the Data (475)
13.4.4 Step 4: Evaluating Model Performance (476)
13.4.5 Step 5: Usage of Cluster Information (479)
13.5 Model Improvement (480)
13.5.1 Tuning the Parameter k (482)
13.6 Case Study 2: Pediatric Trauma (484)
13.6.1 Step 1: Collecting Data (484)
13.6.2 Step 2: Exploring and Preparing the Data (485)
13.6.3 Step 3: Training a Model on the Data (486)
13.6.4 Step 4: Evaluating Model Performance (487)
13.6.5 Practice Problem: Youth Development (490)
13.7 Hierarchical Clustering (492)
13.8 Gaussian Mixture Models (495)
13.9 Summary (497)
13.10 Assignments: 13. k-Means Clustering (497)
References (498)
Chapter 14: Model Performance Assessment (499)
14.1 Measuring the Performance of Classification Methods (499)
14.2 Evaluation Strategies (501)
14.2.1 Binary Outcomes (501)
14.2.2 Confusion Matrices (502)
14.2.3 Other Measures of Performance Beyond Accuracy (504)
14.2.4 The Kappa (κ) Statistic (505)
Summary of the Kappa Score for Calculating Prediction Accuracy (508)
14.2.5 Computation of Observed Accuracy and Expected Accuracy (508)
14.2.6 Sensitivity and Specificity (509)
14.2.7 Precision and Recall (510)
14.2.8 The F-Measure (511)
14.3 Visualizing Performance Tradeoffs (ROC Curve) (512)
14.4 Estimating Future Performance (Internal Statistical Validation) (515)
14.4.1 The Holdout Method (515)
14.4.2 Cross-Validation (516)
14.4.3 Bootstrap Sampling (518)
14.5 Assignment: 14. Evaluation of Model Performance (519)
References (520)
Chapter 15: Improving Model Performance (521)
15.1 Improving Model Performance by Parameter Tuning (521)
15.2 Using caret for Automated Parameter Tuning (521)
15.2.1 Customizing the Tuning Process (525)
15.2.2 Improving Model Performance with Meta-learning (526)
15.2.3 Bagging (527)
15.2.4 Boosting (529)
15.2.5 Random Forests (530)
Training Random Forests (530)
Evaluating Random Forest Performance (531)
15.2.6 Adaptive Boosting (532)
15.3 Assignment: 15. Improving Model Performance (534)
15.3.1 Model Improvement Case Study (535)
References (535)
Chapter 16: Specialized Machine Learning Topics (536)
16.1 Working with Specialized Data and Databases (536)
16.1.1 Data Format Conversion (537)
16.1.2 Querying Data in SQL Databases (538)
16.1.3 Real Random Number Generation (544)
16.1.4 Downloading the Complete Text of Web Pages (545)
16.1.5 Reading and Writing XML with the XML Package (546)
16.1.6 Web-Page Data Scraping (547)
16.1.7 Parsing JSON from Web APIs (548)
16.1.8 Reading and Writing Microsoft Excel Spreadsheets Using XLSX (549)
16.2 Working with Domain-Specific Data (550)
16.2.1 Working with Bioinformatics Data (550)
16.2.2 Visualizing Network Data (551)
16.3 Data Streaming (556)
16.3.1 Definition (556)
16.3.2 The stream Package (557)
16.3.3 Synthetic Example: Random Gaussian Stream (557)
k-Means Clustering (557)
16.3.4 Sources of Data Streams (559)
Static Structure Streams (559)
Concept Drift Streams (559)
Real Data Streams (560)
16.3.5 Printing, Plotting and Saving Streams (560)
16.3.6 Stream Animation (561)
16.3.7 Case-Study: SOCR Knee Pain Data (563)
16.3.8 Data Stream Clustering and Classification (DSC) (565)
16.3.9 Evaluation of Data Stream Clustering (568)
16.4 Optimization and Improving the Computational Performance (569)
16.4.1 Generalizing Tabular Data Structures with dplyr (570)
16.4.2 Making Data Frames Faster with data.table (571)
16.4.3 Creating Disk-Based Data Frames with ff (571)
16.4.4 Using Massive Matrices with bigmemory (572)
16.5 Parallel Computing (572)
16.5.1 Measuring Execution Time (573)
16.5.2 Parallel Processing with Multiple Cores (573)
16.5.3 Parallelization Using foreach and doParallel (575)
16.5.4 GPU Computing (576)
16.6 Deploying Optimized Learning Algorithms (576)
16.6.1 Building Bigger Regression Models with biglm (576)
16.6.2 Growing Bigger and Faster Random Forests with bigrf (576)
16.6.3 Training and Evaluating Models in Parallel with caret (577)
16.7 Practice Problem (577)
16.8 Assignment: 16. Specialized Machine Learning Topics (578)
16.8.1 Working with Website Data (578)
16.8.2 Network Data and Visualization (578)
16.8.3 Data Conversion and Parallel Computing (578)
References (579)
Chapter 17: Variable/Feature Selection (580)
17.1 Feature Selection Methods (580)
17.1.1 Filtering Techniques (580)
17.1.2 Wrapper Methods (581)
17.1.3 Embedded Techniques (581)
17.2 Case Study: ALS (582)
17.2.1 Step 1: Collecting Data (582)
17.2.2 Step 2: Exploring and Preparing the Data (582)
17.2.3 Step 3: Training a Model on the Data (583)
17.2.4 Step 4: Evaluating Model Performance (587)
Comparing with RFE (587)
Comparing with Stepwise Feature Selection (589)
17.3 Practice Problem (592)
17.4 Assignment: 17. Variable/Feature Selection (594)
17.4.1 Wrapper Feature Selection (594)
17.4.2 Use the PPMI Dataset (594)
References (595)
Chapter 18: Regularized Linear Modeling and Controlled Variable Selection (596)
18.1 Questions (597)
18.2 Matrix Notation (597)
18.3 Regularized Linear Modeling (597)
18.3.1 Ridge Regression (599)
18.3.2 Least Absolute Shrinkage and Selection Operator (LASSO) Regression (602)
18.3.3 Predictor Standardization (605)
18.3.4 Estimation Goals (605)
18.4 Linear Regression (605)
18.4.1 Drawbacks of Linear Regression (606)
18.4.2 Assessing Prediction Accuracy (606)
18.4.3 Estimating the Prediction Error (606)
18.4.4 Improving the Prediction Accuracy (607)
18.4.5 Variable Selection (608)
18.5 Regularization Framework (609)
18.5.1 Role of the Penalty Term (609)
18.5.2 Role of the Regularization Parameter (609)
18.5.3 LASSO (610)
18.5.4 General Regularization Framework (610)
18.6 Implementation of Regularization (611)
18.6.1 Example: Neuroimaging-Genetics Study of Parkinson's Disease Dataset (611)
18.6.2 Computational Complexity (613)
18.6.3 LASSO and Ridge Solution Paths (613)
18.6.4 Choice of the Regularization Parameter (621)
18.6.5 Cross Validation Motivation (622)
18.6.6 n-Fold Cross Validation (622)
18.6.7 LASSO 10-Fold Cross Validation (623)
18.6.8 Stepwise OLS (Ordinary Least Squares) (624)
18.6.9 Final Models (625)
18.6.10 Model Performance (627)
18.6.11 Comparing Selected Features (627)
18.6.12 Summary (628)
18.7 Knock-off Filtering: Simulated Example (628)
18.7.1 Notes (630)
18.8 PD Neuroimaging-Genetics Case-Study (631)
18.8.1 Fetching, Cleaning and Preparing the Data (631)
18.8.2 Preparing the Response Vector (632)
18.8.3 False Discovery Rate (FDR) (640)
Graphical Interpretation of the Benjamini-Hochberg (BH) Method (641)
FDR Adjusting the p-Values (642)
18.8.4 Running the Knockoff Filter (643)
18.9 Assignment: 18. Regularized Linear Modeling and Knockoff Filtering (644)
References (645)
Chapter 19: Big Longitudinal Data Analysis (646)
19.1 Time Series Analysis (646)
19.1.1 Step 1: Plot Time Series (649)
19.1.2 Step 2: Find Proper Parameter Values for the ARIMA Model (651)
19.1.3 Check the Differencing Parameter (652)
19.1.4 Identifying the AR and MA Parameters (653)
19.1.5 Step 3: Build an ARIMA Model (655)
19.1.6 Step 4: Forecasting with the ARIMA Model (660)
19.2 Structural Equation Modeling (SEM) - Latent Variables (661)
19.2.1 Foundations of SEM (661)
19.2.2 SEM Components (664)
19.2.3 Case Study - Parkinson's Disease (PD) (665)
Step 1 - Collecting Data (665)
Step 2 - Exploring and Preparing the Data (665)
Step 3 - Fitting a Model on the Data (668)
19.2.4 Outputs of Lavaan SEM (670)
19.3 Longitudinal Data Analysis - Linear Mixed Models (671)
19.3.1 Mean Trend (671)
19.3.2 Modeling the Correlation (675)
19.4 GLMM/GEE Longitudinal Data Analysis (676)
19.4.1 GEE Versus GLMM (678)
19.5 Assignment: 19. Big Longitudinal Data Analysis (680)
19.5.1 Imaging Data (680)
19.5.2 Time Series Analysis (681)
19.5.3 Latent Variables Model (681)
References (681)
Chapter 20: Natural Language Processing/Text Mining (682)
20.1 A Simple NLP/TM Example (683)
20.1.1 Define and Load the Unstructured-Text Documents (684)
20.1.2 Create a New VCorpus Object (686)
20.1.3 To-Lower Case Transformation (687)
20.1.4 Text Pre-processing (687)
Remove Stopwords (687)
Remove Punctuation (688)
Stemming: Removal of Plurals and Action Suffixes (688)
20.1.5 Bags of Words (689)
20.1.6 Document Term Matrix (690)
20.2 Case-Study: Job Ranking (692)
20.2.1 Step 1: Make a VCorpus Object (693)
20.2.2 Step 2: Clean the VCorpus Object (693)
20.2.3 Step 3: Build the Document Term Matrix (693)
20.2.4 Area Under the ROC Curve (697)
20.3 TF-IDF (699)
20.3.1 Term Frequency (TF) (699)
20.3.2 Inverse Document Frequency (IDF) (699)
20.3.3 TF-IDF (700)
20.4 Cosine Similarity (708)
20.5 Sentiment Analysis (709)
20.5.1 Data Preprocessing (709)
20.5.2 NLP/TM Analytics (712)
20.5.3 Prediction Optimization (715)
20.6 Assignment: 20. Natural Language Processing/Text Mining (717)
20.6.1 Mining Twitter Data (717)
20.6.2 Mining Cancer Clinical Notes (718)
References (718)
Chapter 21: Prediction and Internal Statistical Cross Validation (719)
21.1 Forecasting Types and Assessment Approaches (719)
21.2 Overfitting (720)
21.2.1 Example (US Presidential Elections) (720)
21.2.2 Example (Google Flu Trends) (720)
21.2.3 Example (Autism) (722)
21.3 Internal Statistical Cross-Validation is an Iterative Process (723)
21.4 Example (Linear Regression) (724)
21.4.1 Cross-Validation Methods (725)
21.4.2 Exhaustive Cross-Validation (725)
21.4.3 Non-Exhaustive Cross-Validation (726)
21.5 Case-Studies (726)
21.5.1 Example 1: Prediction of Parkinson's Disease Using Adaptive Boosting (AdaBoost) (727)
21.5.2 Example 2: Sleep Dataset (730)
21.5.3 Example 3: Model-Based (Linear Regression) Prediction Using the Attitude Dataset (732)
21.5.4 Example 4: Parkinson's Data (ppmi_data) (733)
21.6 Summary of CV Output (734)
21.7 Alternative Predictor Functions (734)
21.7.1 Logistic Regression (735)
21.7.2 Quadratic Discriminant Analysis (QDA) (736)
21.7.3 Foundation of LDA and QDA for Prediction, Dimensionality Reduction, and Forecasting (737)
LDA (Linear Discriminant Analysis) (738)
QDA (Quadratic Discriminant Analysis) (738)
21.7.4 Neural Networks (739)
21.7.5 SVM (740)
21.7.6 k-Nearest Neighbors Algorithm (k-NN) (741)
21.7.7 k-Means Clustering (k-MC) (742)
21.7.8 Spectral Clustering (749)
Iris Petal Data (749)
Spirals Data (750)
Income Data (751)
21.8 Compare the Results (752)
21.9 Assignment: 21. Prediction and Internal Statistical Cross-Validation (755)
References (756)
Chapter 22: Function Optimization (757)
22.1 Free (Unconstrained) Optimization (757)
22.1.1 Example 1: Minimizing a Univariate Function (Inverse-CDF) (758)
22.1.2 Example 2: Minimizing a Bivariate Function (760)
22.1.3 Example 3: Using Simulated Annealing to Find the Maximum of an Oscillatory Function (761)
22.2 Constrained Optimization (762)
22.2.1 Equality Constraints (762)
22.2.2 Lagrange Multipliers (762)
22.2.3 Inequality Constrained Optimization (763)
Linear Programming (LP) (763)
Mixed Integer Linear Programming (MILP) (768)
22.2.4 Quadratic Programming (QP) (769)
22.3 General Non-linear Optimization (770)
22.3.1 Dual Problem Optimization (771)
Motivation (771)
Example 1: Linear Example (772)
Example 2: Quadratic Example (773)
Example 3: More Complex Non-linear Optimization (774)
Example 4: Another Linear Example (775)
22.4 Manual Versus Automated Lagrange Multiplier Optimization (775)
22.5 Data Denoising (778)
22.6 Assignment: 22. Function Optimization (783)
22.6.1 Unconstrained Optimization (783)
22.6.2 Linear Programming (LP) (783)
22.6.3 Mixed Integer Linear Programming (MILP) (784)
22.6.4 Quadratic Programming (QP) (784)
22.6.5 Complex Non-linear Optimization (784)
22.6.6 Data Denoising (785)
References (785)
Chapter 23: Deep Learning, Neural Networks (786)
23.1 Deep Learning Training (787)
23.1.1 Perceptrons (787)
23.2 Biological Relevance (789)
23.3 Simple Neural Net Examples (791)
23.3.1 Exclusive OR (XOR) Operator (791)
23.3.2 NAND Operator (792)
23.3.3 Complex Networks Designed Using Simple Building Blocks (793)
23.4 Classification (794)
23.4.1 Sonar Data Example (795)
23.4.2 MXNet Notes (802)
23.5 Case-Studies (803)
23.5.1 ALS Regression Example (804)
23.5.2 Spirals 2D Data (806)
23.5.3 IBS Study (810)
23.5.4 Country QoL Ranking Data (813)
23.5.5 Handwritten Digits Classification (816)
Configuring the Neural Network (820)
Training (821)
Forecasting (821)
Examining the Network Structure Using LeNet (825)
23.6 Classifying Real-World Images (827)
23.6.1 Load the Pre-trained Model (827)
23.6.2 Load, Preprocess and Classify New Images - US Weather Pattern (827)
23.6.3 Lake Mapourika, New Zealand (831)
23.6.4 Beach Image (832)
23.6.5 Volcano (833)
23.6.6 Brain Surface (835)
23.6.7 Face Mask (836)
23.7 Assignment: 23. Deep Learning, Neural Networks (837)
23.7.1 Deep Learning Classification (837)
23.7.2 Deep Learning Regression (838)
23.7.3 Image Classification (838)
References (838)
Summary (839)
Glossary (842)
Index (844)