Artificial Intelligence x Generalization

## Tags
- Part of:
- Related:
- Includes:
- Additional:
## Definitions
- [[Artificial Intelligence]] x [[Generalization]]
## Main resources
- <iframe src="https://en.wikipedia.org/wiki/" allow="fullscreen" allowfullscreen="" style="height:100%;width:100%; aspect-ratio: 16 / 5; "></iframe>
## Landscapes written by AI (may include factually incorrect information)
Generalization in Artificial Intelligence
├── Definition
│   ├── Ability of AI models to perform well on unseen data
│   ├── Crucial for real-world applications
│   └── Generalization Gap
│       ├── Difference between training and test performance
│       └── Indicates model's ability to generalize
├── Challenges
│   ├── Dataset Shift
│   │   ├── Covariate Shift
│   │   │   ├── Change in input distribution
│   │   │   └── P(X) changes, P(Y|X) remains the same
│   │   ├── Prior Probability Shift
│   │   │   ├── Change in output distribution
│   │   │   └── P(Y) changes, P(X|Y) remains the same
│   │   └── Concept Shift
│   │       ├── Change in relationship between input and output
│   │       └── P(Y|X) changes
│   ├── Overfitting
│   │   ├── Memorization of training data
│   │   ├── Poor performance on new data
│   │   ├── High variance, low bias
│   │   └── Causes
│   │       ├── Complex models
│   │       ├── Insufficient regularization
│   │       └── Limited training data
│   └── Underfitting
│       ├── Oversimplified model
│       ├── High bias, low variance
│       └── Causes
│           ├── Overly simple models
│           ├── Insufficient training
│           └── Inadequate model capacity
├── Techniques for Improving Generalization
│   ├── Regularization
│   │   ├── L1 (Lasso) Regularization
│   │   │   ├── Adds L1 penalty to loss function
│   │   │   └── Encourages sparse weights
│   │   ├── L2 (Ridge) Regularization
│   │   │   ├── Adds L2 penalty to loss function
│   │   │   └── Encourages small weights
│   │   └── Dropout
│   │       ├── Randomly drops units during training
│   │       └── Prevents co-adaptation of features
│   ├── Cross-Validation
│   │   ├── K-Fold Cross-Validation
│   │   │   ├── Divides data into K folds
│   │   │   └── Trains and validates on different folds
│   │   ├── Leave-One-Out Cross-Validation
│   │   │   ├── Uses single example as validation set
│   │   │   └── Repeated for each example
│   │   └── Stratified K-Fold Cross-Validation
│   │       ├── Preserves class distribution in each fold
│   │       └── Useful for imbalanced datasets
│   ├── Data Augmentation
│   │   ├── Image Transformations
│   │   │   ├── Rotation
│   │   │   ├── Flipping
│   │   │   ├── Cropping
│   │   │   ├── Color jittering
│   │   │   └── Elastic deformations
│   │   ├── Text Augmentation
│   │   │   ├── Synonym replacement
│   │   │   ├── Random insertion, swap, deletion
│   │   │   └── Back-translation
│   │   └── Audio Augmentation
│   │       ├── Noise addition
│   │       ├── Pitch shifting
│   │       ├── Time stretching
│   │       └── Reverberation
│   ├── Transfer Learning
│   │   ├── Pre-trained Models
│   │   │   ├── Leverage knowledge from large datasets
│   │   │   └── Examples: ResNet, BERT, GPT
│   │   ├── Fine-tuning
│   │   │   ├── Adapt pre-trained models to new tasks
│   │   │   └── Requires less data and training time
│   │   └── [[Domain Adaptation]]
│   │       ├── Transfer knowledge across domains
│   │       ├── Unsupervised Domain Adaptation
│   │       └── Adversarial Domain Adaptation
│   ├── Ensemble Methods
│   │   ├── Bagging
│   │   │   ├── Bootstrap Aggregating
│   │   │   └── Trains models on subsets of data
│   │   ├── Boosting
│   │   │   ├── AdaBoost
│   │   │   ├── Gradient Boosting
│   │   │   └── XGBoost
│   │   └── Stacking
│   │       ├── Combines multiple models
│   │       └── Uses meta-model to learn combination
│   └── Attention Mechanisms
│       ├── Self-Attention
│       │   ├── Relates different positions of a sequence
│       │   └── Basis for Transformers
│       ├── Multi-Head Attention
│       │   ├── Parallel self-attention layers
│       │   └── Captures different relationships
│       └── Transformer Architecture
│           ├── Encoder-Decoder structure
│           ├── Attention is all you need
│           └── Examples: BERT, GPT, T5
├── Evaluation Metrics
│   ├── Held-Out Test Set
│   │   ├── Measures performance on unseen data
│   │   └── Prevents information leakage from training
│   ├── [[Cross-Validation]] Scores
│   │   ├── Average performance across folds
│   │   └── More robust estimate of generalization
│   └── Domain-Specific Metrics
│       ├── F1 score, BLEU, ROUGE for NLP
│       ├── mAP, IoU for object detection
│       └── WER, CER for speech recognition
├── Generalization in Different AI Domains
│   ├── Computer Vision
│   │   ├── Object Detection
│   │   │   ├── Localize and classify objects
│   │   │   └── Challenges: scale, occlusion, viewpoint
│   │   ├── Image Segmentation
│   │   │   ├── Pixel-level classification
│   │   │   └── Challenges: complex scenes, object boundaries
│   │   └── Facial Recognition
│   │       ├── Identify individuals from images
│   │       └── Challenges: pose, illumination, expression
│   ├── Natural Language Processing
│   │   ├── Text Classification
│   │   │   ├── Assign categories to text
│   │   │   └── Challenges: ambiguity, sarcasm, context
│   │   ├── Named Entity Recognition
│   │   │   ├── Identify and classify named entities
│   │   │   └── Challenges: entity boundary, nested entities
│   │   └── Machine Translation
│   │       ├── Translate text across languages
│   │       └── Challenges: ambiguity, idioms, cultural differences
│   ├── Speech Recognition
│   │   ├── Acoustic Modeling
│   │   │   ├── Map acoustic features to phonemes
│   │   │   └── Challenges: noise, accents, speaker variability
│   │   ├── Language Modeling
│   │   │   ├── Predict next word in a sequence
│   │   │   └── Challenges: long-range dependencies, rare words
│   │   └── Speaker Adaptation
│   │       ├── Adapt models to specific speakers
│   │       └── Challenges: limited speaker data, accent variations
│   └── Reinforcement Learning
│       ├── Sim-to-Real Transfer
│       │   ├── Transfer policies from simulation to real world
│       │   └── Challenges: domain gap, physical constraints
│       ├── Domain Randomization
│       │   ├── Randomize simulation parameters
│       │   └── Improve robustness to domain variations
│       └── [[Meta-learning]]
│           ├── Learn to learn from multiple tasks
│           └── Adapt quickly to new tasks
└── Future Directions
    ├── [[Causal Inference]] for Generalization
    │   ├── Learn causal relationships
    │   └── Improve robustness to distribution shifts
    ├── Invariant Risk Minimization
    │   ├── Learn invariant predictors across environments
    │   └── Generalize to unseen environments
    ├── [[Out-of-Distribution Generalization]]
    │   ├── Perform well on data outside training distribution
    │   └── Detect and handle distribution shifts
    ├── [[Continual Learning]]
    │   ├── Learn continuously from new data
    │   └── Avoid catastrophic forgetting
    ├── [[Few-Shot Learning]]
    │   ├── Learn from limited examples
    │   └── Leverage prior knowledge and meta-learning
    ├── Unsupervised and Self-Supervised Learning
    │   ├── Learn from unlabeled data
    │   └── Improve generalization through pre-training
    └── Explainable and Interpretable AI [[Explainable artificial intelligence]]
        ├── Understand and explain model decisions
        └── Enhance trust and accountability in AI systems
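
The K-fold cross-validation and generalization gap entries in the landscape can be made concrete with a small sketch in plain Python. The helper names `k_fold_indices` and `generalization_gap` and the example accuracies are assumptions introduced here for illustration; they are not taken from any library, and a real project would more likely use an existing implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, val) index lists for k contiguous folds of range(n_samples).

    No shuffling is done here; in practice the data is usually shuffled first.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra example each.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def generalization_gap(train_score, test_score):
    """Training score minus held-out score, for a higher-is-better metric."""
    return train_score - test_score


# Every example lands in the validation set of exactly one fold.
for train_idx, val_idx in k_fold_indices(10, 3):
    print(len(train_idx), val_idx)

# Hypothetical accuracies, chosen only to illustrate the calculation.
gap = generalization_gap(train_score=0.98, test_score=0.85)
print(f"gap = {gap:.2f}")
```

Setting `k` equal to `n_samples` gives leave-one-out cross-validation, where each validation set holds a single example. A large positive gap is the usual symptom of overfitting, while a small gap with poor scores on both sides points toward underfitting.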