{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Train Models on Large Datasets\n", "==============================\n", "\n", "Most estimators in scikit-learn are designed to work with NumPy arrays or scipy sparse matricies.\n", "These data structures must fit in the RAM on a single machine.\n", "\n", "Estimators implemented in Dask-ML work well with Dask Arrays and DataFrames. This can be much larger than a single machine's RAM. They can be distributed in memory on a cluster of machines." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-01-14T10:51:00.125650Z", "iopub.status.busy": "2021-01-14T10:51:00.125106Z", "iopub.status.idle": "2021-01-14T10:51:00.751309Z", "shell.execute_reply": "2021-01-14T10:51:00.751973Z" } }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2021-01-14T10:51:00.757285Z", "iopub.status.busy": "2021-01-14T10:51:00.756878Z", "iopub.status.idle": "2021-01-14T10:51:02.653206Z", "shell.execute_reply": "2021-01-14T10:51:02.654409Z" } }, "outputs": [ { "data": { "text/html": [ "
\n",
"Client\n", "
| \n",
"\n",
"Cluster\n", "
| \n",
"
\n",
"
| \n",
"\n", "\n", " | \n", "