Generalized Linear Models

This notebook introduces the algorithms within Dask-GLM for Generalized Linear Models.

Start Dask Client for Dashboard

Starting the Dask Client is optional. It provides a dashboard that is useful for gaining insight into the computation.

The link to the dashboard will become visible when you create the client below. We recommend having it open on one side of your screen while using your notebook on the other side. Arranging your windows can take some effort, but seeing both at the same time is very useful when learning.

[1]:

from dask.distributed import Client, progress

# One in-process worker with 4 threads and a 2 GB memory limit
client = Client(processes=False, threads_per_worker=4,
                n_workers=1, memory_limit='2GB')
client

[1]:

Client-5b571f45-0de1-11ed-a361-000d3a8f7959

Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://10.1.1.64:8787/status

LocalCluster 94f79b63
Workers: 1
Total threads: 4
Total memory: 1.86 GiB
Status: running
Using processes: False

Scheduler-079365e3-ef5b-4539-84b1-973599540812
Comm: inproc://10.1.1.64/9057/1
Started: Just now

Worker: 0
Comm: inproc://10.1.1.64/9057/4
Dashboard: http://10.1.1.64:39345/status
Nanny: None
Local directory: /home/runner/work/dask-examples/dask-examples/machine-learning/dask-worker-space/worker-ptl8ho2_
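
The `memory_limit='2GB'` argument and the `1.86 GiB` in the client output describe the same amount of memory in different units. A quick check, assuming the `GB` suffix is read as 10^9 bytes and the display uses GiB (2^30 bytes):

```python
# Decimal gigabytes requested vs. binary gibibytes reported.
requested_bytes = 2 * 10**9              # memory_limit='2GB', assuming 1 GB = 10**9 bytes
reported_gib = requested_bytes / 2**30   # 1 GiB = 2**30 bytes

print(f"{reported_gib:.2f} GiB")
```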

Make a random dataset

[2]:

from dask_glm.datasets import make_regression
X, y = make_regression(n_samples=200000, n_features=100, n_informative=5, chunksize=10000)
X

[2]:

         Array            Chunk
Bytes    152.59 MiB       7.63 MiB
Shape    (200000, 100)    (10000, 100)
Count    20 Tasks         20 Chunks
Type     float64          numpy.ndarray
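
The figures in this summary follow directly from the arguments passed to `make_regression`. The short calculation below reproduces them, assuming 8 bytes per `float64` element and that each chunk spans all 100 columns, as the chunk shape above indicates:

```python
# Reproduce the chunk count and sizes reported for X.
n_samples, n_features, chunksize = 200_000, 100, 10_000
itemsize = 8  # bytes per float64 element

n_chunks = n_samples // chunksize                        # row blocks of 10,000 samples
chunk_mib = chunksize * n_features * itemsize / 2**20    # size of one chunk in MiB
array_mib = n_samples * n_features * itemsize / 2**20    # size of the whole array in MiB

print(n_chunks, f"{chunk_mib:.2f} MiB", f"{array_mib:.2f} MiB")
# prints: 20 7.63 MiB 152.59 MiB
```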

[3]:

import dask
X, y = dask.persist(X, y)

Solve with a GLM algorithm

If the dashboard is available, we also recommend watching its “Graph” page while the computation runs.

[4]:

import dask_glm.algorithms

b = dask_glm.algorithms.admm(X, y, max_iter=5)
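
For intuition about why ADMM suits chunked data, here is a minimal NumPy sketch of consensus ADMM on an L1-penalised least-squares problem. It is not Dask-GLM's implementation: the function names, the least-squares loss, the closed-form local solve, and the values of `lam` and `rho` are all assumptions chosen to keep the example short. Each row block computes its own estimate, and the blocks are then averaged and shrunk into a shared coefficient vector.

```python
import numpy as np

def soft_threshold(v, k):
    """Proximal operator of k * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def consensus_admm_lasso(blocks, lam, rho, max_iter=100):
    """Minimise sum_i 0.5*||A_i x - b_i||^2 + lam*||x||_1 over row blocks (A_i, b_i)."""
    p = blocks[0][0].shape[1]
    n_blocks = len(blocks)
    z = np.zeros(p)                  # shared (consensus) coefficients
    xs = np.zeros((n_blocks, p))     # one local estimate per block
    us = np.zeros((n_blocks, p))     # scaled dual variables
    # The local linear systems do not change between iterations, so build them once.
    lhs = [A.T @ A + rho * np.eye(p) for A, _ in blocks]
    rhs = [A.T @ b for A, b in blocks]
    for _ in range(max_iter):
        for i in range(n_blocks):
            # Local step: ridge-like solve that pulls x_i toward z - u_i.
            xs[i] = np.linalg.solve(lhs[i], rhs[i] + rho * (z - us[i]))
        # Global step: average the blocks, then shrink to apply the L1 penalty.
        z = soft_threshold((xs + us).mean(axis=0), lam / (rho * n_blocks))
        # Dual step: accumulate each block's disagreement with the consensus.
        us += xs - z
    return z

# Hypothetical toy data: 10 features, only the first two are informative.
rng = np.random.default_rng(0)
true_beta = np.zeros(10)
true_beta[:2] = [3.0, -2.0]
A = rng.normal(size=(400, 10))
b = A @ true_beta + 0.1 * rng.normal(size=400)
blocks = list(zip(np.split(A, 4), np.split(b, 4)))   # four row chunks

beta_hat = consensus_admm_lasso(blocks, lam=10.0, rho=100.0)
print(np.round(beta_hat, 2))
```

In this sketch the local steps are independent of one another, which is the property that lets a scheduler run one per chunk in parallel; only the averaging step needs data from every block.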

Solve with a different GLM algorithm

[5]:

b = dask_glm.algorithms.proximal_grad(X, y, max_iter=5)

/usr/share/miniconda3/envs/dask-examples/lib/python3.9/site-packages/dask/core.py:119: RuntimeWarning: overflow encountered in exp
  return func(*(_execute_task(a, cache) for a in args))
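
As a rough illustration of the proximal gradient idea, the NumPy sketch below fits an L1-penalised logistic regression by alternating a gradient step on the smooth loss with a soft-threshold step for the penalty. It is a standalone example under stated assumptions, not Dask-GLM's code: the function names, the fixed step size, the value of `lam`, and the toy data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    # The tanh form avoids overflow in exp for large |z|.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def objective(beta, X, y, lam):
    """Mean logistic loss (labels in {0, 1}) plus lam * ||beta||_1."""
    z = X @ beta
    return np.mean(np.logaddexp(0.0, z) - y * z) + lam * np.abs(beta).sum()

def proximal_gradient(X, y, lam=0.01, max_iter=200):
    n, p = X.shape
    # Step 1/L, where L = ||X||_2^2 / (4n) bounds the curvature of the mean logistic loss.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(max_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n     # gradient of the smooth part
        v = beta - step * grad                       # plain gradient step
        # Proximal step for the L1 penalty: soft thresholding.
        beta = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
    return beta

# Hypothetical toy data: 20 features, three of them informative.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(500, 20))
true_beta = np.zeros(20)
true_beta[:3] = [2.0, -1.5, 1.0]
y_toy = (rng.random(500) < sigmoid(X_toy @ true_beta)).astype(float)

beta_hat = proximal_gradient(X_toy, y_toy)
start = objective(np.zeros(20), X_toy, y_toy, 0.01)
end = objective(beta_hat, X_toy, y_toy, 0.01)
print(end < start)
```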

Customizable with different families and regularizers

Dask-GLM is modular: the algorithms accept different GLM families and regularizers, and both have a relatively straightforward interface for implementing custom ones.

[6]:

import dask_glm.families
import dask_glm.regularizers

family = dask_glm.families.Poisson()
regularizer = dask_glm.regularizers.ElasticNet()

b = dask_glm.algorithms.proximal_grad(
    X, y,
    max_iter=5,
    family=family,
    regularizer=regularizer,
)

/usr/share/miniconda3/envs/dask-examples/lib/python3.9/site-packages/dask/core.py:119: RuntimeWarning: overflow encountered in exp
  return func(*(_execute_task(a, cache) for a in args))

[7]:

dask_glm.families.Poisson??
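
To give a sense of what a family has to supply, the sketch below writes out a Poisson loss and its gradient in plain NumPy. This is not the source that the `??` lookup displays and does not follow Dask-GLM's class interface; the function names and the log link are assumptions made for illustration. A finite-difference comparison serves as an independent check on the gradient.

```python
import numpy as np

def poisson_loss(beta, X, y):
    """Negative Poisson log-likelihood with a log link, dropping the log(y!) constant."""
    eta = X @ beta
    return np.sum(np.exp(eta) - y * eta)

def poisson_gradient(beta, X, y):
    """Gradient of poisson_loss with respect to beta."""
    return X.T @ (np.exp(X @ beta) - y)

# Hypothetical toy data drawn from the model itself.
rng = np.random.default_rng(0)
X_toy = rng.normal(scale=0.3, size=(200, 4))
beta = np.array([0.5, -0.25, 0.0, 0.1])
y_toy = rng.poisson(np.exp(X_toy @ beta))

# Central finite differences, one coordinate direction at a time.
eps = 1e-6
numeric = np.array([
    (poisson_loss(beta + eps * e, X_toy, y_toy)
     - poisson_loss(beta - eps * e, X_toy, y_toy)) / (2 * eps)
    for e in np.eye(4)
])
analytic = poisson_gradient(beta, X_toy, y_toy)
print(np.allclose(numeric, analytic, rtol=1e-4, atol=1e-4))
```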

[8]:

dask_glm.regularizers.ElasticNet??
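
A regularizer, in turn, mainly needs a penalty value and a proximal operator. The following NumPy sketch shows one common way to write both for an elastic net. It is an illustration under assumptions, not Dask-GLM's `ElasticNet` class: the function names, the `weight` parameter, and the exact scaling of the L2 term are hypothetical choices.

```python
import numpy as np

def elastic_net_penalty(beta, weight=0.5):
    """weight * ||beta||_1 + (1 - weight) * 0.5 * ||beta||_2^2."""
    return weight * np.abs(beta).sum() + (1 - weight) * 0.5 * np.sum(beta ** 2)

def elastic_net_prox(v, t, weight=0.5):
    """argmin over b of 0.5*||b - v||^2 + t * elastic_net_penalty(b, weight)."""
    # L1 part: soft threshold at t * weight.
    shrunk = np.sign(v) * np.maximum(np.abs(v) - t * weight, 0.0)
    # L2 part: uniform rescale toward zero.
    return shrunk / (1.0 + t * (1 - weight))

v = np.array([3.0, -0.2, 0.5])
result = elastic_net_prox(v, t=1.0)
print(result)
```

With `t=1.0` and `weight=0.5`, entries whose magnitude is at most 0.5 are set to zero, and the remaining entry is reduced by 0.5 and then divided by 1.5.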
