{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DataFrames: Read and Write Data\n", " \n", "Dask Dataframes can read and store data in many of the same formats as Pandas dataframes. In this example we read and write data with the popular CSV and Parquet formats, and discuss best practices when using these formats." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:22.990151Z", "iopub.status.busy": "2022-07-27T19:18:22.989766Z", "iopub.status.idle": "2022-07-27T19:18:23.049428Z", "shell.execute_reply": "2022-07-27T19:18:23.048686Z" } }, "outputs": [ { "data": { "image/jpeg": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEABALDBoYFhwaGBoeHRsfIC0gISIhIzItLSYqLjAxMS0nLS81PVBCNzhLOS0tRWFFS1NWW1xbMkFlbWRYbFBZW1cBERISGRYZLxsbL1c9OT9XV1dXV1dfV1dXV1dXV1dXV1dXV1dXV1dXV1dXXVdXV1dXV1dXV1dXV1dXV1dXV1dXXf/AABEIAWgB4AMBIgACEQEDEQH/xAAbAAEAAgMBAQAAAAAAAAAAAAAAAwQBAgUHBv/EAE0QAAEDAQQEBw4FAQYFBAMAAAEAAhEDBBIhMQVBUZETUmFxkrHRBhQVFiIyMzRTcnOBobIjQsHh8NIkNWJjosIHQ4KT8VR0s+IXRKP/xAAaAQEAAwEBAQAAAAAAAAAAAAAAAQMEAgUG/8QANBEBAAEDAQUGBAQHAQAAAAAAAAECAxEEEhMhMVEVMkFSYaEUM4GRBSJx0SM0QmKxwfBy/9oADAMBAAIRAxEAPwDz9ERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBF0vAlTj0957E8C1OPT3nsQc1F0vAlTj0957E8CVOPT3nsQc1F0vAlTj0957E8CVOPT3nsQc1F0vAlTj0957E8C1OPT3nsQc1F0vAtTj0957E8C1OPT3nsQc1F0vAtTj0957E8C1OPT3nsQc1F0vAtTj0957E8C1OPT3nsQc1F0vAtTj0957E8C1OPT3nsQc1F0vAtTj0957E8C1OPT3nsQc1F0vAtTj0957E8C1OPT3nsQc1F0vAtTj0957E8C1OPT3nsQc1F0vAtTj0957E8C1OPT3nsQc1F0vAtTj0957EGhKpmHMMZxe7EHNRdN2g6ozLR8nf0rQ6IfE36e89iDnouiNDVCCQ5kASfOw+ieB6md5mcfmz3IOci6Hgl/Hp7z2LHgl/Hp7z2IKCK/4Jfx6e89ieCX8envPYgoIr/gl/Hp7z2J4Jfx6e89iCgiv+CX8envPYngl/Hp7z2IKCK/4Jfx6e89ieCX8envPYgoIr/gl/Hp7z2J4Jfx6e89iCgiv+CX8envPYngl/Hp7z2IKCK/4Jfx6e89ieCX8envPYgoIr/gl/Hp7z2J4Jfx6e89iCgiv+CX8envPYngl/Hp7z2IKCK/4Jfx6e89ieCX8e
nvPYgoIr/gl/Hp7z2J4Jfx6e89iCgiv+CX8envPYtm6GqkEgtIGcXuxBzkXSdoSsM4Hyd2LTwS/js3nsQUEXRboeoZIcwgZ54fRY8EVIm8yMtfYg+37ltGiq+pVqMDmMF1ocJBceQ5wOtfQ1NH0wG3bJQdgZ8hoxGWeU/NZ0ZUizUAPZN+0K0apQU22BkOmx0Afyi6wzj2LDbE3XYaGrUzkk80yrPDPv/4ebkGudsrcufJgCLuE7Y59qDVmjbMQJs1EGMRcbgdmSydGWaMLPRn4bexauq1sIpMBnGXgxlGvnWzalUkzTaBGHlg4zz86D5zun0W1jKdZtNtMeZUawYCcj+m5fO1GtjyTK9Jl2ODY5+fl5lgE8RuXGCDzNF6k1rYEgA86zdZyb0Q8sRep3WciXWcm9B5Yi9Tus5N6XWcm9B5Yi9Tus5N6XWciDyxF6ndZyb0us5N6DyxF6ndZyb0us5EHliL1O6zkS6zk3oPLEXqd1nJvS6zk3oPLFvTqOb5pI5l6hdZyb1E+Q9sBpZrxxRLzYWioMnOHMVGCQZEg7QvSS5194LWhgAuuBxJ14KChWqF5D2gNx1f4jEbZbBSeBEZiZeeJjEYxmvQ7PXqGpVD2hrAQKZH5hGJPVCj0nUmzWifZvjo/+UHzXclottZ9SrVYHMYLrQ4AguOuDnA619Q/RtENbdsdFx1zTY1RaNqvFmswaARwTb0mI8kQrxqoOeLC05WCz5ZFrRrHJslb+D2XmjvKzxPlG6zKTkOaCrIrPvx+Xm5Ns7Z1Le8+TAEXcJ2xz7UGPBVl/wDTUf8Att7Fg6KssYWajPw29iw6rWwik0GTMvBgYRr59y2ZUqk402gRxwcZ5+dB8x3VaIYxtKu2m2mPMqtpiAJydgNuG5fN16bAJYZOxem3n8Vu/njXzKSljN5rRsxQeTxyJHIvXLrORLrORQPI45EjkXrl1nIl1nIg8jjkSOReuXWciXWciDyOORI5F65dZyJdZyIPI45EjkXrl1nIl1nIg8jjkSOReuXWciXWciDyOORI5F65dZyJdZyIPI45EjkXrl1nIl1nIg8jg7FJTr1Gea4jmXrF1nIoniHtgNLNeOKkeXi2VRk9w5lE2o4GQSDtGa9RLnX3gsaGAC64HEnXgoKFaoXuD2gNxjDlw58MUmMJjjGXmoqO5cUvuiMY2L0qz16hqVRUaGsBApkfmEYk9Si0q+bLaJ9m+OiiFnRNJpstCQPRM+0K4KDOKFW0R6pQ+Ez7QroQVXUmycFFVdTZ5wA/n7qw/MqparM55kHi4HDIyTOaifR1TETPFBUt1Ngl7HY3nC60mGtgknmkJU0jZ2lwzLXimYGbiCcNo8l2PIQpq2jqdQND72AI8lxbIdEjDVgMORaN0RQbeutLb3nFpILsS6TtxJ3pHLiicZ4NqNpo1HljRLhi4R5uWeyZw5ipbRTAY4gYwtLLo6lRM023Tdu5nEYQDtiMNmO1TV2kscBnCVcinnDjVrUGEXnETO3VmnfbJi+J59mamqWIu86mSoxosey5Vl4+rXw9G9GqHQWukE5groWh9OmJfgOZUqNkc2A1hAmfriulUptdF4AwZE7dqttZ45U3ccMKzrVZwJLmhZNooAwXsnKJ+XWQt+8qXs2xERGEbI+Sd40oI4MQc+WSCfqFcpScE3YFFUqUmuuuEYAyRhjyqwo61mY/z2h3P8+0oIW2igTAczVGOczEbirHBN2KHvCj7NvbjOO3FWUFV1WkHlhgECTOAjnWvfNn47MpmcDnr+Slq2Sm8y5smQdxB/2jctTYKPs2oNqRpvBLC1wBgwsVzTptvOECQN5gKVlNrZugCcT/AD5rLmg5iYIPzGSCobZZ7t680jDLPHAYfzXsUlGpSqTcLSRmNY5wt3WWmTJYDOe8md5O8rNGzMYSWNDSc4QKjWNaXOAgCSoRaKOstaciDgRiR1g7lZc0OBBEgiCDrCgdYaREFgOM65nbPzQKVWk911paSBPyV
unRaRkFBRszGea2MInXq1/IK1Sy+aDXgW8ULnaaDW0KoAzpP1f4SuoVydP+iqfBf9pQb6OaBZLOfIE0WYu90cqvPpNDCQwOIbMDWYyVLRXqtnnVSZ9oUtV1QvDWPuyJxEoTPBvScC+66iW8urVrWLQ4MdAo3hdnDbsVUV6hDiLQ2GmD5JwKyarxE2lgna0/XYpjE8pRMzHCYXqDGubJp3TjgVWNcDOzvynATydaiFZ8T3zTjLEbcVk1XzBtNOdkJhG16LtWmAyW05OHk86joOa8gGiW4YkjAHYoQKxugV2eUJb5OY2qKrXrMcWl8kawEwmJyt2ghhjgS4RMtE/JaCoJaOAdjGMYDnVTvypxz9E78qcc/RQl0bQwMALaYdjiBnCjs72vMGi5uBOIVPvypxzuCd+VOOdwQXK5ax4HAlwjNo1zEKIWloEuoOGMZKDvypxzuCd+VOOdwQdAMaad8U8bs3SMeZQCuLwabO4SYmN3Uq3flTjncE78qcc7gg6ve7OKE73ZxQuV35U453BO/KnHO4IOr3uzihO92cULld+VOOdwTvypxzuCDq97s4oTvdnFC5XflTjncE78qcc7gg6ve7OKE73ZxQuV35U453BO/KnHO4IOr3uzihO92cULld+VOOdwTvypxzuCDq97s4oTgGcULld+VOOdwTvypxzuCDq8AzihU9M0Wix2gho9C/7Sq3flTjncFW0laqhs1YF2BpOBy4pQdbRbYs1EbKbR9ArgVXR/oKXw29SshBVtTnAOLBLtQ+aqMtFe+L1LyTGWozjjzH6coXTNOU4LlQRItm3Dk8GDBgjPYtjTG1BGsKUUxtWjrgEl4A5SNWaDCLJLAJL2gZZjn6lhz6YzqNHOQgIpBSG1Z4LlQRIpeC5U4LlQRIpeC5VoLt4tvC8MxOI5wg1RblrYJvCBmZwEZo1rSJDpByIQaIpCwDMpwY2oI0UvBcqcFyoIkUvBcqcFyoIkUvBcqcFyoIlLSy+acFyrZrYQYK5Wm2yxw20nj6FdUrm6XyPw3dRQU2Y2GzNkC8ynnHFG1dRvp2+6VW0Sf7JQ+Ez7QrDT+O33Sg3pPoOc6mLpcMS0jYeXlW9WjSJAexhJyBAWtKxsY8vawBxmTeOsycOdTOZJBLRIyM/sggq0aDAL1OmAXcURO3L6rLqFENvmmyInzBMZ7FJVoh4Ae0EAzE/yVtc8m6QCIjEk4INaBY4AtaBdwAuwRycmBVN9FlWq6HOB5WGMMDBOauUqd2QMZxxPy2IylBkDHH8x14lBTdo9ocGmpBMwNZjOFrXsTKbS59UNaMycAr7mS4OLWlzZg6xOcYLFeiKjbtRjXNOo4jqQVDo5sTfwiZha0rC183XnAwfJiD81fIwggREZ6tyMZdEAb3E/UoOfUsTGmHVCMJ83Ac51LFKyU3khtYOI1D+cqv1KIfi5rTHL+y1pWZjDebTY0xEjZsyQVjowD8/0QaNByfPyV6TsG/8AZGggQAABsP7IKXgscc7k8Fjjncr0nYN/7JJ2Df8Asgo+CxxzuTwWOOdyvSdg3/sknYN/7IKPgscc7k8FjjncrsnYN/7LMnYN/wCyCj4LHHO5PBY453K9J2Df+ySdg3oKPgscc7lh2jQBJfgORX5Owb1hwJBBAg4ZoOU2jRMfjDyspwyW/elOJ4UESBhjicslbNiYTJptyj5RELLbIwCAxuYOZxIxxQUbZZ6VBl+rVutmPNnHZAVXStBosb6jH3mupugxqLTiuxbLIyuy5VYHNmfOIx5wudpxrW2N9NoADGEAA5C6YQX9H+gpfDb1K0FV0f6Cl8NvUrQQZWFlEHzTtIaNvGawzMjysyRjlnh9SsjSmjQ27wwiQc3ZgQNWxfAWj0j/AHj1qNe9H4XamOc+zLN6XoXhLRntRnObuXk5Vu7TGjiwUzWF0Yx5WyNi86ROy7XWfY38vRXaX0aRdNYRM/mziJy2KN+kdFuMmqJDQ3N2QiNXIvPkTsu11n2N/L05v
dNYQABXbAEZO7FnxnsXt27j2LzBE7LtdZ9jfy9P8Z7F7du49ieNFi9u3cexeYInZdrrPsb+Xrlh0hStDS6i8PaDdJE556+dc3SmkrCKrmV6kVG4OHlaxhkNjlS7gfVavxf9rV8z3V/3jaOdv2NWK1pKK79VqZnELJuTFMVPq2aX0YJir5wLT5+IdnqR2l9GEzwgHMHD9F56i3dl2us+37K9/L0AaU0WP+b9/LycpW9DTWjqdS+ysR5N2IdHPkvPETsu11n2/Y38vTfGqxe2HRd2J41WL2w6LuxeZInZdrrPt+xv5em+NVi9sOi7sTxqsXth0Xdi8yROy7XWfb9jfy9N8arF7YdF3YrVg0zZ7S4to1A9zRJEEYfMLyhfV/8AD/1ir8P/AHBUaj8Pt2rc1xM8HVF2apw+7CysDJZXjtDQrm6XyPw3dRXSK5ul8j8N3UUENirFlioEC8eDYIn/AAq7T9M2cDcK51H1Cz8lOn9oXSb6dvulBdRFo54GaDdYWC6Pp9Vhrwf5/NiCGtYmPcXEukxkdiiOi2cep0uftKkqWoNddxJwwA2kD9Vsa4F6SfJEnD+Sg0ZYGtcHBz8OXPnWvg1vHfMRmOxbU7axwJBOEatq1db6YnyjOOEbMFOJlEzENho9g1uyIz2gg9f0CMsDQ68HP5pwyhbVLS1pguxw1bcljvoQXSS0a45J7N6hMcWng1mHlP5cQJ58FoNFNxl9Qzhnq/Vb+EKcTeMRjhzYfULNS2taYM/KNsfoidmWX6PYXF0uBJkwc1l1ibdDbzwBsPLK1o21r5i9gJy7FJw4wxOOOIjlQmJhhtkEzedM6jhmTl80FibLDLvJAAxzAMiVg2xkxe+nLHWt+FwJJiBJ5Bt+iGGgsTR+Z+/lmedZFjbES7nnHIiZ5itBbmGYcTAk4HJZ78Zcc+8S1uBga+REB0ewggueZBGLtR1fRbmyiZvOGOoxrlRi3s4xzjLXMfotqdra7zST8ufsQT02XRC3VR9sa2ZvCDBwyMT+qwLew5OJ+XKB1lE7MriKk3SFMkAPxJjLWpRWxAMgnbGwn9ChMTHNYRRydqSdqIbrkd0I/s1X3XfaV1Wlcrui9Wq+677Sgu6P9BS+G3qVoKro/wBBS+G3qVoIMoiIPHLR6R/vHrUaktHpH+8etRr7GnlDz55iIilAiIgIiICIiD73uB9Vq/F/2tXzPdX/AHjaOdv2NX03cD6rV+L/ALWr5nur/vG0c7fsavJ0/wDOV/8AdF9fy4chEResoEREBERAREQF9X/w/wDWKvw/9wXyi+q/4f8ArFX4f+4LLrfkVLLXeh94MllYGSyvl21oVzNL5H4buorplc3S+R+G7qKCLR9AVLJZpnCmw4RxRtV1vp2+6VU0Y9rbHQLiAOCZifdCtMH47fdKC8sLK0dUAOKDaEha8IP5zwjXgoK9QVC7yS0Nw1ScxP0lTKk7STml02eoYc4C60mQJxxAzgbwsm3vugig/wA8tIIMwMnDDXywguJCoeE3keTZqskwAWxszOrM7lI7SBBjveqTyNwzIz+X1QWmtAyAHMkKq+3uuhzbPUPllpBEHDXhOCDSBw/Aqyf8OG9BbhYACqt0iTH9nq6gfJykjbzq8zEAxmEGiKSAkBBGsEDZngVLASAgjWC0TMCdqlgJAQRpCkgJAQRopICQEEUBIGetSwEgII0UkBICDRma5fdF6tV9132ldeFyO6L1ar7rvtKC7o/0FL4bepWgquj/AEFL4bepWggyiLCDyGvQfwj/ACHecfynatO938R/RK9VoUBg1zHCBmXHm3qfvVmw9I9q9ePxWYju+6jcR1eR97v4j+iU73fxH9Er1zvVmw9I9qd6s2HpHtU9qz5fdG4jq8j73fxH9Ep3u/iP6JXrnerNh6R7U71ZsPSPanas+X3NxHV5H3vU4j+iU73qcR/RK9c71ZsPSPao61mbEgOJByDj9cU7Vny+5uI6vJ+96nEf0Sne7+I/olepNdTjy
2uYccy7UJJU1OnSd5pnmce1O1Z8vubj1cDuDYW2WqHAj8XWI/K1c3T3c/Xr2ytVp3S1zhE3tTWg5N5F9qLMwYweke1bUwRMxnIjYsUauum7N2nnKzdxNOzLzc9ytqETcxMfm/pW3inatjP9X9K9HqAmIjPGdi2VvaV/0RuaXmje5W1HIMzI/NqwP5UPcragQPIxy87n4q9JpAgYxMnLnWHtMtiIBxnm1J2lf9Psbml5z4p2rYz/AFf0rDe5W1ESAz/V/SvSlpSaQPKiZOSdpX/T7G5pebnuVtUgeRJ97+lbeKdq2M/1f0r0V7XXgQRGuf0UjssM07Sv+n2NzS80b3K2oiQGQfe/pXd7ldFVbJXe6tEOZAuhxyIn8q+rpAgeVGeEbFhzXXwQRd14Y65j6blxc1125TNFWMSmLVMTmG7TgFssLKxLGhXN0vkfhu6iukVzdL5H4buooKlP1CzmCYp08vdXTb6dvulc2iYsNnP+Cn9q6TfTt90oLqwsogxCQsogwirWi0VWvAZRvtP5g6NupQd82nD+zgbRfGCDoIqQtNoP/wCuBz1B9cFJZalUucKjA1o80jXieU6oQS2h7msJY287CBMa8VV77rT6semFfWEFDv2tBJszsP8AEMcNQhXmEloJEEjEbORZWUFenVqGqWmnDAMHTnlGG/coqlqqhxDbOXCTBvACBkfmrqIIbNVc9svYWGYgkHDbgpVlEEdB7nNBc26ccCZ1rZ5gEgTAyWywg1ovLmtcWlpImDmOQrdEQEREBERAREQFx+6L1ar7rvtK7C4/dF6tV9132lBd0f6Cl8NvUrQVXR/oKXw29StBBlYWVhBCLXTx8sYEgzhlzo610wQC9okSMdRyKibZaVQNeabZcJ349ZW1SyUw3FjYa2ANgGoIJDaafHZ0gpQZxC+WbpekYd3s2Tj523PUrLe6KAAKUAYDyv2UZhlnWWY4ZfQovn/GQ+y/1fspKGn75I4OIaXedsE7EzBGsszOIl3EXz/jJ/lf6v2Txk/yv9X7JmEfG2fM75C0ZRa0ktaATnC41HuhvODeDzMed+y5dr7ueCqOZ3tN0xPCR/tXVMbXJfau03u5OX2KL4n/APIGX9l//p/9V9D3P6X79oGrwfBw8si9OQBnIbVM0THNdNMxzdVFxbXp3gqhZwcxrvfsoHd0hjCnB51xFUSx1auzTOJl3RXaTE4zHzH/AIWlW0XTET8+btXz7+6OoGk3W4AnLYJ2pZe6fhabanAxekEX9hjYuvDLqnU25omvPCH0NWuG5zr+iya3kgxnqXCd3Qg50Qec/stK3dC52VMAcuPYucw4nW2er6FlSRiIjNGVJ1EYSOUL56j3QObM02kZ4YY7Va0dpjhqopNpBhdSNS9MxBiIjlUrLWoou9yXWFYXrsFQWm3cGT5Jc1sF5BHkg5Ya1vwVWZvsn3P/ALLWvo+lUcHvYHOHPjGUjX80mOS6OPotArKwFlEtCuZpfI/Dd1FdMrmaXyPw3dRQaaOpX7HZxJH4TMvdCtt9O33Sq2i3NbZKBcQBwTMSY/KFZYPx2+6UF5ERARFhBi6NgS6NgWpWJQb3RsCXRsC0lJQVjVraqA6QQ1a0YUGz7wVlEFcVas40Rq1j+FSOc640imHOPnDKMO1SEpKCKzOqE/iU2tEajOOxaPq1Q6BQBEwDeHLjyKeedZQaNLyxxLAHflEzq7VA6rWwigJ1kuCtA/JCUEZL+DJDG35MN1Z7eZRGrWkRQAGslw+isgoSgzRm6LwAdGIGoqRaMOC3QEREBERAREQFx+6L1ar7rvtK7C4/dF6tV9132lBd0f6Cl8NvUrQVXR/oKXw29StBBlaudAnqErZYQQ2M/g0/cbq5FvWPkO909S1snoqfuN6ltW8x3unqSB8FT81vMOpbrSl5reYdS3XE83ytfekViw+c74buoqurFh853w3dRUJt9+FdERHCax+lZzr5fS3rFT3l9RY/Ss518vpb1ip7yv0/el7n4
TyqVDkPmvQ/+H/qTvjO6mrz05D5r0L/AIf+pO+M7qar7vdevc5K2l/WHfzWVSV3S/p3fzWVSXn08nyV/wCZV+rSt6N/uO6iq+ifVaX/AFfcVYrejf7juoqvor1Wl/1fcVdHy5/Vpo/k6v8A1C2iIq2AV/ud9cp/+2f94VBX+531yn/7Z/3hdU8/+6PU/De9L66UlEXT2AZLKwMllBoVzdL5H4buorpFc3S+R+G7qKCpS9Rsw1mmyPk0Lpt9O33SubQeRYrOWuung2HOJF0Suk307fdKC6iIgIiIMItCsIJFgEzlHKtEQSIo0QSFFGUQb68llRfJZQbjmhZKiHMslBIEKjCFBIFlaMW6AiIgIiICIiAuP3RerVfdd9pXYXH7ovVqvuu+0oLuj/QUvht6laCq6P8AQUvht6laCDKIsIIrJ6Kn7g6lI+IM5Rio7J6Kn7jepb1fNdzFBx2O0fAjg4jDArM6P/y9xXzFLzRzLdczPF4VWsxM/kp+z6ZjbA4gAUySYAxVs6PoMBdwbQADJ5Na+W0f6en74619dbvQ1PcPUkNmmuRdomqaY4ejmzo//L+qTo//AC/qvm0UZYfjP7Kfs+ma+wAgi5OrNQu0bo2ob5pUyTjJBxXAZmOddOz+jbzKm7eqtxmlr02qmc4iI/Rc8D6N9jS3FdSwWSjRp3aDAxhN6G5Sda4h80fNd6x+iZ7oSzqKrkzEt9u7VXOJcyrabA6oQ4sLwSDgc25hTvslkaYcxgOyCvjnetVfjVupfU6Q9MfdHUrL1W75eiu5FMTP5YXKui7MGkupNuwZ5tapU26OY0NaKYaMhjrXUtvoKnuHqXxK7mccGfV3tximmmMS+knR/wDl7ik6P/y9xXzaKMsPxn9lP2fStFgJAApkkwMCs2atYWOdUp3QaYNMkA4CZLd6+esfpafvt6wsWf0dq+M/rKnOIz6x7t2mv7dM1bMRjo+3pVQ9oc0y0iQVuqeifVqXuBW109COMAyWVgZLKJaFc3S+R+G7qK6RXN0vkfhu6igxon1Sz4f8pn2hWGenb7pVKxV+DsVndE/hsETH5QrzfTt90oLqIiAiLCAiqHSVMEhxLcSMRMxInCdhWTpGmA0yYcSJjDAgY70FpFVGkqRjyjjl5J7Fl9vptcQ4kRGo6xKCyirP0hSAkk7jydoWo0nRx8rLkPL2ILZRVhbmFrnCSGxMDbzrA0jTgEkgHaOb+oILOvkWVWpW+m911rsZjI7JVlAHKhWVhACFZWEALKwsoCIiAiIgIiIC4/dF6tV9132ldhcfui9Wq+677Sgu6P8AQUvht6laCq6P9BS+G3qVoIIK9VzX0wIhxMqYiRCgtECpTkmSYGWpWEENjH4NP3G9SkqjyTzFaWT0VP3B1KR4kEbRCD4Cn5o5luu4zuZIAHCjo/us+LZ9qOj+64mOL5+rR3pmeDlaP9PT98da+utw/Bqe4epcqz9z5ZUa/hAbpBiNi7Nop32ObMXgRvUxD0NJZrt26oqjm+EWV3fFs+1HR/dPFs+1HR/dRiXnfBXvK4bMxzrpWf0beZWx3Nn2o6P7qzS0OWtDb4w5Fnv26qojENWn01yjOYUT5o+a71jH4TPdCpHRRgC+Ny6FBl1jW5wIUae1VRVM1Q32qKqZ4vgHetVfjVupfUaQ9N/0jqVV3cw7hXVOGHlPe+LvHGWa61psBe+9ejCIhXaima+76Ju0TVnCe2j8Gp7h6l8Svua9O+xzZi8CN64fi2fajo/urJhk11iu7MTRDhIu74tn2o6P7p4tn2o6P7qMSwfBXujj2P0tP329YWLP6O1fFf1ldyj3PFr2u4QeSQfN2HnWKfc65raw4UfiPc7zcr2rNJidnHrH+W/S2K6KKoqjo6eiR/ZqXuBW4UNjocHSYwmbrYnap129GOTAyWVgZLKJaFc3S+R+G7qK6RXN0vkfhu6ighsdn4Sx2dpMDgmE4f4RGtXm+nb7pVfRA/slDH/lM+0Kdnp2+6UF5ERAW
FlEGtwbBuS6NgVavbbjy244hoaXOEYBxIyz1KRlrY4OIdIaJOBy1EbQYMEZoJbo2BLo2BQm207rXXsHG6MDMiZEROEHco/CVIZug68CQPmBGWPMgtFo2BY4NuwblWdpKkD52sgmCIgOM45jySMNak79p3A+9LXGBAJMiZwzwgzshBNcGwbkujYFWdpKiJl+XIeXHLLyTjlgVnwhS40bZaRGYxwwyOaCwGjYFsqrrfTDi2XGBJIaTswwGflBaN0nSJIBMDIwfKwaRdGZweMkF1FXo22m911jpMTkYyBwOWThvU6DKIiAiIgIiICIiAiIgLj90Xq1X3XfaV2Fx+6L1ar7rvtKC7o/0FL4bepWgquj/QUvht6laCCpanDhqQnGThKtrV1IFwcRiMsVl0xhnyoI7J6Kn7g6lvW8x3MepR2OeBp5eY3qW9bzHe6epIHxFC11eDZ+I/zeMdpW/flX2j+kVWoeip+7+pW6ivvS+bv11Rcq4+Kbvur7R/SKd91faP6RUKLhTt1dU3fdX2j+kU78q+0f0ioUQ26uqbvur7R/SKwLVU9o/pFRIht1dU3fdX2j+kV9F3PVHOouLiSb+s8gXy6+l7mvQu9/9Apjm3aCqZu8ZczS1pqNrvAe4DYCVT77q+0f0ip9M+sPVJQz366t5Vx8VhlrqT6R/SK177q+0f0io6fnBaoq26sc03flX2j+kU78q+0f0ioURG3V1XbBaqhrUwXuILxrO1drRNRxr1wSSA7CTlkuBo/09L3x1ru6I9PX97sXFU/mh6mhqmY49XYWVhFe9YGSysDJZQaFc3S+R+G7qK6RXN0vkfhu6igaI9Us/wAJn2hTs9O33SoNEeqWf4TPtCnZ6dvulBeREQFhZRBXq2Nj33nXsgCJMG6SRI14laiwUw1zYMOAacTkMmjkEnelevUa662leG2cOrkP02rTv18E8BUwMQg2do6mafBkG5JMSdck9ZWfB9KCIMGZxOsQfojLS4uu8E4cpy19i1p2xxn8Go3CceeIQat0XTiHS4yTJJ1l2A2ecVMbGy61vleSSQQ4g4zOPzKibbXHKi//AMFZp2t5maLxhInqQDo2ljgcQW+ccAQRA2DyijtHML3OMkOAlsmJlxk7fO+i276ddvcE+ZiNaw+1OABFJxknDXA+SDU6MpEY3jzuJnLPohG6LpNiA4REG8cIDRh8mhYbbKh/5DhtmcseTPJZFsfh+A/GEEtGx06ZBaIjLHaAOpoU6ri0uv3eCfExe1c6jNtf7B8/TJBdRUzaqgJHBEjCCDySdXyWrrc6QBRdJ24fpsx+aC8iqG1Ov3RSdEST8pj9FgWp8SaTs4EbIGOXKd3yQXEVQ2l8AimcZwM6shgMJ/RYbaal0k0jgMscZw2c/wBEFxFTp2txcAaLwCYnfycyd9vLSRRdIjA4TPyQXEVMWt+P4L8BPOcMBvO5S2auXgyxzI42tBOuP3RerVfdd9pXYXH7ovVqvuu+0oLuj/QUvht6laCq6P8AQUvht6laCDKIiCGyeip+4Opb1h5DuYrSyeip+4OpSPMAnYEHxVDRtfg2DgnyGwcOUrfwZX9k7crtLulqFoJpsxG0rfxkqezZvKirnOXiXKdNNczNU5c/wZX9k7cngyv7J25dDxjqezZvKuaP0s6sKpLWi428I159i5xCKLOmrnZiqXD8G1/ZO3J4Mr+yduXQ8ZKns2bynjJU9mzeU4ONjS+aXP8ABlf2Ttyx4Or+yfuXR8Y6ns2byseMdT2bPqnA2NL5pUPBlf2Tty7+gKD6dJwe0tN6ceYKh4x1PZs3ldTRNvdaGOc5oEOjDmUxhq0tNiLn5KpmXG0rYar67nNpuI2gKp4Mr+yduXS0z3QVLNWdTaxjgKZfJnUMlnRfdA+vYn2hzGhzX3Q0TGrtU7HDKy5obc7VyZlzmaNrg+iduWvgyv7J25dDxjqezZvKeMlT2bN5XPBi2NL5pc/wZX9k7cngyv7J25fUWa1l7GuIA
kgbxKWq1Fjw0AHyS7cD2KJmIao0FqYzmXz1h0fWbWpl1NwAeCTHKuxoui9tasXNIBOBIzyXynj/AF4ngKW8rr2Hunq1bdTsxpsDXfmEyPJLv0VlVmcxlstaOLPCJ9X1KLCyjQwMllYGSyg0K5ul8j8N3UV0iubpfI/Dd1FA0R6pZ/hM+0Kdnp2+6VBoj1Sz/CZ9oU7PTt90oLyIiAsLKwgrV31g7yGAtjWda1e6vGAbMjdjOvm3ra0UqpJuOjARjlt1LV9GsRhUAN7ZqjLJBnhKwLBwYIMXjOW3Xz/RRudacYazXBnd+63ay0XTL6d7CMDGeP0QstGPls5MOtBtNYsBgB8mRqjGB1LUVK/lSxuAEY544/RZdTr3iWvbd1Ajk5tqw5logQ5l6McMJnm2R9UGalSvhdptOAmXa9Y+X6rD6lcgXWNmXAknKMjHLiUfTrwLr2zGMjDM8myFmoyveJa9gbqBHWg04S0iPw2k68Y+a2c+vqY04nCdUCNe2f5nPQDw0cIQXbRkpEFWo+uB5LGnDHGMdaxwlo9mzfkraIKrqta6CKYJOeOScJWug8G29exE4Rjr3K0iCoaleCRTbM4AnVhGvPPcteEtBI/DaBrx3a1dRBTFWuQCKYGJkHmEQZ2yJWH1q4HogeY/pP8APqrqIKb6louCKbS7HXgMMNe3qW5dWkw1sBuHKcOXnVlEFezOqkkVGtAjAg5nXgrCLKAuP3RerVfdd9pXYXH7ovVqvuu+0oLuj/QUvht6laCq6P8AQUvht6laCDK1c6BPUJWyIILG78Gnn5jdXIpKh8k8xWlk9FT9xvUt6vmnmKD4Ch5jeYKRR0PMbzKRc1d6Xy93v1fqLraD8y0fD7VyV1tB+ZaPh9qiFml+bH1/w5KIihnEREBfR9zPon+/+gXzi+k7mfRP9/8AQKY5tug+dH1fP91frT/gO6is9zf901vi/wBKd1frT/gO6k7m/wC6K3xf6Vd/RP0/29q58iv6/wC0SIiofMPqtH+jp87ftWdIelHw3dRTR/o6fO37U0h6UfDd1FV1co/WP8vpKO5H0eSDL5L6nQ/970fn/wDG5fLDL5L6nQ/970fn/wDG5elc5/SW+rn9JeiSkoiyqAZLKwMllBoVzdL5H4buorpFc3S+R+G7qKBoj1Sz/CZ9oU7PTt90qDRHqln+Ez7Qp2enb7pQXlgmEQiUCUlYuhAwBBoawBgkA7JRtYHIg/NaOszS68QL22TqMhaixMvXrgvDWgl4cTdkTsnFZ4Ucm9RizAEkDE5mTrzWosTMZE3jJk8sqUJTWG0b0NYZSN6h7wZxfqVu2zANuxhsnkhCPVvwo5N6yKoOUb/l+irnR1LiDeVOaUmf1+ah1OPBnhObesGqNo3qJ1iYQGloIGQk4ILCwZNExGZ2Qp4OeKbhdeG9OFG0b1obODntnM55qLwdT4g3lOBOfBPwoxxGGeKyKk5Qoe8mcUbzrEKVtOJ5TJ/nyUJbX0vrF0pdKDN9L6xdKXSgzfS+sXSl0oM30vrF0pdKDZrpXH7oHg2esJxDXT0Suu0Lhac9HafcP2lB1dH+gpfDb1K0FV0f6Cl8NvUrQQZWFlYQQiyUxk0LStRY0TwZdqgKMW54m9QfnAu44YweTLqWrdJyA4UapByIHNj9fog1s+jLMWx3u0XYHlMGOCl8E2b2FPohS07QXNB4N4mcCMoOvnVY6SdHoKvR14fTlRzsU9Engmzewp9EKGpZKLLzW2eARBLG5qxWtT23YpOcHZxqxAxwR1rcGtPBOJOYaJunDD6oRRTHghpWCzOJ/szWxxmBSeCrP7Cn0Qj7eQY4CqcYkN/dY8IH2NUc4Eb0RsU9GfBVn9hT6IUNfR1nEDvYHXLWDUVY78J4Mim66/Mn8oiQSFp4QxMUapAddkNzxIw3IbFPRinoyzOAPAME6iwSpqdgotwbTa0cghQnSJ9hWy4vJP7IdJQQDSqCXXRhmiYppjlDeroqzvMvo03GIlzQT
GzFVe87O0FjbLDL0w1gAJ2x/MlcNpcKgaWeSTF7HYDOUZmM+pRm3nH8CrhyDHEjDFHTWlo6zOE97sHOwBb+CbN7Cn0QnfjrpcKT5DgACDJB15K4jjYp6IG2KkMmNA5FXr0aV7Gi513WBMyMvqr6Jh3hzhoGxf8ApaH/AG29ilZoizNcHtoUg8ZODQCNWauImRC6zMAJuqqLhIPAvy2c3KugiDWmZaCAQIyOpbrCyg0K5ul8j8N3UV0iubpfI/Dd1FA0R6pZ/hM+0Kdnp2+6VBoj1Sz/AAmfaFOz07fdKCvZifCFYThwbf0XTqE3Tdz1LAosDy8NF8iC6MSFuuqqs4cUUzTEqdRtU1GPaABMOxJw5B/MlZvG9EYYfr+29bouXaOEhb3Ql0INISFvdCXQg0hIW90JdCDSEW90JdCDSEhb3Ql0INISFvdCXQg0hIW90JdCDRFvdCXQg0Rb3Ql0INEW90JdCDRFvdCXQg1ZmuHpz0dp9w/aV3wFwNOejtPuH7Sg6uj/AEFL4bepWgquj/QUvht6laCDKwsrCD5CtVt143KwAnCXt7edaGrbtVfHX5be3+cq3tNOjwjr4qEhw5ssFE+nZm3XFtTHLk/kBRvoZN5Wkp1bbPlVpEZB7e1SivavaH/uN7VWqGzvlzhUJ1x8tnyVuy0mFnkFwbJgSNvMm/iETcrY4e1e0PTb2qK0PtzgBSqG9OP4jcvmVf4Icv8APkprJQDnxjkdfNyJF+JnwKblczES4XB6V9qf+4ztXZ0Oy18EeHcS68YN4HCBs+a6XeY2lS0qd0QFZVXmMYaopwpWhte6brrpwxLgNaifw9wxWGuHFwjzv/AXRtEXDIJGGA5wqDHUu98GODLuLZxi9l88dy4dLWjBUufiPvnaDI17FNai4DyXBpxxJwyMStNHFppi4CG6gTOtTV6AqCHZYg/MR+qEqN20xJq0x8jEa5SLSbsPZF3Fw2/rq+q3doeic2nKBiclZoWVtMEMwBMoK1MWgOF4sLYxg5/Tm3cuFsH+StrqXUGs86Tzra6l1BrPOk862updQRVSbjoN03TBJywzUTC6GzUH5ciMRr3/AKKy+kHAtOREH5rRtlaABsII+WSCVuQWyw3ILKDQrm6XyPw3dRXSK5ul8j8N3UUDRHqln+Ez7Qp2enb7pUGiPVLP8Jn2hTs9O33SgvIiICwsog5lvrVWVPIvFoaKhutmQybzBymWxrzUXfFpEA6nNBNwnCaZcT0nDkurrog5lO2Vn0aji248AEeSTG0Ea45Fl1qqupVoaQ5rCWw0gh2Pk45kQMRtXSRByn169EtBF+88YQXQ2Wgi8AMcXHJYdpGs0NvMF5zZuhjsfIc7DmiIzXWQtBiRllyIOTV0hWJcGU4EOuksdji8AjmutPLeGS6lF0saTmQDlH0WyIMoiICLCygIiICIsIMosLKAiIgIiIC4OnaZFG0OORa6OiV3lx+6H1Wr7rvtKC7o/wBBS+G3qVoKro/0FL4bepWggysLKIPh6tPSF83T5M8ZvatAzSMHGcMIc3tWlp0zZzVN6lVLpjBwgxyKJumbMA6KdYSCPO5f5mo/ieWFW+joscHpHPXHGb2qUMt2u9PvN7VS8MWa7HB1RhneGP8AP5rVyl3Q0gABTqRGEkJ/F8sE34jwhtctv+LpM7V0NCNtAr/izcuHMtOMiMjzqh4x0/Zv3hd7Q1UVqYrNBAMiDyGP0TNz+qmIKbsVTiIh0URFK1pVm75Jg4YnnVUcNwWNRl+POkRM55cwVupTDhBy7MVSFCmAaABgADPU45cn7ILljvXPLcHO1kZZqwqmjrvBi4HBskAOzwJH6K2gIiICIiAiIgIiICIiDAyWVgZLKDQrm6XyPw3dRXSK5ml8j8N3UUGdEeqWf4TPtCnZ6dvulQaI9Us/wmfaFOz07fdKC8iIgLCysIK1aylziRUc2dhUYsLtVd+Gr5ZKWvYmVHXjM5YGORaeDqd275UXr2eZgDHbgEGX2
VxAAqvbDYw17CeVamxyCBVcJMkg8hBH1lbeD6cEeVBIOBjLl+aU7Axrg5sgjl5IQGWRwIJqvMHKcD/P5qWadkcM6rzn9VZRBRFgcP8Anv7MZOtSusd68C9xDnTGzGYCsogpmxPxis/PDHIbM1vSsrmkk1XOkEY6p2blZRBU7yOH4r5Agmc8Sf1Q2J2H4z4GJE5q2iCm+wEzFV4BJJE/RSGynyvxHQ4zgcsScN6sIgqCxOkfjVMDOf0P8/WdXaPJcXcK+ZMY5TqV1EFMWF2M1nmdU5LdtkIcTwj4JJiciZ6pVlEFQ2J3tn8mJ7Vg2F/tn/L6q4iCqbG6I4V+YIM8kRnrVpogAZ8qIgyuP3RerVfdd9pXXXI7ovVqvuu+0oLuj/QUvht6laC8kp93Nua0Na6mABA8jUFv4+2/jU+gg9YReT+Ptv41PoJ4+2/jU+gg+rq6PtJebtCldnMsZK0Gj7XBmz0jhhDGYr5B/dja3G8SwnmPVMLVvdfaxkWY+92rjZnrLN8PHWX2Pg+1x6vRnZcYpmaPrRjQZOvyKfYviR3YWuI/D3HtUze7u3AAB1OB/g/dNiesk6aJ8ZfZeDqvsWdCn2Ls6NpFlENc0NMnAADXsGC818fLfxqfQTx8t/Gp9BTFOPF1bsRROczL1VF5V4+W/jU+gnj5b+NT6C6XvVVB5d84i75MZbcV5j4+W/jU+gtD3bW28XTTkxJubIjXyBB6rZL938QtLpM3cszH0hTryWn3c25ohppgYnzNpkrbx9t/Gp9BB6wi8n8fbfxqfQTx9t/Gp9BB6wi8n8fbfxqfQTx9t/Gp9BB6wi8n8fbfxqfQTx9t/Gp9BB6wi8n8fbfxqfQTx9t/Gp9BB6wi8n8fbfxqfQTx9t/Gp9BB6wMlleTePtv41PoJ4/W/jU+gg9WKpaUDeDfMXuDdGPIZwXm3j7b+NT6C0qd3Fue0tcaZBEHyNRQej6I9Us/wmfaFOz07fdK8ro92NuYxrG1GhrQGgXG4AYDUj+7G3OxNRp/6Ag9hlYc6BK8d8bbZx29AJ422zjt6AQewh4Wb2MLx3xttnHb0AnjbbOO3oBB61UtQa67DicMuUgfqhtTbwbJxiNhleS+Nts47egE8bbZx29AIPWXWxoddkzMZLXv5mIkyDGRXlHjbbOO3oBPG22cdvQCngji9abaJE4wBJ5M+xaC3sPG2nDkn9F5R422zjt6ATxttnHb0AodRjxervtzWifKIzw+Wr5pStocYAdlOXLC8o8bbZx29AJ422zjt6AROY6PWKdsa4E4iBOPz7CnfzMcTgJOHN2ryfxttnHb0AnjbbOO3oBEcHq4tzDkXH5coH6qbhMDOEYka4/gXkXjbbOO3oBPG22cdvQCE48Hq4t7DEXiTlgdkobc3/GTjgGk5LyjxttnHb0AnjbbOO3oBEPWnWoB10yDzcsBattjSJF6ObkleT+Nts47egE8bbZx29AIng9ZZaw4EtvENzwjl1qeeVeP+Nts47egE8bbZx29AIS9gnlSeVeP+Nts47egE8bbZx29AIh7BPKk8q8f8bbZx29AJ422zjt6AQewtOK5PdA7+zVhB812O3ySvNPG22cdvQC1qd1Vrc0tc9pBBBF0ZFBxUREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQEREBERAREQf/9k=\n", "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, 
"execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import YouTubeVideo\n", "\n", "YouTubeVideo(\"0eEsIA0O1iE\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Start Dask Client for Dashboard\n", "\n", "Starting the Dask Client is optional. It will provide a dashboard which \n", "is useful to gain insight on the computation. \n", "\n", "The link to the dashboard will become visible when you create the client below. We recommend having it open on one side of your screen while using your notebook on the other side. This can take some effort to arrange your windows, but seeing them both at the same is very useful when learning." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:23.053158Z", "iopub.status.busy": "2022-07-27T19:18:23.052942Z", "iopub.status.idle": "2022-07-27T19:18:25.079274Z", "shell.execute_reply": "2022-07-27T19:18:25.078741Z" } }, "outputs": [ { "data": { "text/html": [ "
Client: Client-e20d2897-0de0-11ed-a12a-000d3a8f7959\n", "Connection method: Cluster object, Cluster type: distributed.LocalCluster\n", "Dashboard: http://127.0.0.1:8787/status\n", "Cluster Info: LocalCluster 6de9824d\n", "Workers: 1, Total threads: 4, Total memory: 1.86 GiB\n", "Status: running, Using processes: True\n", "Scheduler Info: Scheduler-f935f86a-49c7-4bf7-a1cf-50e7d7648a8e\n", "Comm: tcp://127.0.0.1:46871, Workers: 1, Total threads: 4, Total memory: 1.86 GiB, Started: Just now\n", "Worker: 0\n", "Comm: tcp://127.0.0.1:42199, Total threads: 4, Memory: 1.86 GiB\n", "Dashboard: http://127.0.0.1:44125/status\n", "Nanny: tcp://127.0.0.1:45137\n", "Local directory: /home/runner/work/dask-examples/dask-examples/dataframes/dask-worker-space/worker-n5409ak8\n
" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from dask.distributed import Client\n", "client = Client(n_workers=1, threads_per_worker=4, processes=True, memory_limit='2GB')\n", "client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create artificial dataset\n", "\n", "First we create an artificial dataset and write it to many CSV files.\n", "\n", "You don't need to understand this section, we're just creating a dataset for the rest of the notebook." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:25.082928Z", "iopub.status.busy": "2022-07-27T19:18:25.082450Z", "iopub.status.idle": "2022-07-27T19:18:25.438226Z", "shell.execute_reply": "2022-07-27T19:18:25.437678Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Dask DataFrame Structure:\n", " id name x y\n", "npartitions=30 \n", "2000-01-01 int64 object float64 float64\n", "2000-01-02 ... ... ... ...\n", "... ... ... ... ...\n", "2000-01-30 ... ... ... ...\n", "2000-01-31 ... ... ... ...\n", "Dask Name: make-timeseries, 30 tasks" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import dask\n", "df = dask.datasets.timeseries()\n", "df" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:25.440969Z", "iopub.status.busy": "2022-07-27T19:18:25.440774Z", "iopub.status.idle": "2022-07-27T19:18:43.558446Z", "shell.execute_reply": "2022-07-27T19:18:43.557759Z" } }, "outputs": [], "source": [ "import os\n", "import datetime\n", "\n", "if not os.path.exists('data'):\n", " os.mkdir('data')\n", "\n", "def name(i):\n", " \"\"\" Provide date for filename given index\n", " \n", " Examples\n", " --------\n", " >>> name(0)\n", " '2000-01-01'\n", " >>> name(10)\n", " '2000-01-11'\n", " \"\"\"\n", " return str(datetime.date(2000, 1, 1) + i * datetime.timedelta(days=1))\n", "\n", "df.to_csv('data/*.csv', name_function=name);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read CSV files\n", "\n", "We now have many CSV files in our data directory, one for each day in the month of January 2000. Each CSV file holds timeseries data for that day. We can read all of them as one logical dataframe using the `dd.read_csv` function with a glob string." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:43.561930Z", "iopub.status.busy": "2022-07-27T19:18:43.561608Z", "iopub.status.idle": "2022-07-27T19:18:43.730051Z", "shell.execute_reply": "2022-07-27T19:18:43.728716Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data/2000-01-01.csv\r\n", "data/2000-01-02.csv\r\n", "data/2000-01-03.csv\r\n", "data/2000-01-04.csv\r\n", "data/2000-01-05.csv\r\n", "data/2000-01-06.csv\r\n", "data/2000-01-07.csv\r\n", "data/2000-01-08.csv\r\n", "data/2000-01-09.csv\r\n", "data/2000-01-10.csv\r\n" ] } ], "source": [ "!ls data/*.csv | head" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:43.735804Z", "iopub.status.busy": "2022-07-27T19:18:43.735314Z", "iopub.status.idle": "2022-07-27T19:18:43.897619Z", "shell.execute_reply": "2022-07-27T19:18:43.896656Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "timestamp,id,name,x,y\r\n", "2000-01-01 00:00:00,1009,Jerry,0.9005427499558429,0.3212344670325944\r\n", "2000-01-01 00:00:01,940,Quinn,0.46795036754868247,-0.01884571513893385\r\n", "2000-01-01 00:00:02,1017,Ingrid,0.9442706585905265,-0.9229268785155369\r\n", "2000-01-01 00:00:03,1034,Tim,0.010273653581192255,-0.2850042344432575\r\n", "2000-01-01 00:00:04,963,Bob,-0.9556052127604173,-0.409805293606079\r\n", "2000-01-01 00:00:05,992,Ray,0.49090905386189876,-0.8364030355424359\r\n", "2000-01-01 00:00:06,999,Ray,-0.1791414361782142,0.9108295350480047\r\n", "2000-01-01 00:00:07,1017,Tim,-0.6121437272121055,0.5585754365941122\r\n", "2000-01-01 00:00:08,1037,Dan,-0.6931099564135064,-0.6357258139372404\r\n" ] } ], "source": [ "!head data/2000-01-01.csv" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:43.901346Z", "iopub.status.busy": "2022-07-27T19:18:43.901011Z", 
"iopub.status.idle": "2022-07-27T19:18:44.063148Z", "shell.execute_reply": "2022-07-27T19:18:44.062416Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "timestamp,id,name,x,y\r\n", "2000-01-30 00:00:00,1067,Quinn,-0.9275010814781244,0.7051035850972305\r\n", "2000-01-30 00:00:01,1011,Quinn,-0.8288674460103511,-0.3018417020358921\r\n", "2000-01-30 00:00:02,933,Laura,-0.5165326137868189,0.9195088929096915\r\n", "2000-01-30 00:00:03,1040,Ray,0.8073954879070395,0.9243639047927026\r\n", "2000-01-30 00:00:04,963,Wendy,0.791167365074305,0.2941664104084778\r\n", "2000-01-30 00:00:05,1008,Bob,0.38959445411393334,-0.32793662786416844\r\n", "2000-01-30 00:00:06,1008,Ray,-0.2127878456673038,0.040117377007003796\r\n", "2000-01-30 00:00:07,1038,Ingrid,0.3092567914432629,0.11665005655447458\r\n", "2000-01-30 00:00:08,985,Hannah,-0.42749597352375934,-0.3888014211219375\r\n" ] } ], "source": [ "!head data/2000-01-30.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can read one file with `pandas.read_csv` or many files with `dask.dataframe.read_csv`" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:44.067309Z", "iopub.status.busy": "2022-07-27T19:18:44.066803Z", "iopub.status.idle": "2022-07-27T19:18:44.199821Z", "shell.execute_reply": "2022-07-27T19:18:44.199111Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " timestamp id name x y\n", "0 2000-01-01 00:00:00 1009 Jerry 0.900543 0.321234\n", "1 2000-01-01 00:00:01 940 Quinn 0.467950 -0.018846\n", "2 2000-01-01 00:00:02 1017 Ingrid 0.944271 -0.922927\n", "3 2000-01-01 00:00:03 1034 Tim 0.010274 -0.285004\n", "4 2000-01-01 00:00:04 963 Bob -0.955605 -0.409805" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv('data/2000-01-01.csv')\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:44.203481Z", "iopub.status.busy": "2022-07-27T19:18:44.203226Z", "iopub.status.idle": "2022-07-27T19:18:44.234548Z", "shell.execute_reply": "2022-07-27T19:18:44.233935Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Dask DataFrame Structure:\n", " timestamp id name x y\n", "npartitions=30 \n", " object int64 object float64 float64\n", " ... ... ... ... ...\n", "... ... ... ... ... ...\n", " ... ... ... ... ...\n", " ... ... ... ... ...\n", "Dask Name: read-csv, 30 tasks" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import dask.dataframe as dd\n", "\n", "df = dd.read_csv('data/2000-*-*.csv')\n", "df" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:44.237646Z", "iopub.status.busy": "2022-07-27T19:18:44.237122Z", "iopub.status.idle": "2022-07-27T19:18:44.386331Z", "shell.execute_reply": "2022-07-27T19:18:44.385694Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timestampidnamexy
02000-01-01 00:00:001009Jerry0.9005430.321234
12000-01-01 00:00:01940Quinn0.467950-0.018846
22000-01-01 00:00:021017Ingrid0.944271-0.922927
32000-01-01 00:00:031034Tim0.010274-0.285004
42000-01-01 00:00:04963Bob-0.955605-0.409805
\n", "
" ], "text/plain": [ " timestamp id name x y\n", "0 2000-01-01 00:00:00 1009 Jerry 0.900543 0.321234\n", "1 2000-01-01 00:00:01 940 Quinn 0.467950 -0.018846\n", "2 2000-01-01 00:00:02 1017 Ingrid 0.944271 -0.922927\n", "3 2000-01-01 00:00:03 1034 Tim 0.010274 -0.285004\n", "4 2000-01-01 00:00:04 963 Bob -0.955605 -0.409805" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tuning read_csv\n", "\n", "The Pandas `read_csv` function has *many* options to help you parse files. The Dask version uses the Pandas function internally, and so supports many of the same options. You can use the `?` operator to see the full documentation string." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:44.389470Z", "iopub.status.busy": "2022-07-27T19:18:44.389104Z", "iopub.status.idle": "2022-07-27T19:18:44.446823Z", "shell.execute_reply": "2022-07-27T19:18:44.445853Z" } }, "outputs": [], "source": [ "pd.read_csv?" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:44.450664Z", "iopub.status.busy": "2022-07-27T19:18:44.450132Z", "iopub.status.idle": "2022-07-27T19:18:44.460840Z", "shell.execute_reply": "2022-07-27T19:18:44.460157Z" } }, "outputs": [], "source": [ "dd.read_csv?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case we use the `parse_dates` keyword to parse the timestamp column to be a datetime. This will make things more efficient in the future. Notice that the dtype of the timestamp column has changed from `object` to `datetime64[ns]`." 
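, "\n", "\n", "Other `read_csv` keywords can be passed in the same way. The next cell is a minimal sketch added for illustration and is not part of the original example. The column subset, the `category` dtype for `name`, and the `'16MB'` block size are all assumptions. `usecols` and `dtype` are forwarded to Pandas for each block. `blocksize` is specific to Dask and sets roughly how many bytes of CSV text go into each partition. The result is stored as `df_tuned` so that the `df` used in later cells is unaffected." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import dask.dataframe as dd\n", "\n", "# Illustrative options (assumptions, not from the original notebook):\n", "# usecols and dtype are handed to pandas.read_csv for every block;\n", "# blocksize is Dask-specific and controls the bytes read per partition.\n", "df_tuned = dd.read_csv(\n", "    'data/2000-*-*.csv',\n", "    parse_dates=['timestamp'],\n", "    usecols=['timestamp', 'name', 'x'],\n", "    dtype={'name': 'category'},\n", "    blocksize='16MB',\n", ")\n", "df_tuned.dtypes" 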
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:44.464037Z", "iopub.status.busy": "2022-07-27T19:18:44.463587Z", "iopub.status.idle": "2022-07-27T19:18:44.490587Z", "shell.execute_reply": "2022-07-27T19:18:44.489972Z" } }, "outputs": [ { "data": {
"text/plain": [ "Dask DataFrame Structure:\n", " timestamp id name x y\n", "npartitions=30 \n", " datetime64[ns] int64 object float64 float64\n", " ... ... ... ... ...\n", "... ... ... ... ... ...\n", " ... ... ... ... ...\n", " ... ... ... ... ...\n", "Dask Name: read-csv, 30 tasks" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = dd.read_csv('data/2000-*-*.csv', parse_dates=['timestamp'])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Do a simple computation\n", "\n", "Every time we compute a result, Dask reads through all of the CSV files again instead of keeping the data in RAM. This keeps memory use low, but re-parsing every CSV file on each computation can be slow." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:44.493553Z", "iopub.status.busy": "2022-07-27T19:18:44.493356Z", "iopub.status.idle": "2022-07-27T19:18:47.764008Z", "shell.execute_reply": "2022-07-27T19:18:47.763446Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 211 ms, sys: 20.6 ms, total: 232 ms\n", "Wall time: 3.26 s\n" ] }, { "data": { "text/plain": [ "name\n", "Alice 0.004810\n", "Bob -0.000236\n", "Charlie -0.003038\n", "Dan 0.002005\n", "Edith -0.001287\n", "Frank 0.000691\n", "George -0.002461\n", "Hannah -0.004205\n", "Ingrid 0.001781\n", "Jerry -0.000149\n", "Kevin 0.000707\n", "Laura 0.002090\n", "Michael -0.004071\n", "Norbert -0.001131\n", "Oliver -0.002930\n", "Patricia 0.000120\n", "Quinn 0.000870\n", "Ray 0.000424\n", "Sarah -0.000817\n", "Tim 0.003061\n", "Ursula 0.002109\n", "Victor -0.001035\n", "Wendy -0.002654\n", "Xavier 0.000702\n", "Yvonne 0.000308\n", "Zelda -0.001066\n", "Name: x, dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time df.groupby('name').x.mean().compute()" ] }, { "cell_type": "code", 
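"execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A sketch added for illustration (not part of the original notebook).\n", "# Assumption: the parsed data fits comfortably in RAM.\n", "# persist() runs the CSV parsing once and keeps the resulting Pandas\n", "# partitions in memory. Later computations on df_in_memory reuse them\n", "# instead of re-reading every CSV file.\n", "df_in_memory = df.persist()\n", "df_in_memory.groupby('name').x.mean().compute()" ] }, { "cell_type": "code", 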
"execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write to Parquet\n", "\n", "Instead, we'll store our data in Parquet, a binary, columnar file format that is much faster to read and write than CSV text. Dask writes one file per partition, so our 30 partitions become 30 files in the `data/2000-01.parquet/` directory." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:47.766982Z", "iopub.status.busy": "2022-07-27T19:18:47.766711Z", "iopub.status.idle": "2022-07-27T19:18:51.471571Z", "shell.execute_reply": "2022-07-27T19:18:51.470189Z" } }, "outputs": [], "source": [ "df.to_parquet('data/2000-01.parquet', engine='pyarrow')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:51.476762Z", "iopub.status.busy": "2022-07-27T19:18:51.476401Z", "iopub.status.idle": "2022-07-27T19:18:51.640158Z", "shell.execute_reply": "2022-07-27T19:18:51.639284Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "part.0.parquet\t part.16.parquet part.23.parquet part.4.parquet\r\n", "part.1.parquet\t part.17.parquet part.24.parquet part.5.parquet\r\n", "part.10.parquet part.18.parquet part.25.parquet part.6.parquet\r\n", "part.11.parquet part.19.parquet part.26.parquet part.7.parquet\r\n", "part.12.parquet part.2.parquet part.27.parquet part.8.parquet\r\n", "part.13.parquet part.20.parquet part.28.parquet part.9.parquet\r\n", "part.14.parquet part.21.parquet part.29.parquet\r\n", "part.15.parquet part.22.parquet part.3.parquet\r\n" ] } ], "source": [ "!ls data/2000-01.parquet/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read from Parquet" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:51.644415Z", "iopub.status.busy": "2022-07-27T19:18:51.643840Z", "iopub.status.idle": "2022-07-27T19:18:51.673786Z", "shell.execute_reply": "2022-07-27T19:18:51.673208Z" } }, "outputs": [ { "data": { 
"text/plain": [ "Dask DataFrame Structure:\n", " timestamp id name x y\n", "npartitions=30 \n", " datetime64[ns] int64 object float64 float64\n", " ... ... ... ... ...\n", "... ... ... ... ... ...\n", " ... ... ... ... ...\n", " ... ... ... ... ...\n", "Dask Name: read-parquet, 30 tasks" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = dd.read_parquet('data/2000-01.parquet', engine='pyarrow')\n", "df" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:51.676829Z", "iopub.status.busy": "2022-07-27T19:18:51.676389Z", "iopub.status.idle": "2022-07-27T19:18:52.631870Z", "shell.execute_reply": "2022-07-27T19:18:52.631085Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 132 ms, sys: 13.1 ms, total: 145 ms\n", "Wall time: 942 ms\n" ] }, { "data": { "text/plain": [ "name\n", "Alice 0.004810\n", "Bob -0.000236\n", "Charlie -0.003038\n", "Dan 0.002005\n", "Edith -0.001287\n", "Frank 0.000691\n", "George -0.002461\n", "Hannah -0.004205\n", "Ingrid 0.001781\n", "Jerry -0.000149\n", "Kevin 0.000707\n", "Laura 0.002090\n", "Michael -0.004071\n", "Norbert -0.001131\n", "Oliver -0.002930\n", "Patricia 0.000120\n", "Quinn 0.000870\n", "Ray 0.000424\n", "Sarah -0.000817\n", "Tim 0.003061\n", "Ursula 0.002109\n", "Victor -0.001035\n", "Wendy -0.002654\n", "Xavier 0.000702\n", "Yvonne 0.000308\n", "Zelda -0.001066\n", "Name: x, dtype: float64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time df.groupby('name').x.mean().compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Select only the columns that you plan to use\n", "\n", "Parquet is a columnar format: each column is stored separately, so a reader can load just the columns it needs without touching the rest of the data. Passing `columns=` to `read_parquet` therefore avoids reading data you never use." 
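, "\n", "\n", "Column selection can also be combined with a row filter. The next cell is a hedged sketch, not part of the original example, and the filter on `'Alice'` is an arbitrary illustrative choice. The `filters=` argument lets the reader skip Parquet row groups whose stored min/max statistics rule them out. On this randomly ordered data, few or no row groups are likely to be skipped. The explicit boolean mask afterwards makes the row-level filtering explicit whatever the engine does." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative filter value (an assumption, not from the original notebook).\n", "# filters= allows row groups to be skipped using Parquet statistics;\n", "# the mask below applies the row-level filter explicitly.\n", "df_alice = dd.read_parquet(\n", "    'data/2000-01.parquet',\n", "    columns=['name', 'x'],\n", "    filters=[('name', '==', 'Alice')],\n", "    engine='pyarrow',\n", ")\n", "df_alice = df_alice[df_alice['name'] == 'Alice']\n", "df_alice.x.mean().compute()" 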
] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2022-07-27T19:18:52.634889Z", "iopub.status.busy": "2022-07-27T19:18:52.634457Z", "iopub.status.idle": "2022-07-27T19:18:53.496529Z", "shell.execute_reply": "2022-07-27T19:18:53.495914Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 130 ms, sys: 6.46 ms, total: 136 ms\n", "Wall time: 851 ms\n" ] }, { "data": { "text/plain": [ "name\n", "Alice 0.004810\n", "Bob -0.000236\n", "Charlie -0.003038\n", "Dan 0.002005\n", "Edith -0.001287\n", "Frank 0.000691\n", "George -0.002461\n", "Hannah -0.004205\n", "Ingrid 0.001781\n", "Jerry -0.000149\n", "Kevin 0.000707\n", "Laura 0.002090\n", "Michael -0.004071\n", "Norbert -0.001131\n", "Oliver -0.002930\n", "Patricia 0.000120\n", "Quinn 0.000870\n", "Ray 0.000424\n", "Sarah -0.000817\n", "Tim 0.003061\n", "Ursula 0.002109\n", "Victor -0.001035\n", "Wendy -0.002654\n", "Xavier 0.000702\n", "Yvonne 0.000308\n", "Zelda -0.001066\n", "Name: x, dtype: float64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "df = dd.read_parquet('data/2000-01.parquet', columns=['name', 'x'], engine='pyarrow')\n", "df.groupby('name').x.mean().compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here the difference is small (851 ms of wall time versus 942 ms when reading every column). On wider or larger datasets, reading only the columns you need can save a great deal of time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learn more\n", "\n", "http://docs.dask.org/en/latest/dataframe-create.html" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 4 }