Unlocking the Power of Accelerated Data Science
Summary
Companies such as Spotify and Walmart use NVIDIA-accelerated data science to speed up their end-to-end analytics workflows, cutting costs and improving performance. This article surveys recent resources and tools that help data scientists do the same.
The Rise of Accelerated Data Science
Accelerated data science matters more as data volume, velocity, and complexity grow. The field is growing quickly, and demand is high for people with the skills to design data science solutions. To help meet that demand, the NVIDIA Deep Learning Institute (DLI) has released the Accelerated Data Science Teaching Kit, a resource for educators and students.
Key Takeaways from GTC 21
NVIDIA GTC 21 featured several data science sessions, covering GPU-accelerated model evaluation, accelerated ETL, training and inference of recommender systems, and deployment of GPU-accelerated applications across hybrid and multi-cloud environments.
GPU-Accelerated Model Evaluation
One of the highlighted sessions at GTC 21 covered GPU-accelerated model evaluation. It demonstrated how cuDF and Dask-cuDF can cut the time needed to evaluate recommender systems offline, which allows faster iteration when building models.
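To make offline evaluation concrete, here is a minimal sketch of one common metric, mean recall@k, written with dataframe operations. It is not the code from the session: the metric choice, the function name `recall_at_k`, the column names, and the toy data are all assumptions made for illustration. The sketch imports cuDF when it is installed and falls back to pandas otherwise, since the operations used here exist in both libraries.

```python
# Use cuDF when available; otherwise fall back to pandas (assumption: the
# operations below behave the same in both for this small example).
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf


def recall_at_k(recs, truth, k):
    """Mean per-user recall@k (hypothetical helper, not from the session).

    recs:  columns user, item, rank (1 = best recommendation)
    truth: columns user, item (held-out relevant interactions)
    """
    top_k = recs[recs["rank"] <= k]
    # The inner join keeps only recommended items that are actually relevant.
    hits = top_k.merge(truth, on=["user", "item"], how="inner")
    hits_per_user = hits.groupby("user")["item"].count().reset_index()
    hits_per_user = hits_per_user.rename(columns={"item": "hits"})
    relevant = truth.groupby("user")["item"].count().reset_index()
    relevant = relevant.rename(columns={"item": "relevant"})
    # The left join keeps users with no hits; their hit count becomes 0.
    per_user = relevant.merge(hits_per_user, on="user", how="left")
    per_user["hits"] = per_user["hits"].fillna(0)
    per_user["recall"] = per_user["hits"] / per_user["relevant"]
    return float(per_user["recall"].mean())


# Hypothetical toy data: two users, three ranked recommendations each.
recs = xdf.DataFrame({
    "user": [1, 1, 1, 2, 2, 2],
    "item": [10, 11, 12, 20, 21, 22],
    "rank": [1, 2, 3, 1, 2, 3],
})
truth = xdf.DataFrame({"user": [1, 1, 2], "item": [10, 12, 21]})

score_at_2 = recall_at_k(recs, truth, 2)
score_at_1 = recall_at_k(recs, truth, 1)
```

On this toy data, user 1 has one of two relevant items in the top 2 and user 2 has their single relevant item there, so the mean recall@2 is 0.75. For datasets that do not fit on one GPU, Dask-cuDF provides a partitioned dataframe with a similar interface, although this sketch does not use it.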
Accelerated ETL, Training and Inference
Another session focused on accelerated ETL, training, and inference of recommender systems on NVIDIA GPUs. It used NVIDIA Merlin, a framework for recommender systems, together with NVTabular for ETL and feature engineering, HugeCTR for training, and Triton Inference Server for serving.
Deploying GPU-Accelerated Applications
The Cloudera Data Platform session showed how to deploy GPU-accelerated applications across hybrid and multi-cloud environments from a single management interface, which simplifies managing and scaling GPU-accelerated workloads.
RAPIDS Edition ML Runtime
Cloudera and NVIDIA have expanded their partnership to offer the RAPIDS Edition Machine Learning (ML) Runtime. The runtime is built on community-built RAPIDS Docker images and gives data scientists a secure, customizable, containerized environment for working with GPUs.
Benefits of RAPIDS
The runtime supports Python interfaces, including the NVIDIA RAPIDS.ai libraries, whose syntax closely mirrors that of popular CPU-based Python data science libraries such as pandas and scikit-learn. Data scientists can use GPU-based libraries such as cuDF for dataframes and cuML for machine learning, which reduces the time needed both to configure a GPU environment and to refactor existing CPU code.
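As an illustration of how little refactoring this can involve, the sketch below fits a linear regression with cuML when it is installed and with scikit-learn otherwise. Only the import line changes; the class name and the `fit` and `predict` calls are the same in both libraries. The toy data and variable names are hypothetical, and the example assumes NumPy plus at least one of the two libraries is installed.

```python
import numpy as np

# Same class name and methods in both libraries; only the import differs.
try:
    from cuml.linear_model import LinearRegression  # GPU implementation
except ImportError:
    from sklearn.linear_model import LinearRegression  # CPU fallback

# Hypothetical toy data that follows y = 2x + 1 exactly.
X = np.arange(10, dtype=np.float32).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0

model = LinearRegression()
model.fit(X, y)

# Predict for x = 5; np.asarray guards against non-NumPy return types.
x_new = np.array([[5.0]], dtype=np.float32)
prediction = float(np.asarray(model.predict(x_new))[0])
```

Because the data is exactly linear, the prediction for x = 5 should be very close to 11. Real workloads may still need changes where the two libraries differ in supported parameters or data types, so treat the import swap as a starting point rather than a guarantee.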
Table: Key Features of RAPIDS Edition ML Runtime
Feature | Description |
---|---|
Secure Environment | Provides a secure working environment, built on community-built RAPIDS Docker images |
Customizable | Allows for customization of the working environment |
Containerized | Provides a containerized working environment for data scientists |
Python Interfaces | Supports NVIDIA RAPIDS.ai libraries, whose syntax closely mirrors popular CPU-based Python data science libraries |
GPU-Based Libraries | Includes cuDF for dataframes and cuML for ML |
Further Reading
For more information on accelerated data science and the resources available, visit the NVIDIA Developer blog and explore the latest articles and tutorials on data science.
References
- NVIDIA Developer Blog: Data Science - Top New Resources from GTC 21
- NVIDIA Deep Learning Institute: Accelerated Data Science Teaching Kit
- GTC 21: Top 5 Data Science Technical Sessions
- Cloudera and NVIDIA Expand Partnership: RAPIDS Edition ML Runtime
Conclusion
Accelerated data science helps companies speed up their end-to-end analytics workflows and reduce costs. With the resources and tools described here, data scientists can apply NVIDIA-accelerated data science to their own work. The Cloudera and NVIDIA partnership and the Accelerated Data Science Teaching Kit are both intended to broaden adoption of GPU-accelerated data science.