Machine Learning & IBM i - Common France
IBM Systems Center Montpellier. Machine Learning & IBM i • IBM Machine Learning solutions • Getting started on IBM i with open source solutions and Scikit-learn. Benoit MAROLLEAU, Cloud Architect, IBM Cognitive Systems, Client Center Montpellier, France, benoit.marolleau@fr.ibm.com. Replay & more info: https://ibm.biz/bma-wiki https://callforcode.org
Agenda • Introduction & use cases: logistics optimization, harvest prediction, fraud prediction, customer churn (attrition), predictive maintenance, etc. • Before starting with Scikit-learn: introduction to Machine Learning • IBM solutions for artificial intelligence: software and hardware solutions, private/public cloud (Watson, Linux & IC922/AC922); IBM i / AIX solutions close to the data • Demonstration, Scikit-learn on IBM i: installation in an IFS container in PASE, first steps on a concrete case • Questions & answers, going further: solution details, some useful links, remote hands-on labs available, contact me!
Machine learning is everywhere … and influences our everyday lives! Netflix provides personalized movie recommendations. Waze provides a personalized driving experience for its users.
Machine learning is everywhere … and influences our everyday lives! 'Epidemics Progression Index (EPI)': https://towardsdatascience.com/italian-covid-19-analysis-with-python-1bdb0e64d5ac Worldwide dataset, Johns Hopkins University: https://github.com/resCovid19/COVID-19
Machine Learning? • Machine Learning: a way to automate tasks by learning from data, without explicitly programming the rules. • It identifies relationships in the data (statistical patterns, not necessarily causal ones) when a recurring and possibly complex phenomenon is observed. • Business expertise is needed during the learning phase to select the data and the examples and to shape the algorithm and the model. It is also needed to monitor and maintain these models. • ML is not new (1950s), but the computing power, libraries and frameworks it requires are now available and accessible to most people. • Deep Learning (DL): a particular class of Machine Learning algorithms based on neural networks, very effective on "complex" phenomena. • ML is part of AI, a field at the crossroads of robotics, philosophy, sociology, mathematics…
Machine Learning? Why take an interest? → I can improve an application with Machine Learning right now! • Decision support: detecting an event from images or structured data; classification and customer segmentation: retail, insurance, banking, video surveillance, maintenance… • User Experience (UX): user interfaces based on image, voice and text, helpdesk, chatbot (NLP); faster processing of requests, continuous improvement of services and products. • Augmentation (assistance) and automation: repetitive tasks, e.g. on images or documents. A doctor would spend hundreds of hours reading the publications relevant to a decision: AI can help.
Example: Turnover Optimization. Having the right product in the right point of sale. Logistics and inventory optimization. Supply chain optimization based on customer habits and data (CRM). Solution: IBM i + Machine Learning with Driverless AI
Example: Harvest Prediction. Estimating the harvest from one year to the next, from one quarter to the next, etc. Result in tonnes (regression model) per plot and type. Based on historical data: sensor data and readings taken by engineers and people in the field. Solution: IBM i + Machine Learning with H2O Driverless AI. Project initiated after a 'Design Thinking' session: business reflection, contribution of technology…
Example: Predictive Maintenance. Inputs: historical data and live data. Classification: Class 0: asset will fail within the next 15 cycles; Class 1: asset won't fail within the next 15 cycles. Regression (prediction): asset time to failure = 18 cycles, used to schedule maintenance.
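The classification framing on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the demo: the function name `label_from_time_to_failure` and the sample assets are hypothetical; only the 15-cycle horizon, the class convention (0 = will fail within the horizon, 1 = will not) and the 18-cycle example come from the slide.

```python
HORIZON = 15  # cycles, as stated on the slide


def label_from_time_to_failure(cycles_to_failure: int, horizon: int = HORIZON) -> int:
    """Return 0 if the asset fails within `horizon` cycles, else 1 (slide convention)."""
    return 0 if cycles_to_failure <= horizon else 1


# Hypothetical regression outputs (time to failure, in cycles) for three assets.
time_to_failure = {"asset_a": 18, "asset_b": 7, "asset_c": 15}

# Turn each regression-style estimate into the binary class used for classification.
labels = {name: label_from_time_to_failure(ttf) for name, ttf in time_to_failure.items()}
print(labels)
```

An asset with 18 cycles left (the slide's regression example) lands in class 1, while anything at or below 15 cycles lands in class 0 and would be a candidate for scheduled maintenance.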
Before starting with Scikit-learn: Machine Learning
Machine Learning? The goal of training a Machine Learning algorithm • n = number of examples in the training dataset; y_i is the actual value of the column to predict (the label). • Training: we seek to minimize the gap between the result produced by the algorithm, f(x), and the reality, y. • The algorithm's parameters are adjusted during learning according to the error. • Volume and quality of the data matter: a substantial training dataset plus knowledge of the phenomenon and of the business, in order to select (or create) the right features (columns). • Choosing the algorithm to suit the problem is key, as is the initialization of the hyperparameters. • At the end of training the model is shaped and the optimal parameters are frozen. • Running the trained, validated and tested model on a new, never-seen example is called inference. • Accelerators, GPU graphics cards (or FPGAs), are useful for training and less often for inference: thousands of cores specialized in matrix multiplication (tensor and vector computation) can divide training time by a factor of 50. • Software tools and libraries exist to assist with this work.
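The training loop described on this slide can be illustrated with a small, dependency-free sketch. It assumes the mean squared error as the cost function (the slide's own formula did not survive transcription) and a one-feature linear model f(x) = w·x + b; the function names, learning rate and toy data are all hypothetical choices made for illustration.

```python
def mse(w, b, xs, ys):
    """Mean squared error between predictions f(x) = w*x + b and labels y."""
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n


def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Adjust the parameters w and b by gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the MSE with respect to w and b.
        grad_w = sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
        grad_b = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
        # Move the parameters against the gradient, i.e. towards a lower error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy training set generated from y = 2x + 1 (n = 5 examples).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

error_before = mse(0.0, 0.0, xs, ys)
w, b = fit_linear(xs, ys)
error_after = mse(w, b, xs, ys)

# Inference: apply the frozen parameters to an example never seen in training.
prediction = w * 10.0 + b
print(w, b, error_before, error_after, prediction)
```

Here `lr` and `epochs` play the role of hyperparameters: they are chosen before training, whereas `w` and `b` are the parameters adjusted from the error.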
Machine Learning? The goal of training a Machine Learning algorithm. [Figures: error function minimization; regression; classification] → At (almost) every iteration, by seeking to minimize the cost function, we get closer to a relevant model. • In reality: several hundred or thousand dimensions. • Dimensionality reduction: removing redundant or useless dimensions. • Feature extraction: creating additional features that help to better understand the phenomenon. A neural network does this extraction "on its own", a major difference from classical machine learning. • Estimating model quality: accuracy of the model on test sets. Does it generalize to new data? Overfitting vs. underfitting, bias & variance, regularization methods…
Machine Learning? The goal of training a Machine Learning algorithm. Deep Learning = training artificial neural networks. Example: solving a non-linear classification problem, boundary formation with 3 layers and 'relu' activation. Neural nets simulator: http://playground.tensorflow.org
Machine Learning: how long does a project take? From a few weeks to (in general) a few months. For example: business analysis and data work (60%), modeling (20%), deployment and integration (20%).
Hardware / Software Solutions • Machine Learning on IBM i • Accelerated AutoML (GPU) on Linux/AC922
Machine Learning: choosing the tools. Roles: Dev, Business Analyst, Data Scientist, Data Engineer, Operations. Data Science libraries & frameworks • ML: scikit-learn, R, RStudio, SQL, H2O-3, H2O Driverless AI … • DL: PyTorch, TensorFlow, Caffe, Theano, Chainer … • Distributed: Apache Spark, H2O Sparkling Water, HPC tooling … • Accelerated ML: Nvidia CUDA, RAPIDS, IBM Snap ML → Hundreds to come every month … most of them are free and open source. Desktop & server appliances with accelerators (GPU): Nvidia DGX, Intel …, GPU / FPGA based, etc. IBM AC922: building block of the Oak Ridge 'Summit' supercomputer.
Machine Learning: public cloud or in your own datacenter? Roles: Dev, Business Analyst, Data Scientist, Data Engineer, Operations. 1. The standardization of AI offerings in the public cloud covers most needs • Pre-trained, ready-to-use models • Invocation through APIs • Cloud solutions make it possible to build tailor-made models: Watson Studio, etc. 2. Limitations: • Data gravity (data transfer vs. volume) • Legal, compliance & regulations • Pay per use vs. investment, cost/performance → AI = public + private cloud (datacenter) + edge (embedded)
IBM Solutions: Data Science/ML/DL WML WML Structured Data All use cases Vision/Acoustic All use cases Tabular data: IoT/Logs/Database Program your models w/ Open source Retail/Inspection Data Science Cockpit Data Engineer Acceleration & AutoML with Driverless AI & Accelerated HW . Scale w/o limit /Security, Quality… AutoML Open Source Libraries H2O Driverless AI WML-CE WML-A Visual Insights WML Studio AIX / IBM i ML on IBM i /AIX Auto Machine learning Free, Easy to install Enterprise Solution for Collaborative ML/DL Ready to use No GPU, runs on CPUs Simplified deployment, High Performance DL Deep Learning – WML dev & deployment Deep Learning Description Traditional datasets, Relevant for automatic pipelines, LMS (Large Model) Advanced Features : DDL with Video tools AutoAI Model Inference (production) model "explainability", DDL (Distributed) 4+ Nodes… Use WMLCE if installed Train/Infer/Edge Enrichable with Watson Services end to end automation HPC Job Scheduler Can Use WML-A plugin Inference: Support Available from IBM H2O L 1-3 Available from IBM IBM L 1-3 Included IBM L1-3 Included Available from IBM AIX , IBM i, Linux x86, IC/AC922, LC9xx x86/AC922/ IC922 x86/AC922/ IC922 x86/AC922/ IC922 x86/AC922/ IC922 Environments Server Technologies PaaS / SaaS IBM Cloud Power Virtual Server BYOL / AC922 IBM Cloud Yes No Trial only Watson Studio MultiCloud (PAK) On-Prem / Cloud Yes Yes CP4D Yes CP4D Yes Yes CP4D Private Cloud
Data Science: IBM i & AIX (April 2019). AIX and IBM i applications host the company's essential data → Easily create even more value with the open source tools available for Data Science!
H2O Driverless AI complements IBM Visual Insights. Integration with Cloud Pak for Data, powered by POWER9, OpenPOWER, NVLink & GPU acceleration. IBM Visual Insights delivers Deep Learning for images. H2O Driverless AI is Automatic Machine Learning for transactional, time series, sensor, log and NLP data.
H2O Driverless AI: How it Works
1. Drag and drop data: ingest data from cloud, big data and desktop systems (local, HDFS, SQL, Amazon S3, Snowflake, Google BigQuery, Azure Blob Storage).
2. Automatic Visualization: understand the data shape, outliers, missing values, etc.
3. Automatic Machine Learning: use best practice model recipes and the power of high performance computing to iterate across thousands of possible models, including advanced feature engineering and parameter tuning. Model recipes: i.i.d. data, time series, more on the way. "Survival of the fittest": advanced feature engineering + algorithm + model tuning, powered by GPU acceleration.
4. Automatic Scoring Pipelines: deploy ultra-low latency Python or Java automatic scoring pipelines that include feature transformations and models. Also produced: machine learning interpretability and model documentation. Deploy the low-latency scoring pipeline to production.
Confidential
H2O Driverless AI: Fraud Detection Example • Driverless AI matched 10 years of expert feature engineering • Increased accuracy from 0.89 to 0.947 (about 6%) in detecting fraudulent activity • 6X speed-up when using H2O4GPU with Driverless AI. "Driverless AI is giving amazing results in terms of feature and model performance" (Venkatesh Ramanathan, Senior Data Scientist, PayPal). Leader in Gartner's 2018 Data Science Quadrant. H2O Driverless AI and IBM POWER9 GPU systems bring together best-of-breed AI innovation. To handle the increasingly complex workloads of AI you need an integrated system of software and hardware: • IBM POWER9 supports nearly 2.6x more RAM and 9.5x more I/O bandwidth than comparable systems. • Nearly 2X the data ingest speed and over 50% faster feature engineering. • GPU-accelerated machine learning delivers nearly 30X speed-up on model building. • Support for up to 6 V100 GPUs on a single system.
AIX/IBM i - H2O Driverless AI Solution. Customer data on IBM Power Systems is extracted (ETL) via the JDBC connector to the H2O Fast Start Bundle enterprise server (AC922 + H2O DAI, Linux) for feature engineering and AI model training. Scoring (inference engine): the Java MOJO ML inference engine can be deployed in IBM i, AIX and Linux LPARs.
Quick Illustration: Customer Churn, or "attrition prediction". Churn prediction is one of the most frequent data science use cases. It often leads to a predictive score that is used to trigger marketing and sales actions. The availability of historical customer data on the one hand, and retention costs that are often lower than acquisition costs on the other, make churn prediction one of the first projects of a data science team. Source: Converteo.com
Smarter IBM i apps made easy with Driverless AI. CRM & customer churn scenario. Initial CRM: no customer churn risk estimate per customer.
Smarter IBM i apps made easy with Driverless AI. Dataset creation from Db2 for i: prepared training dataset extraction, or JDBC data connector.
Smarter IBM i apps made easy with Driverless AI. CRM & customer churn scenario: 1. Extract CRM data (or use JDBC) 2. Import & visualize data 3. Create & test models, then export the model as a scoring pipeline 4. Augment IBM i applications with model inference 5. Monitor models & iterate. Optimized for GPU-accelerated POWER9 servers. [Diagram labels: recommendation engine, business database, business application]
Export Scoring Pipeline (step 3, Automatic Scoring Pipelines). Export ultra-low latency Python or Java automatic scoring pipelines that include feature transformations and models. Here, the Java/Python scoring pipeline "scorer.zip" is to be deployed on the inference system. Inference on the edge, in the datacenter, in the cloud … on any device. Fig. Scoring pipeline export
Smarter IBM i apps made easy with Driverless AI. Behind the scenes: CRM application prototype integrated with a real-time scoring pipeline (Java MOJO). Components: Java MOJO scoring pipeline invoked from the Node.js app; Db2 CRM database. Fig. Node-RED prototyping: CRM dashboard
Manual Modeling vs. Driverless AI Accelerated AutoML • Customer churn dataset: WA_Fn-UseC_-Telco-Customer-Churn.csv. Customer churn is when an existing customer, user, player, subscriber or any kind of returning client stops doing business or ends the relationship with a company. • Model accuracy: 0.79 vs. 0.84 • Time to market: days vs. minutes with DAI AutoML with feature engineering, etc. • Accelerated ML can work with larger datasets • Accelerated ML saves time, effort and licensing costs (PowerVM cores offloaded to GPU) → Accelerated AutoML can create high-quality models that otherwise only a team of experienced data scientists with years of business knowledge could build. (Real feedback from customers)
Machine Learning on IBM i: Open Source Strategy on IBM i
Open Source Strategy on IBM i. Open Source & IBM i: Unix (AIX) runtime, aka PASE. → Integration & Web • API management, web services • Web development • Database access → Cloud / Automation • OpenStack, cloud-init, Terraform, Ansible → Data Science, Data Analysis
Data Science: IBM i & AIX. Libraries and languages available on AIX & IBM i • "Data Analytics & Data Science" RPM packages • They complement the Db2 functions • A list that grows regularly. Installed with yum install: • NumPy, Pandas: data processing • SciPy, Scikit-learn • IPython: interactive Python • NLTK: natural language processing • Matplotlib, Jupyter: data visualization • R language (interpreter, runtime). [Diagram: ILE / PASE applications, Machine Learning libraries (PASE), Db2 for i business data] • Useful in every phase of an ML project: → preparation & visualization, modeling, validation… • Particularly interesting: → inference on IBM i integrated into an IBM i application (RPG, Db2, Node.js) • Hardware acceleration (GPU) is sometimes necessary: → large datasets, highly parallel execution → use of Deep Learning libraries (neural nets) → saving CPU / memory resources on IBM i (price / performance) → training models on Power IC922 / AC922 + WML-CE (public or private cloud)
Data Science on IBM i → Scikit-learn https://scikit-learn.org • A simple and efficient tool for data mining / data analysis • Accessible to everyone, reusable in various contexts • Built on the NumPy, SciPy and matplotlib libraries • Open source since 2010 (INRIA) • Commercially usable: BSD license • Most often used with pandas (data manipulation with DataFrames) • A course in French: http://cedric.cnam.fr/vertigo/Cours/ml/tpIntroductionScikitLearn.html
Machine learning on IBM i • Required packages: 1.2 GB
RPM packages, installed with yum install (*ALLOBJ):
• tcl: Tcl language support
• tk: Tk package for GUI support of Tcl
• python3: Python 3 support
• python3-devel: Python 3 development package
• python3-ibm_db: DB API support package for IBM i
• python3-numpy: NumPy package
• python3-scipy: SciPy package
• python3-scikit-learn: Scikit-learn package
• libzmq: ZeroMQ library
Python packages, installed with pip (as a user):
• jupyterlab: includes Jupyter (Julia/Python/R IDE)
• joblib: persistence of models and large objects on the filesystem
Machine learning on IBM i • Chroot jails (isolation within PASE/IFS) • n data scientists = n sandboxes (chroots) • Dev + Prod environments • 1.2 GB per Scikit-learn environment • The rest of the system ( / ) is isolated
[Diagram: mop_chroots containing dev1, dev2, … devN and prod; access over SSH and https://10.7.19.71:8881; database access through ibm_dbi and odbc; Prod App]
Install the packages into a chroot:
yum install --installroot /QOpenSys/mop_chroots/dev1 tcl tk python3 python3-devel python3-ibm_db python3-numpy python3-scipy python3-scikit-learn python3-pyzmq libzmq python3-pip python3-pandas openssl bash
Then, connected to the environment:
ssh dev1@BENOIT.ICC.LOCAL
bash-4.4$ pip3 install jupyterlab
bash-4.4$ pip3 install joblib
bash-4.4$ jupyter notebook --port 8881 --ip 10.7.19.71
Machine learning on IBM i 1. Train and validate the model (on IBM i or elsewhere) 2. Load the model from the IFS 3. Scoring (inference) on IBM i (PASE) 4. Integrate the model into the application: the production app (PASE or ILE) reaches the PASE program through a program call or REST. Fig. Scoring 20 customers in PASE (Python 3)
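The first three steps of this workflow can be sketched with Scikit-learn and joblib, both of which appear in the package list of this deck. This is a hedged illustration, not the code of the demo: the data is synthetic (generated with `make_classification`), the choice of `LogisticRegression` is arbitrary, and the file name `churn_model.joblib` and the temporary directory are hypothetical stand-ins for a real IFS path.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Train and validate a model (synthetic stand-in for the churn dataset).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # fraction of correct test predictions

# Persist the trained model to the filesystem (an IFS directory on IBM i).
model_path = os.path.join(tempfile.mkdtemp(), "churn_model.joblib")
joblib.dump(model, model_path)

# 2. Load the model back, as the scoring program would.
loaded = joblib.load(model_path)

# 3. Score 20 customers: predicted class and churn probability for each.
new_customers = X_test[:20]
predictions = loaded.predict(new_customers)
probabilities = loaded.predict_proba(new_customers)[:, 1]
print(accuracy, predictions, probabilities)
```

Step 4 (integration) would wrap the load-and-predict part in a program or REST service called by the production application; that part depends on the application and is not shown.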
IBM Data Science Technologies & IBM i → Demonstration • Customer churn • Supervised model: classification with Scikit-learn • Install the yum packages and git clone https://github.com/bmarolleau/firstdemo-scikitlearn-ibmi/
Data Science on IBM i https://scikit-learn.org
Going further…
Some online courses and documentation… https://cognitiveclass.ai/courses/data-analysis-python https://www.coursera.org/specializations/data-science-python
Some useful documentation… • How to Start With Machine Learning on IBM i (by Gan Zhang, IBM, Nov 2019) • Accessing the rest of your IBM i from Python: ibm_dbi driver presentation • Accessing the rest of your IBM i from IBM i / Windows / Linux: ODBC driver, GitHub examples • YiPS http://yips.idevcloud.com/wiki/ • Bitbucket IBM i Open Source https://bitbucket.org/ibmi/opensource/ • My blog (K8s, app modernization, ML, H2O, Watson, Node-RED, IBM i) https://ibm.biz/bma-wiki • Videos 2020 (AIX/IBM i in IBM Cloud, Terraform): my YouTube channel
Remote Workshops: presentation / demo / hands-on labs on Power and IBM i, remotely → Scheduled with an IBM expert, or self-paced hands-on labs. → We provide remote access to our platforms, the expertise, and the presentation and lab materials! → Duration: from 1 or 2 hours (webinar) to 1 week (self-paced) → Access via OpenVPN and web conference → IBM i 7.3/7.4 preloaded, AIX 7.x, AC922, Linux, Red Hat OpenShift… Contacts, Systems Center Montpellier: Benoit Marolleau benoit.marolleau@fr.ibm.com; Marc Lapierre (Client Briefing) mlapierre@fr.ibm.com; Christophe Cavelier (BP) chris.desclavelles@fr.ibm.com. Topics already available: POW4 - Machine Learning, Jupyter & Scikit-learn on IBM i; POW1 - H2O.ai Driverless AI: accelerated Machine Learning with IBM i integration; POW5 - Open Source and IBM i: Node-RED, Node.js, web services; POW2 - Getting started on IBM Cloud and Power Virtual Server IBM i; POW7 - Open Source & IBM i: cloud and infrastructure automation with Terraform; plus all current topics: Cloud, PowerVC, accelerated AI (Vision, WML-CE), Db2 Mirror / high availability. If needed: custom agenda, contact us!
Thank you!
Montpellier Cognitive Systems Lab Manager : Philippe Chonavel Coordinator : Alain Roy Montpellier Client Center (France) - EMEA Team PowerAI/WML ATS Europe Our Offerings Jean-Armand Broyelle 1. Technical Consultancy & Assistance Regis Cely 2. Co-Creation Lab Workshops Sebastien Chabrolles 3. Hands-On Technical Enablement Sylvain Delabarre 4. Demonstration/PoT Maxime Deloche 5. PoC/Benchmark Thierry Huche Benoit Marolleau Team CAPI/SNAP (FPGA) Bruno Mesnet Alexandre Castelane Fabrice Moyen PowerHA Analytics , ML/DL Team HPC Modernization Virtualization Ludovic Enault Cloud / Hybrid Cloud Performance Benchmarks Pascal Vezolle
Your turn! POW1 - H2O.ai: default payment / customer churn scenario (*** remote hands-on labs option). Data Scientist in a Box on Power AC922 / IC922. H2O Driverless AI is an artificial intelligence platform for automatic machine learning (ML). POW2 - Get your hands dirty with IBM Cloud and Power Virtual Server (*** webinar). Get started with IBM Cloud and discover the new Power Virtual Server offering with this guided hands-on lab / demonstration session. Get an AIX/IBM i/Linux VM on PowerVM in a few seconds, in a pay-as-you-go model. POW3 - Business Continuity: Db2 Mirror, Logical Replication, and PowerHA (*** webinar / labs upon request). Positioning and technical aspects, external storage solutions, Q&A.
Your turn! POW4 - Machine Learning, Jupyter & Scikit-learn on IBM i / AIX (*** remote hands-on labs option). Get started with free open source frameworks & libraries on your favorite platform (AIX/IBM i). POW5 - Open Source on IBM i: Node-RED, Node.js, web services (*** remote hands-on labs option). Embrace open source technologies on your IBM i platform, and build, prototype and deploy your first web service. POW6 - Open Source on IBM i: cloud & automation with Ansible (*** soon). Use core Ansible modules, IBM i modules & playbooks for configuration of servers, application deployment. POW7 - Open Source on IBM i: cloud & automation with Terraform (*** webinar). Terraform providers for PowerVC private & CAM (Cloud Pak for MCM), IBM Terraform provider for public cloud. + Visual Insights + AC922 / WML-CE … Contact us for any customized agenda, of course!
Get Started Today: Contact us! • PowerAI Developer Portal https://developer.ibm.com/linuxonpower/deep-learning-powerai/technology-previews/powerai-vision/ • AI Vision Object Detection Demo https://www.youtube.com/watch?v=19vaot75JCY & Jupyter notebook • AI Vision / Public Cloud, Get Started demo https://github.com/IBM/powerai-vision-object-detection • PowerAI FAQ https://developer.ibm.com/linuxonpower/deep-learning-powerai/faq/ • PowerAI Vision 1.1.1 free trial: register for a free 3-day trial of PowerAI Vision • H2O.ai Driverless AI (21-day trial) https://www.h2o.ai/products/h2o-driverless-ai/ • Presentations, demo replays: https://ibm.biz/bma-wiki. Want to know more? Need support? Montpellier team & AI environment for your PoC / tests: Driverless/PowerAI/AI Vision remote access. Contact us: a2roy@fr.ibm.com / benoit.marolleau@fr.ibm.com
Get Started Today: H2O Driverless AI • Videos and Solution Brief • H2O Driverless AI Tutorials • Client Reference Case Study • AC922-H2O DAI bundle. Want to know more? Need support? Contact us: European Cognitive Systems CoC, IBM Montpellier, a2roy@fr.ibm.com / benoit.marolleau@fr.ibm.com. Power Systems & AI environment for your PoC / tests: Driverless/PowerAI/AI Vision remote access. IBM CSSC workshop? AI Workshop at CSSC. WML & AI demos on the IBM WW Demo portal: https://www.ibm.com/systems/clientcenterdemonstrations Presentations and demo replays from the author: https://ibm.biz/bma-wiki WML-CE FAQ https://developer.ibm.com/linuxonpower/deep-learning-powerai/faq/ H2O.ai Driverless AI (21-day trial) https://www.h2o.ai/products/h2o-driverless-ai/
Gartner names H2O as Leader with the most completeness of vision • H2O.ai recognized as a technology leader with most completeness of vision • H2O.ai was recognized for the mindshare, partner network and status as a quasi-industry standard for machine learning and AI. • H2O customers gave the highest overall score among all the vendors for sales relationship and account management, customer support (onboarding, troubleshooting, etc.) and overall service and support.
Shortage of Data Scientists “The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts” –McKinsey Prediction for 2018
H2O.ai: Worldwide Community Adoption * DATA FROM GOOGLE ANALYTICS EMBEDDED IN THE END USER PRODUCT
Use case examples: AI in Financial Services. Wholesale / Commercial Banking: • Know Your Customers (KYC) • Anti-Money Laundering (AML). IT Infrastructure: • Security Cyberlake • DoS Detection and Protection • Master Data Management. Card / Payments Business: • Transaction Frauds • Collusion Fraud • Real-Time Targeting • Credit Risk Scoring • In-Context Promotion. Retail Banking: • Deposit Fraud • Customer Churn Prediction • Auto-Loan.
H2O Driverless AI as a containerized app on IBM Cloud Private or OpenShift CP: n worker nodes with GPUs, CUDA drivers + WML-CE, on IC922 / AC922, on-premises or in the cloud.
Example: Predictive Maintenance with Driverless AI. Step 2, Automatic Visualization: understand the data shape, outliers, missing values, etc. Fig. Data visualization in Driverless AI
Example: Predictive Maintenance with Driverless AI. Create a new model with Automatic Machine Learning: use best practice model recipes and the power of high performance computing to iterate across thousands of possible models, including advanced feature engineering and parameter tuning. Target column: asset failure within 15 cycles (YES/NO) for binary classification, or estimate the time to live (in number of cycles) with a regression.
Example: Predictive Maintenance with Driverless AI Create new model Automatic Machine Learning • POWER9 supports nearly 2.6x more RAM, 9.5x more I/O bandwidth than comparable systems. • Nearly 2X the data ingest speed and over 50% faster feature engineering. • With GPU accelerated machine learning delivering nearly 30X speedup on model building. • Support for up to 6 V100 GPUs on a single system Experiment running –GPU Accelerated AutoML on POWER9
Example: Predictive Maintenance with Driverless AI. Turbine sensor dataset imported: 21 sensor readings for each turbine; 100 turbines, 200+ cycles per turbine before failure. RUL (Remaining Useful Life, or time to live) is the column to predict.
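The slide does not say how the RUL column was derived. A common convention for run-to-failure data, assumed here, is RUL = last recorded cycle of the turbine minus the current cycle. The sketch below applies that assumption to a few hypothetical rows; the function name `add_rul` and the row layout `(turbine_id, cycle)` are illustrative only.

```python
from collections import defaultdict

# Hypothetical rows: (turbine_id, cycle). Sensor columns are omitted for brevity.
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]


def add_rul(rows):
    """Append RUL = (last cycle of the turbine) - (current cycle) to every row.

    Assumes each turbine was recorded until failure, so its last cycle
    marks the failure point.
    """
    last_cycle = defaultdict(int)
    for turbine, cycle in rows:
        last_cycle[turbine] = max(last_cycle[turbine], cycle)
    return [(turbine, cycle, last_cycle[turbine] - cycle) for turbine, cycle in rows]


rows_with_rul = add_rul(rows)
for row in rows_with_rul:
    print(row)
```

With this convention the last recorded cycle of each turbine has RUL 0, and the column can serve directly as a regression target.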
Example: Predictive Maintenance with Driverless AI. Training on 19K rows with reduced training time. Target: RUL. Precision metric: RMSE = 35.7 (Root Mean Square Error). Feature importance in the model decision: 1. Cycle#, 2. Sensor11, 3. Sensor4, etc.
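For readers unfamiliar with the metric, RMSE is the square root of the mean of the squared differences between actual and predicted values, expressed in the unit of the target (here, cycles). A minimal sketch with hypothetical values, unrelated to the 35.7 reported on the slide:

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equally sized sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical actual vs. predicted RUL values, in cycles.
actual_rul = [100, 50, 10]
predicted_rul = [90, 60, 40]
error = rmse(actual_rul, predicted_rul)
print(round(error, 2))
```

Because the errors are squared before averaging, one large miss (30 cycles on the third asset) weighs more than several small ones.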
Java Mojo Scoring Pipeline on IBM i/AIX/Linux Real time prediction on IBM i !! (AIX/Linux) with the java MOJO Scoring Pipeline built by Driverless AI Real time Customer Churn predictions on IBM i
DAI Data Connectors: JDBC Connection to Db2 for i
How to configure a Db2 data connector:
1. Download a Db2 JDBC driver first.
- Db2 for i: type 4 JDBC driver jt400.jar, https://sourceforge.net/projects/jt400/
- Db2 LUW: get the appropriate db2jcc4.jar
2. Get the original Driverless AI config file, /etc/dai/config.toml. There are many ways to do that, either from the image or from a running container (docker cp, etc.).
- K8s example: kubectl cp -c dai-gpu-mop dai-demo-mop-6b75f6b5b9-kddd:/etc/dai .
3. Edit config.toml as shown below (refer to the online documentation for further information).
4. Inject config.toml and the driver jar file(s) so that they are mounted in your DAI pod / Docker container.
- Inject them into your DAI image or, better, use a K8s ConfigMap / persistent volume to persist them.
5. Restart your DAI container to refresh the configuration.
/etc/dai/config.toml:
# jdbc: JDBC Connector, remember to configure JDBC below. (jdbc_app_configs)
enabled_file_systems = "upload, file, hdfs, s3, jdbc"
jdbc_app_configs = '{"db2fori": {"url": "jdbc:as400://;", "jarpath": "/etc/dai/jt400.jar", "classpath": "com.ibm.as400.access.AS400JDBCDriver"}}'
# Note: here, the JTOpen driver jt400.jar is mounted in the DAI container, in /etc/dai
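The `jdbc_app_configs` value is a JSON document embedded in a single-quoted string, which is easy to break with a stray or typographic quote. One way to avoid that, sketched below with the standard library only, is to build the JSON from a Python dictionary and print the resulting line. The keys and values are those of the slide (the host part of the URL is elided there and is left as-is); generating the line this way is a suggestion, not something the slide or the Driverless AI documentation prescribes.

```python
import json

# Connector settings copied from the slide; the host in the URL is elided there.
jdbc_app_configs = {
    "db2fori": {
        "url": "jdbc:as400://;",
        "jarpath": "/etc/dai/jt400.jar",
        "classpath": "com.ibm.as400.access.AS400JDBCDriver",
    }
}

# json.dumps emits plain double quotes, so the value is safe inside single quotes.
config_value = json.dumps(jdbc_app_configs)
toml_line = "jdbc_app_configs = '" + config_value + "'"
print(toml_line)

# Sanity check: the embedded JSON parses back to the same settings.
round_trip = json.loads(config_value)
```

The printed line can then be pasted into config.toml in place of a hand-typed one.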
DAI Data Connectors: JDBC Connection to DB2 for i Data Connectors - JDBC
DAI Data Connectors: JDBC Connection to DB2 for i Configure your Data source: Db2 Connection & Query with credentials to your Db2 database.
Example: Predictive Maintenance Architecture Overview. Asset Management + IoT solution for predictive maintenance. • IoT: Watson IoT Platform, MQTT broker (on premise or IoT platforms on IBM Cloud) • Integration and data transformation: WebSphere IIB • Asset Management: work orders • Scoring and decision through a model API call: SPSS Modeler, SPSS C&DS • Dashboard and reports: Cognos BI • Accelerated ML/DL: IBM Watson Studio and/or IBM Watson Machine Learning (WML) and/or H2O Driverless AI, on OpenPOWER AC922 with WML-CE & GPU accelerators. 07/05/2020
Example: Predictive Maintenance with Watson Studio Model created with AutoAI on Studio. Can be deployed on WML (REST API…)
Computer Vision with PowerAI Vision & IBM i. Conversation service [understands natural language and responds in human-like conversation]. Components: camera (web browser) on the device, Node-RED, object detection API, deployed model APIs (FB Detectron, Faster R-CNN, YOLO v2) on the AC922.
Computer Vision with PowerAI Vision & IBM i. IBM i can now detect objects using an on-premises object detection API powered by PowerAI Vision, running on ppc64le with Nvidia GPUs for inference.
Accelerated embedded edge devices Train on PowerAI Vision but infer on embedded devices
Made with PowerAI Vision – Drone Inspection
Acceleration in training: days become hours. Performance: faster training and inferencing for data scientists, with training times ~4 times faster on POWER9 servers with NVLink to GPUs vs. x86 servers with PCIe to GPUs. Distributed Deep Learning. Large AI models: from traditional model support (competitors), where limited memory on the GPU forces a trade-off in model size / data resolution, to Large Model Support (PowerAI), which uses system memory and GPU to support more complex models and higher-resolution data. [Diagram: CPU with DDR4 connected to GPU graphics memory over a PCIe data pipe vs. POWER CPU with DDR4 connected to GPU graphics memory over NVLink]
• IBM POWER SYSTEMS AC922 An Acceleration Superhighway Unleash state of the art IO and accelerated computing potential in the post “CPU-only” era Designed for the AI Era Architected for the modern analytics and AI workloads that fuel insights Delivering Enterprise-Class AI Flatten the time to AI value curve by accelerating the journey to build, train, and infer deep neural networks
OpenPOWER Foundation: after 6 years of existence… • Drivers: innovation vs. Moore's law • Collaborative approach • Domains: Cloud & scale-out architecture; Research / High Performance Computing; Analytics, Big Data; Machine Learning / Deep Learning • IBM sells OpenPOWER servers • Power Systems POWER9… https://www.ibm.com/cloud/bare-metal-servers/power https://console.bluemix.net/docs/services/PowerAI-IBM/
OpenPower Summit 2018 (Source: Forbes)
(Free) PowerAI 1.6.2 (WML-CE) Overview. 5765-PAI. Not just about hardware design: it's about co-optimized software + hardware.
• Linux native install (conda) or Docker image on RHEL 7.6+ / Ubuntu 18.04: https://hub.docker.com/r/ibmcom/powerai/
• Works with CUDA 10 + NVIDIA drivers
• Components: Distributed Deep Learning (DDL) 1.3.0, TensorFlow 1.13.1, IBM Caffe 1.0.0, Caffe2 1.0.1, PyTorch 1.0.1, Snap ML 1.2.0, Rapids cuDF / cuML 0.2.0
• Available on x86-64 architecture; optimized for the IBM AC922 POWER9 system with NVIDIA Tesla V100 GPUs and the IBM S822LC POWER8 system with NVIDIA Tesla P100 GPUs
Announcement Letter
WML-CE: Enterprise AI Platform
• Simplicity, integrated platform that just works: curate, test, and support fast-moving open source; provide an enterprise distribution on Red Hat; easy-to-deploy enterprise AI platform.
• Ease of use, unique capabilities: large data & model support due to NVLink; acceleration of analytics & ML; AutoML: PowerAI Vision; elastic training: scale GPUs as required.
• Faster model training time: faster training times in a single server; scalability to 100s of servers (cluster-level integration); leads to faster insights and better economics.
• Open AI platform with ecosystem partners: a platform that partners can build on; software partners: H2O, IBM, Anaconda; SIs, solution vendors & accelerator partners.
[Diagram: POWER9 CPU, GPU, PowerAI, IBM SW, ISV SW, Solution SIs]
Cognitive Systems are built with optimized HW & SW. Not just about hardware design: it's about co-optimized software + hardware.
• Industry alignment, dev ecosystem, partnerships, open accelerator interfaces, accelerator roadmaps
• Open frameworks and IBM software which just work for Machine Learning, Deep Learning and AI
• POWER9 is the only processor with NVLink 2.0 from CPU to GPU, delivering 5.6X host-device bandwidth vs. Xeon E5-2640 v4 based systems with the CUDA H2D Bandwidth Test
• No code changes are required to leverage NVLink capability
“We’re excited to see accelerating progress as the Oak Ridge National Laboratory Summit supercomputer machine, which we expect will be among the world’s fastest supercomputers. The advanced capabilities of the IBM POWER9 CPUs coupled with the NVIDIA Volta GPUs will significantly advance DOE’s mission critical applications,” says Buddy Bland, Oak Ridge Leadership Computing Facility Director
Snap ML: Distributed GPU-Accelerated Machine Learning Library. Snap Machine Learning (ML) Library: APIs for popular ML frameworks.
• Algorithms: Logistic Regression, Linear Regression, Support Vector Machines (SVM); more coming soon
• libGLM (C++ / CUDA optimized primitive lib)
• Distributed training; distributed hyper-parameter optimization (coming soon)
• 4 APIs: pai4sk API (included in PAI 1.5.4), snap-ml-local API, snap-ml-mpi API, snap-ml-spark API
Snap ML: training time goes from an hour to minutes. Logistic Regression in Snap ML (with GPUs) vs. TensorFlow (CPU-only): 46x faster than the previous record set by Google (1.1 hours vs. 1.53 minutes).
• Workload: click-through rate prediction for advertising
• Logistic Regression classifier in Snap ML using GPUs vs. TensorFlow using CPU-only
• Dataset: Criteo Terabyte Click Logs (http://labs.criteo.com/2013/12/download-terabyte-click-logs/), 4 billion training examples, 1 million features
• Model: Logistic Regression
• Test LogLoss: 0.1293 (Google using TensorFlow), 0.1292 (Snap ML)
• Platform: 89 CPU-only machines in Google using TensorFlow versus 4 AC922 servers (each 2 POWER9 CPUs + 4 V100 GPUs) for Snap ML
• Google data from this Google blog
[Chart: runtime in minutes, TensorFlow vs. Snap ML: Google, 90 x86 servers (CPU-only) vs. Snap ML, 4 POWER9 servers with GPUs]
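The Test LogLoss figures quoted for this benchmark are the usual quality metric for a click-through classifier: the average negative log-likelihood of the true labels under the predicted probabilities (lower is better). A minimal pure-Python sketch of that metric follows; the four labels and probabilities are made-up illustrative values, not data from the Criteo benchmark.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss: mean of -(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) can never occur.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up example: 1 = the ad was clicked, 0 = it was not.
clicks = [1, 0, 0, 1]
predicted = [0.9, 0.1, 0.2, 0.8]
loss = log_loss(clicks, predicted)
```

Two models with nearly identical log loss, as in the comparison above, are making predictions of nearly identical quality, so the comparison then comes down to training time.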
Deep Learning = Training Artificial Neural Networks. Based on biological neurons. Artificial neurons learn by recognizing patterns in data. Neural net framework & Python programming.
• Each input value is a pixel: RGB value (3 channels)
• outputs = f(f(f…(x)))
• Each output contains: detection_boxes, detection_scores, detection_classes, num_detections
• Training = adjust the parameters (w, b) to get the best validation results vs. ground truth (training set)
• Inference = apply the model y = f(x) to the input (w, b fixed) to get the output (classification, detection, …)
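The distinction between training (adjusting w and b) and inference (applying y = f(x) with w and b fixed) can be sketched with a single artificial neuron in pure Python. This is a deliberately tiny stand-in, not a deep network: the sigmoid activation, the logical AND data set, the learning rate and the epoch count are all assumptions chosen for the illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Inference: apply y = f(x) with the parameters (w, b) held fixed."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(samples, labels, lr=0.5, epochs=5000):
    """Training: adjust (w, b) by stochastic gradient descent on the log loss."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            # For a sigmoid output with log loss, the gradient with respect
            # to the pre-activation is simply (prediction - target).
            error = predict(x, w, b) - target
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b


# Ground truth (training set): the logical AND of two inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

w, b = train(samples, labels)                         # training phase
outputs = [round(predict(x, w, b)) for x in samples]  # inference phase
```

A deep network stacks many such units in layers, which is what the nested form f(f(f…(x))) expresses, but the two phases stay the same.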
Demonstration: Machine Learning techniques
• Classification: predict a class from observations (e.g. spam email detection)
• Clustering: group observations into “meaningful” groups (e.g. Amazon recommendations)
• Regression (prediction): predict a value from observations (e.g. energy consumption prediction)
Many different technologies and libraries are available:
Machine Learning Methods (examples): K Nearest Neighbours, Bayesian Classifiers, Decision trees, Support Vector Machines, Reinforcement learning, Cluster Analysis, Neural Networks / Deep learning
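The three families of techniques (classification, clustering, regression) map directly onto scikit-learn estimators. The sketch below is one possible illustration, assuming scikit-learn and NumPy are installed; the synthetic data, the choice of K Nearest Neighbours, K-Means and linear regression, and all parameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic observations: three well-separated groups of 2D points.
X, y = make_blobs(
    n_samples=150,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.5,
    random_state=0,
)

# Classification: predict a known class from observations.
classifier = KNeighborsClassifier(n_neighbors=5).fit(X, y)
train_accuracy = classifier.score(X, y)

# Clustering: group the same observations without using the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
n_clusters_found = len(set(kmeans.labels_))

# Regression: predict a value (here a made-up consumption = 2 * hour + 1).
hours = np.arange(10).reshape(-1, 1)
consumption = 2.0 * hours.ravel() + 1.0
regressor = LinearRegression().fit(hours, consumption)
predicted_consumption = regressor.predict([[10]])[0]
```

All three estimators share the same `fit` / `predict` pattern, which is what makes it easy to swap one method for another while exploring a problem.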
Deep Learning = Training Artificial Neural Networks. Example: solving a non-linear classification problem. Boundary formation with 1 hidden layer and the non-linear “relu” (Rectified Linear Unit, ReLU) activation function: mlp = MLPClassifier(hidden_layer_sizes=(1,), max_iter=500000, activation='identity', learning_rate_init=0.01). Note that activation='identity' gives a purely linear model; the non-linear boundaries shown require activation='relu'. Panels: 1 layer and 2 neurons; 1 layer and 3 neurons; decision plane with 30 neurons. Neural Nets Simulator: http://playground.tensorflow.org
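The effect of the number of hidden neurons and of the activation function can be reproduced with a short scikit-learn script. This is a sketch under assumptions: the slide's data set is not given, so the example uses the synthetic `make_moons` data, and the split, seed and `max_iter` values are arbitrary choices.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data that is not linearly separable (an assumed stand-in
# for the data set used on the slide).
X, y = make_moons(n_samples=400, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def fit_and_score(n_neurons, activation="relu"):
    """Train a one-hidden-layer network and return its test accuracy."""
    mlp = MLPClassifier(
        hidden_layer_sizes=(n_neurons,),  # a tuple: one layer of n_neurons
        activation=activation,
        learning_rate_init=0.01,
        max_iter=5000,
        random_state=0,
    )
    mlp.fit(X_train, y_train)
    return mlp.score(X_test, y_test)


# One hidden layer with 2, 3 and 30 ReLU neurons.
scores = {n: fit_and_score(n) for n in (2, 3, 30)}

# Same width with the identity activation: the model stays linear.
linear_score = fit_and_score(30, activation="identity")

for n, score in scores.items():
    print(f"relu, {n} neurons: accuracy = {score:.3f}")
print(f"identity, 30 neurons: accuracy = {linear_score:.3f}")
```

Comparing the printed accuracies gives a numeric counterpart to the decision-boundary pictures: more ReLU neurons allow a more flexible boundary, while the identity activation cannot bend it at all.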