Machine Learning & IBM i - Common France

La page est créée François Marty
 
CONTINUER À LIRE
Machine Learning & IBM i - Common France
IBM Systems Center Montpellier

Machine Learning & IBM i
•   Les Solutions IBM de Machine Learning
•   Démarrer sur IBM i avec des solutions
    Open Source et Scikit-learn

                                                    Benoit MAROLLEAU - Cloud Architect
                                                    IBM Cognitive Systems - Client Center Montpellier, France
                                                    benoit.marolleau@fr.ibm.com

                                                                                                                Replay & More info
                                                                                                                https://ibm.biz/bma-wiki
                          https://callforcode.org
Machine Learning & IBM i - Common France
Introduction & cas d’usage
          Optimisation de la logisitique, prediction de récolte, prediction de fraude, customer churn
         (attrition) , maintenance predictive, etc.

         Avant de démarrer sur Scikit–learn : Introduction au Machine Learning

         Solutions IBM pour l’intelligence artificielle
Agenda   Solutions Logicielles et Materielles, Private/Public Cloud : Watson , Linux & IC922/AC922
         Solutions IBM i / AIX au plus près des données

         Démonstration – Scikit-learn sur IBM i
         Installation dans un IFS container dans PASE, premiers contacts sur un cas concret

         Questions / Réponse , Pour aller plus loin…
         Détails des Solutions, Quelques liens utiles, TP à distance disponibles, contactez moi!!
Machine Learning & IBM i - Common France
Machine Learning

Introduction
 Cas d’usage
Machine Learning & IBM i - Common France
Le Machine learning est partout …
et influence notre vie de tous les jours!

         Netflix provides personalized
         movie recommendations

                                            Waze provides a personalized
                                            driving experience for its users
Machine Learning & IBM i - Common France
Le Machine learning est partout …
et influence notre vie de tous les jours!

                                            ‘Epidemics Progression Index (EPI)’

       https://towardsdatascience.com/italian-covid-19-analysis-with-python-1bdb0e64d5ac

       WW Dataset Johns Hopkins University
       https://github.com/resCovid19/COVID-19
Machine Learning & IBM i - Common France
Le Machine learning est partout …
et influence notre vie de tous les jours!
Machine Learning & IBM i - Common France
Machine Learning ?

• Machine Learning : moyen d’automatisation en apprenant des données, sans programmer explicitement les règles.
• établit des relations de causalité dans les données, alors qu’on observe un phénomène (complexe?) récurrent
• L’expertise métier est nécessaire en phase d’apprentissage pour sélectionner les données, les exemples, façonner
  l’algorithme, le modèle. Elle est nécessaire également pour surveiller et maintenir ces modèles.
• Le ML n’est pas nouveau (1950’s), mais la puissance de calcul, les librairies et Framework nécessaires sont
  maintenant disponibles et accessibles au plus grand nombre.
• Deep Learning (DL) : classe particulière d’algorithmes Machine Learning faisant appel à des algorithmes à base de
  réseau de neurones – très efficaces sur des phénomènes « complexes »
• ML fait partie de l’AI, domaine à la croisée de la robotique, la philosophie, la sociologie, les mathématiques…
Machine Learning & IBM i - Common France
Machine Learning?
Pourquoi s’y intéresser?
à Je peux dès maintenant améliorer une application avec du Machine Learning !!

Aide à la décision
     Détection d’un évènement à base d’images, de données structurées,
     Classification, segmentation des clients : Retail, Assurance, Banque,
     Surveillance video, Maintenance…

User Experience (UX)
     User Interface par l’image, la voix, le texte, helpdesk, chabot (NLP),
     Traitement plus rapide des demandes, amélioration continue de services
     et produits

Augmentation (aide) et automatisation
     Taches répétitives, ex: sur les images, ou des documents
     Un médecin passerait des centaines d’heures à lire les publications utiles
     pour sa prise de décision: l’IA peut être utile pour l’aider.
Machine Learning & IBM i - Common France
Exemple – Optimisation du Turnover
Avoir le bon produit dans le bon point de vente. Optimisation logistique, des stocks.
Optimisation de la Supply Chain basée sur les habitudes et données des clients (CRM)
Solution: IBM i + Machine Learning with Driverless AI
Machine Learning & IBM i - Common France
Exemple – Prediction de récolte
Estimation de récolte d’une année à l’autre,
d’un trimestre à l’autre etc.

Résultat en tonnes (modèle de regression) par
parcelle et type.

Basé sur des données historisées :
Données de capteurs, relevés par les ingénieurs
et personnes sur le terrain.

Solution: IBM i + Machine Learning avec H2O
Driverless AI

Projet initié suite à une session de type ‘Design
Thinking’ , reflexions métier, apport de la
technologie …
Exemple: Maintenance Prévisionnelle

Historical
  Data

  Live
                                                                Classification:
  Data                                                          Class 0 : asset will fail within the next 15 cycles
                                                                Class 1 : asset won‘t fail within the next 15 cycles
                                                                Regression (prediction):
                                         Schedule Maintenance   Asset Time to failure = 18 cycles

                      Predictive Maintenance                                                      11
Avant de démarrer
 sur Scikit-learn:

Machine Learning
Machine Learning ?
L’objectif de l’entrainement d’un algorithme Machine Learning

 §   n = nombre d’exemples dans le dataset d’entrainement, yi est la valeur réelle de colonne à prédire (label)
 §   Entrainement: on cherche à minimiser l’écart entre le résultat obtenu par l’algorithme, f(x), et la réalité y
 §   Les paramètres de l’algorithme vont être ajustés durant l’apprentissage en fonction de l’erreur.
 §   Le volume et la qualité des données : Training Dataset conséquent, connaissance du phénomène, du métier, afin de sélectionner
     (ou créer) les bons features (colonnes)
 §   Choix de l’algorithme en fonction du problème est clé, l’initialisation des hyperparamètres également.
 §   A la fin de l’entrainement, le modèle est façonné, les paramètres optimaux sont figés.
 §   L’exécution du modèle entrainé, validé, testé, avec un nouvel exemple jamais vu avant, s’appelle l’inférence
 §   Des accélérateurs - carte graphiques GPU (ou FPGA) – sont utiles en Training, moins souvent en Inférence : Milliers de cores,
     spécialisés dans la multiplication matricielle (calcul tensoriel, vectoriel) peuvent diviser le temps d’entrainement par 50
 §   Des outils logiciels, des librairies , existent afin d’assister dans ce travail.
Machine Learning ?
L’objectif de l’entrainement d’un algorithme Machine Learning

                   Error Function                                                          Regression
                   Minimisation à                                                          A chaque iteration, en
                   (presque) chaque                                                        cherchant à minimiser la
                   itération                                                               fonction de cout, on se
                                                                                           raproche d’un modele
                                                                                           pertinent

§   Dans la réalité: plusieurs centaines/milliers de dimensions

§   Dimensionality Reduction : Dimensions redondantes ou inutiles
                                                                                           Classification
§   Feature Extraction: Création de features supplémentaires qui aident à mieux
    comprendre le phénomène. Un reseau de neurones fait cette extraction « tout
    seul » , grande différence avec du machine learning.

§   Estimation de la qualité du modèle : précision du modèle sur des jeux de tests .
    est il généralisable sur de nouvelles données ? overfitting vs. underfitting , biais
    & variance. Méthodes de Régularisation…
Machine Learning ?
L’objectif de l’entrainement d’un algorithme Machine Learning

Deep Learning = Training Artificial Neural Networks
Example solving a non linear classification problem

                                   boundary formation
                                   with 3 layers and ‘relu’
                                   activation
                                   Neural Nets Simulator:
                                   http://playground.tensorflow.org

                                                                      15
16

Machine Learning
Durée d’un projet ?

De quelques semaines à (en general) quelques mois…
Par exemple: Analyse métier et travail sur les données (60%) ,
modélisation (20%), déploiement et integration (20%)
Solutions
Hardware / Software
  Machine Learning sur IBM i

   AutoML accéléré (GPU)
      sur Linux/AC922
?
Machine Learning
                                                  Dev
                                                  Business Analist       ?
                                                  Data Scientist
                                                                          ?
Le choix des outils                               Data Engineer
                                                  Operations             ?
                                                                     ?
Data Science libraries & frameworks
• ML: scikit learn, R, Rstudio, SQL, H2O-3 , H2O Driverless AI …
• DL: PyTorch, TensorFlow, Caffe , Theano, Chainer …
• Distributed : Apache Spark, H2O Sparking Water, HPC tooling…
• Accelerated ML: Nvidia CUDA, Rapids , IBM Snap ML
à Hundreds to come every month…most of them are free open source

Desktop & Server Appliance with Accelerators (GPU)
Nvidia DGX , Intel … GPU / FPGA based etc.
IBM AC922 : Oak Ridge ‘Summit’ SuperComputer Building Block
Machine Learning                                        Dev
                                                        Business Analist
                                                                           ?
                                                                               ?
Public Cloud ? Dans son Datacenter ?
                                                        Data Scientist
                                                        Data Engineer
                                                                                ?
                                                        Operations             ?
                                                                           ?
1. La standardisation des offres AI dans le
   Public Cloud répond à l’essentiel des
   besoins
     •   Modèles Pré-entrainés prêt à l’usage
     •   Invocation par API
     •   Des Solutions Cloud permettent de créer des
         modèles cousus main : Watson Studio, etc.

2. Limitations:
     •   Data Gravity (Transfert des données vs. volume)
     •   Legal, Compliance & Regulations
     •   Pay Per Use vs. Investissement , cout perfomance

à AI = Public + Private Cloud (Datacenter)
        + Edge (embarqué)
Solutions IBM : Data Science/ML/DL

                                                                                                        WML                                                             WML
                                     Structured Data                                       All use cases                                 Vision/Acoustic All use cases
                                      Tabular data: IoT/Logs/Database                      Program your models w/ Open source            Retail/Inspection        Data Science Cockpit
    Data Engineer                     Acceleration & AutoML with Driverless AI             & Accelerated HW . Scale w/o limit            /Security, Quality…      AutoML
                           Open Source Libraries
                                                               H2O Driverless AI              WML-CE                     WML-A                Visual Insights        WML Studio
                                AIX / IBM i

                                ML on IBM i /AIX               Auto Machine learning      Free, Easy to install    Enterprise Solution for                        Collaborative ML/DL
                                                                                                                                                Ready to use
                             No GPU, runs on CPUs             Simplified deployment,,    High Performance DL          Deep Learning –                            WML dev & deployment
                                                                                                                                               Deep Learning
Description             Traditional datasets, Relevant for      automatic pipelines,      LMS (Large Model)       Advanced Features : DDL
                                                                                                                                              with Video tools
                                                                                                                                                                        AutoAI
                         Model Inference (production)          model "explainability",     DDL (Distributed)            4+ Nodes…                                Use WMLCE if installed
                                                                                                                                              Train/Infer/Edge
                        Enrichable with Watson Services        end to end automation                                HPC Job Scheduler                            Can Use WML-A plugin
                                                             Inference:

Support                       Avaialable from IBM                    H2O L 1-3           Available from IBM         IBM L 1-3 Included       IBM L1-3 Included    Available from IBM
                               AIX , IBM i, Linux             x86, IC/AC922, LC9xx       x86/AC922/ IC922           x86/AC922/ IC922         x86/AC922/ IC922     x86/AC922/ IC922
Environments
Server Technologies

PaaS / SaaS IBM Cloud        Power Virtual Server            BYOL / AC922 IBM Cloud               Yes                       No                   Trial only         Watson Studio

MultiCloud (PAK)               On Prems / Cloud                         Yes                   Yes CP4D                   Yes CP4D                   Yes                Yes CP4D
Private Cloud
Data Science : IBM i & AIX

                                                    April 2019

                             Les Applications AIX et IBM i hébergent les données
                             essentielles de l’entreprise

                             à Créer encore plus de la valeur facilement avec
                             les outils Open Source à disposition pour la Data
                             Science !!
H2O Driverless AI Complements IBM Visual Insights
Integration with Cloud Pak for Data - powered by POWER9, OpenPower NVlink & GPU Acceleration

                      IBM Visual Insights                                               H2O Driverless AI is
              delivers Deep Learning for Images                                     Automatic Machine Learning

                                                                                                           Transactional

                                                                    Time Series

                                                                                               Sensors

                                                              Log

                                                                                                          NLP

                                                             22
H2O Driverless AI: How it Works
               Drag and drop data                    Automatic Visualization               Automatic Machine Learning               Automatic Scoring Pipelines
           1                               2                                          3                                        4
                                                                                           Use best practice model recipes          Deploy ultra-low latency Python or
               Ingest data from                      Understand the data shape,            and the power of high                    Java Automatic Scoring Pipelines
               cloud, big data and                   outliers, missing values, etc.        performance computing to Iterate         that include feature
               desktop systems                                                             across thousands of possible             transformations and models.
                                                                                           models including advanced feature
                  HDFS                                                                     engineering and parameter tuning

                 SQL                                                            Automatic Machine Learning

                                                                    Model Recipes:            Survival of the Fittest              Model
                  Local
                                     Modelling                      • i.i.d. data                                                  Documentation
                                                                    • Time-series                                                                        Deploy Low-
                                                                                                                               Automatic                 latency
                                                                    • More on the way
                                       X         Y                                                                             Scoring Pipeline          Scoring to
                  Amazon S3
                                                                                                                                                         Production
                                                                     Advanced                                                  Machine learning
                 Snowflake            Dataset                        Feature                                 Model
                                                                                      +   Algorithm     +                      Interpretability
                                                                     Engineering                             Tuning

                  Google BigQuery
                                                                              Powered by GPU Acceleration
                  Azure Blog Storage
Confidential                                                                                                                                                             23
H2O Driverless AI : Fraud Detection Example

•   Driverless AI matched 10
    years of expert feature                                                                                   “Driverless AI is giving amazing
    engineering                                                                                               results in terms of feature and
                                                                                                              model performance “
•   Increased accuracy from
    0.89 to 0.947 (6%) in
                                                                                                                    Venkatesh Ramanathan
    detecting fraudulent activity                                                                                   Senior Data Scientist, PayPal

•   6X speed up when using
    H2O4GPU with Driverless AI                                                                                   Leader in Gartner’s 2018 Data
                                                                                                                      Science Quadrant

               H2O Driverless AI and IBM POWER9 GPU Systems are bringing together the best of breed AI innovation. To handle
               the increasingly complex workloads of AI you need an integrated system of software and hardware:

               •   supports nearly 2.6x more RAIBM POWER9 M, 9.5x more I/O bandwidth than comparable systems.
               •   Nearly 2X the data ingest speed and over 50% faster feature engineering.
               •   With GPU accelerated machine learning delivering nearly 30X speedup on model building.
               •   Support for up to 6 V100 GPUs on a single system.
AIX/IBM i – H2O Driverless AI Solution
                                             ETL customer Data
   IBM Power Systems                         via JDBC Connector                        H2O Fast Start Bundle
     Enterprise Server                                                                  (AC922+H2O DAI)

                                                        Feature
                                                        Engineering & AI
                                                        Model Training

                         LPAR                    AI
                         LINUX                   Scoring
                     (inference engine)                                                AC922

                                          ML Inference Engine

                             Java MOJO can be deployed in IBM i , AIX and Linux LPAR                     25
                                                  CONFIDENTIAL
Illustration Rapide
    Customer Churn ou “Prediction de l’attrition”
La prédiction de l’attrition (« churn » en anglais) est l’un des cas d’usages les plus fréquents
de la data-science. Elle conduit souvent à l’instauration d’un score prédictif qui sert
déclencher des actions marketings et commerciales.
La disponibilité de données historiques de clients, d’une part, et les coûts de rétention
souvent plus faibles que les coûts d’acquisition, font de la prédiction de l’attrition l’un des
premiers travaux d’une équipe de data-scientists.
Source : Converteo.com

                                        CONFIDENTIAL
Smarter IBM i apps made easy with Driverless AI
CRM & Customer Churn scenario

                                                              Initial CRM
                                No Customer Churn risk estimate per customer
Smarter IBM i apps made easy with Driverless AI
Dataset creation from Db2 for i

                                                  Prepared
                                                  Training
                                                   Dataset

                   Extraction
                   or JDBC
                   Data Connector
Smarter IBM i apps made easy with Driverless AI
CRM & Customer Churn scenario

                 2
Import & Visualize Data
                                                                    Model Export
Create & Test Models
                                                                         3
    1
                                                                                   Recommendation Engine
Extract                                                                                Scoring Pipeline
CRM data                  Optimized for GPU-Accelerated
(or JDBC)                        Power9 Servers                                 4
                                                                  Augment IBM i Applications
                                           Monitor Models         Model Inference
                                       5
                                           & Iterate

                Business Database          Business Application                                            29
Export Scoring Pipeline                                    3

                           Automatic Scoring Pipelines
                           Export ultra-low latency Python or
                           Java Automatic Scoring Pipelines
                           that include feature
                           transformations and models.

                           Here, the Java/ Python Scoring
                           Pipeline “scorer.zip” to be deployed
                           on the inference system

                           Inference on the Edge , datacenter,
                           Cloud … on any device.
 Scoring Pipeline export
Smarter IBM i apps made easy with Driverless AI

                                                                    Behind the scenes:
                                                                    CRM application prototype
                                                                    integrated with a real time
                                                                    scoring Pipeline (Java MOJO)

          Fig. Node-RED Prototyping - CRM dashboard
                                                                 Java MOJO

   Java MOJO Scoring Pipeline                         Db2 CRM
   Invoked from the Node.js app                       Database
Manual Modeling vs. Driverless AI Accelerated Auto ML

§    Customer Churn Dataset WA_Fn-UseC_-Telco-Customer-Churn.csv
      Customer churn is when an existing customer, user, player, subscriber or any kind of return client stops doing
      business or ends the relationship with a company.

§    Model Accuracy: 0.79 vs. 0.84

§    Time to market :
      Days vs. Minutes with DAI
      AutoML w/ Feature Engineering, etc.

§    Accelerated ML able to play with larger datasets

§    Accelerated ML saves time & effort
      & licensing costs (PowerVM cores offload to GPU)

       à Accelerated Auto ML can create high quality models, that only a team of
       experienced data scientists with years of business knowledge can build.
       (Real feedback from customers)
Machine Learning
   sur IBM i

                   Stratégie Open Source sur IBM i
Stratégie Open Source sur IBM i

à Integration & Web
  • API Management , Web Services,
  • Web Development                     Open Source & IBM i : Unix (AIX) Runtime aka PASE
  • Accès Database

à Cloud / Automatisation
  • Openstack, Cloudinit, Terraform , Ansible

à Data Science , Data Analysis
Data Science : IBM i & AIX
Librairies et langages disponibles sur AIX & IBM i

 §    Packages RPM “Data Analytics & Data Science”
 §    Complètent les fonctions Db2
 §    Liste qui s’enrichit régulièrement J                                      Yum install
       Numpy, Pandas : Data Processing
       Scipy, Scikit-Learn
       IPython : interactive Python
       NLTK : Natural Language Processing
       Matplotlib, Jupyter : Data Visualization
       R Language (Interpreter, Runtime)                                              ILE / PASE Applications

                                                                                  Machine Learning      DB2 for i
 §    Utile dans chaque phase d’un projet ML :                                     Libraries (PASE)   Business Data
  à Préparation & Visualisation, Modeling, Validation…

 §    Particulièrement intéressant:
 à L’inférence sur IBM i intégrée à une application IBM i (RPG, Db2, Node.js)

 §    L’accélération matérielle (GPU) parfois nécessaire:
       è   ‘large datasets’ , fort parallèlisme d’éxécution
       è   Utilisation de librairies ‘Deep Learning’ (neural nets)
       è   Économie des ressources CPU / Mémoire sur IBM i (price / performance)
       è   Entrainement des modèles sur Power IC922 / AC922 + WML-CE (Cloud public ou privé)
Data Science sur IBM i
à Scikit-learn https://scikit-learn.org

    • Outil simple et efficace pour du data mining / data analysis
    • Accessible à tous, réutilisable dans divers contextes
    • Repose sur les librairies NumPy, SciPy, matplotlib
    • Open source depuis 2010 – INRIA
    • Utilisable commercialement - BSD license
    • Plus souvent utilisé avec panda (manipulation de données - dataframe)

    •   Cours en Français ? http://cedric.cnam.fr/vertigo/Cours/ml/tpIntroductionScikitLearn.html
Machine learning sur IBM i
• Packages nécessaires – 1.2 GB
                 Package (RPM)          Description

                 Tcl                    TCL language support

                 Tk                     TK package for GUI support of TCL

                 Python3                Python 3 support

                 Python3-devel          Python3 development package

Yum Install      python3-ibm_db         DBAPI support backage for IBM i
(*ALLOBJ)        Pyton3-numpy           Numpy package

                 Python3-scipy          Scipy package

                 Python3-scikit-learn   Scikit-learn package

                 Libzmq                 Zero-MQ library

                 Package Python (PIP)    Description
Package Python   juypterlab              Includes jupyter (Julia/Python/R IDE)
(utilisateur)
                 joblib                  Model and large objects persistence on Filesystem
Machine learning
sur IBM i
 • Chroot jails (Isolation ds PASE/IFS ) J
 • n Data Scientists = n bacs à sable (chroot)
 • Environnements Dev + Prod
 • 1.2GB par environnement Scikit-learn
                                                                              mop_chroots
 • Le reste du système ( / ) isolé
  yum install
  --installroot /QOpenSys/mop_chroots/dev1
  tcl
  tk
  python3                                                       dev1           dev2             . . .      devN   prod
  python3-devel
  python3-ibm_db
                                                   SSH
  python3-numpy
  python3-scipy
  python3-scikit-learn
                                                                                                        ibm_dbi
  python3-pyzmq                                                                                         odbc
  libzmq
  python3-pip                                                                         ibm_dbi
  python3-pandas
  opensssl                                                                            odbc
  bash
                                                           https://10.7.19.71:8881                                Prod
  ssh dev1@BENOIT.ICC.LOCAL
  bash-4.4$ pip3 install jupyterlab                                                                               App
  bash-4.4$ pip3 install joblib

  bash-4.4$ jupyter notebook --port 8881 --ip 10.7.19.71
Machine learning sur IBM i
1.   Entrainement et validation modèle (IBM i ou pas)
2.   Chargement du modèle depuis l’IFS
3.   Scoring (Inference) sur IBM i (PASE)
4.   Integration de modèle dans l’application

                         PASE

        Program
        Call or
        REST             PASE or ILE
                  Prod
                  App
                                       Fig. Scoring 20 clients
                                       dans PASE (Python3)
IBM Data Science Technologies & IBM i
à Demonstration

 • Customer Churn
    • Supervised model – classification with Scikit-Learn

 • Install yum packages and git clone
    • https://github.com/bmarolleau/firstdemo-scikitlearn-ibmi/
Data Science sur IBM i
      https://scikit-learn.org
Machine learning sur IBM i
Machine learning sur IBM i
Pour aller plus loin…

              CONFIDENTIAL
Quelques cours et documentations en ligne…

 https://cognitiveclass.ai/courses/data-analysis-python
 https://www.coursera.org/specializations/data-science-python

                                     CONFIDENTIAL
Quelques documentations utiles…
 How to Start With Machine Learning on IBM i (by Gan Zhang, IBM - Nov 2019 )

 Accessing the rest of your IBM i from Python
 ibm_dbi driver presentation

 Accessing the rest of your IBM i from IBM i / Windows/Linux : ODBC Driver github examples

 YiPS http://yips.idevcloud.com/wiki/

 Bitbucket IBM i Open Source
 https://bitbucket.org/ibmi/opensource/

 My blog J (K8s , App modernization , ML , H2O, Watson, Node-RED, IBM i)
 https://ibm.biz/bma-wiki

 Videos 2020 (AIX/IBM i in IBM Cloud , Terraform) My Youtube Channel
Workshops à Distance
  Présentation / Démo / Travaux Pratiques Power IBM i à distance

è En Planifié avec Expert IBM ou Travaux Pratiques Autonome.
è Nous fournissons l’accès distant à nos plateformes, l’expertise et
les supports de présentation et de travaux pratiques!!                    Contacts Systems Center Montpellier :
                                                                          Benoit Marolleau benoit.marolleau@fr.ibm.com
è Durée: 1 ou 2 heures (Webinar) à 1 semaine (en autonome)                Marc Lapierre (Briefing Client) mlapierre@fr.ibm.com
                                                                          Christophe Cavelier (BP) chris.desclavelles@fr.ibm.com
è Acces via OpenVPN et Web Conférence
è IBM i 7.3/7.4 pré-chargés, AIX 7.x, AC922, Linux, OpenShift…

 Sujets déja disponibles:
 POW4- Machine Learning, Jupyter & Scikit-learn sur IBM i
 POW1 - H2O.ai Driverless AI : Machine Learning Accéléré avec integration IBM i
 POW5- Open Source et IBM i : Node-red, node.js, web services.
 POW2- Démarrer sur IBM Cloud et Power Virtual Server IBM i
 POW7- Open Source & IBM i : Cloud et Automatisation de l'infrastructure avec Terraform

 + Tous les sujets du moment : Cloud, PowerVC, AI Accéléré (Vision, WML-CE) , Db2 Mirror / Haute Dispo
 Si besoin: agenda à la carte, contactez nous!
                                                                                                               Red Hat
                                                                                                              OpenShift

                                                    CONFIDENTIAL
Introduction & cas d’usage
          Optimisation logisitique, prediction de récolte, prediction de fraude, customer churn
         (attrition) , maintenance predictive, etc.

         Avant de démarrer sur Scikit–learn : Introduction au Machine Learning

         Solutions IBM pour l’intelligence artificielle
Agenda   Solutions Logicielles et Materielles, Private/Public Cloud : Watson , Linux & IC922/AC922
         Solutions IBM i / AIX au plus près des données

         Démonstration – Scikit-learn sur IBM i
         Installation dans un IFS container dans PASE, premiers contacts sur un cas concret

         Questions / Réponse , Pour aller plus loin…
         Détails des Solutions, Quelques liens utiles, TP à distance disponibles, contactez moi!!

                           Merci !
Montpellier Cognitive Systems Lab
       Manager : Philippe Chonavel
         Coordinator : Alain Roy          Montpellier Client Center (France) - EMEA

     Team PowerAI/WML ATS Europe                       Our Offerings
         Jean-Armand Broyelle              1. Technical Consultancy & Assistance
               Regis Cely                      2. Co-Creation Lab Workshops
          Sebastien Chabrolles              3. Hands-On Technical Enablement
            Sylvain Delabarre                     4. Demonstration/PoT
            Maxime Deloche                          5. PoC/Benchmark
             Thierry Huche
            Benoit Marolleau

        Team CAPI/SNAP (FPGA)
             Bruno Mesnet
          Alexandre Castelane
             Fabrice Moyen
                                      PowerHA Analytics , ML/DL
               Team HPC                      Modernization   Virtualization
             Ludovic Enault          Cloud / Hybrid Cloud         Performance Benchmarks
             Pascal Vezolle
A vous de jouer !

POW1 - H2O.ai Default Payment / customer churn scenario *** Remote Hands-on Labs option!!
Data Scientist in a Box on Power AC922 / IC922
H2O Driverless AI is an artificial intelligence platform for automatic machine learning (ML).

POW2- Get your hands dirty with IBM Cloud and Power Virtual Server ***webinar
Get started with IBM Cloud & discover the new Power Virtual Server offering with this guided hands-on
lab / demonstration session. Get AIX/IBM i/Linux VM on PowerVM is a few seconds , in a Pay as you Go
model.

POW3 - Business Continuity : Db2 Mirror, Logical Replication, and PowerHA
***webinar/Labs upon request
Positioning and technical aspects, External Storage Solutions, Q&A

                                                 CONFIDENTIAL
A vous de jouer !

POW4- Machine Learning, Jupyter & Scikit-learn on IBM i /AIX ***Remote Hand-on Labs Option
Get started with free open source frameworks & libraries on your favorite platform (AIX/IBM i)

POW5- Open Source on IBM i : node-red, node.js, web services. ***Remote Hand-on Labs Option
Embrace open source technologies on your IBM i platform , and build , prototype, deploy your first Web
Service.

POW6- Open Source on IBM i : Cloud & automation with Ansible ***soon
Use core Ansible modules , IBM i modules & playbooks for : Configuration of servers, Application deployment,

POW7- Open Source on IBM i : Cloud & automation with Terraform ***webinar
Terraform providers for PowerVC private & CAM (Cloud Pak for MCM) , IBM Terraform provider for Public Cloud

+ Visual Insights + AC922 / WML-CE ….

Contact us for Any customized Agenda of course!!
                                                        CONFIDENTIAL
Get Started Today : Contact us!
 PowerAI Developer Portal
 https://developer.ibm.com/linuxonpower/deep-learning-powerai/technology-previews/powerai-vision/

 AI Vision Object Detection Demo
 https://www.youtube.com/watch?v=19vaot75JCY & Jupyter notebook

 AI Vision / Public Cloud – Get Started demo
 https://github.com/IBM/powerai-vision-object-detection

 PowerAI FAQ
 https://developer.ibm.com/linuxonpower/deep-learning-powerai/faq/

 PowerAI Vision 1.1.1 Free trial
 Register for a free 3-day trial of PowerAI Vision

 H2o.ai Driverless AI (Trial 21 days)
 https://www.h2o.ai/products/h2o-driverless-ai/

 Presentations, Demo Replays : https://ibm.biz/bma-wiki

 Want to know more? Need support ?
 Montpellier team & AI Environment for your PoC / Tests: Driverless/PowerAI/AI Vision Remote Access
 Contact us : a2roy@fr.ibm.com / benoit.marolleau@fr.ibm.com
                                                                                                      56
Get Started Today – H20 Driverless AI

    Videos and Solution Brief     H2O Driverless AI Tutorials    Client Reference Case Study   AC922-H2O DAI bundle

   Want to know more? Need support ? Contact us :
   - European Cognitive Systems CoC - IBM Montpellier a2roy@fr.ibm.com / benoit.marolleau@fr.ibm.com
     Power Systems & AI Environment for your PoC / Tests: Driverless/PowerAI/AI Vision Remote Access.
   - IBM CSSC Workshop? AI Workshop at CSSC

   WML & AI Demos on the IBM WW Demo portal : https://www.ibm.com/systems/clientcenterdemonstrations

   Presentations, Demo Replays for the author : https://ibm.biz/bma-wiki

   WML-CE FAQ https://developer.ibm.com/linuxonpower/deep-learning-powerai/faq/

   H2o.ai Driverless AI (Trial 21 days). https://www.h2o.ai/products/h2o-driverless-ai/
Gartner names H2O as Leader with the most
completeness of vision
• H2O.ai recognized as a technology leader with most
  completeness of vision

• H2O.ai was recognized for the mindshare, partner
  network and status as a quasi-industry standard for
  machine learning and AI.

• H2O customers gave the highest overall score among
  all the vendors for sales relationship and account
  management, customer support (onboarding,
  troubleshooting, etc.) and overall service and
  support.
Shortage of Data Scientists

          “The United States alone faces a shortage of 140,000 to
          190,000 people with analytical expertise and 1.5 million
                         managers and analysts”

                          –McKinsey Prediction for 2018
H2O.ai:
Worldwide
Community
Adoption

* DATA FROM GOOGLE ANALYTICS EMBEDDED IN THE END USER PRODUCT
Use cases examples: AI in Financial Services
Wholesale / Commercial Banking        IT Infrastructure
• Know Your Customers (KYC)            • Security Cyberlake
• Anti-Money Laundering (AML)          • DoS Detection and Protection
                                       • Master Data Management

Card / Payments Business              Retail Banking
• Transaction Frauds                  • Deposit Fraud
• Collusion Fraud                     • Customer Churn Prediction
• Real-Time Targeting                 • Auto-Loan
• Credit Risk Scoring
• In-Context Promotion
H2O Driverless AI? Containerized App

           IBM Cloud Private
           or OpenShift CP

           n Worker nodes
                               GPU   GPU   GPU   …   GPU
                                                           Cuda Drivers + WML-CE

           IC922/ AC922
           On Prems/Cloud
                                                            62
Example: Predictive Maintenance                                        2
with Driverless AI
                                      Automatic Visualization
                                      Understand the data shape,
                                      outliers, missing values, etc.

Data Visualization on Driverless AI
Example: Predictive Maintenance                                                               2
with Driverless AI
                                      Create new model
                                                              Automatic Machine Learning
                                                              Use best practice model
                                                              recipes and the power of high
                                                              performance computing to
                                                              Iterate across thousands of
                                                              possible models including
                                                              advanced feature engineering
                                                              and parameter tuning

     Target Column : Asset Failure in 15 cycles (YES/NO) for binary classification
     Or estimate the Time to Live (in number of cycles) with a regression
Example: Predictive Maintenance
with Driverless AI
                               Create new model

                                                       Automatic Machine Learning
                                                       • POWER9 supports nearly 2.6x more RAM, 9.5x
                                                         more I/O bandwidth than comparable systems.
                                                       • Nearly 2X the data ingest speed and over 50%
                                                         faster feature engineering.
                                                       • With GPU accelerated machine learning delivering
                                                         nearly 30X speedup on model building.
                                                       • Support for up to 6 V100 GPUs on a single system

Experiment running –GPU Accelerated AutoML on POWER9
Example: Predictive Maintenance
with Driverless AI
 Turbine sensors dataset. RUL is the column to predict.

 Sensors data imported. Contains 21 sensor data for each turbine   RUL: Remaining Useful Life
 100 turbines, 200+ cycles per turbine, before failure.            (or Time to Live)
Example: Predictive Maintenance
   with Driverless AI

                                                                                                           Reduced
Training on                                                                                                Training
19K rows                                                                                                   time

Target: RUL

Precision
Metric
RMSE=35.7
(Root Mean
Square
Error)
                        Feature Importance in the model decision: 1. Cycle# 2. Sensor11, 3. Sensor4 etc.
Java Mojo Scoring Pipeline on IBM i/AIX/Linux
 Real time prediction on IBM i !! (AIX/Linux)
 with the java MOJO Scoring Pipeline built by Driverless AI

                                  Real time Customer Churn predictions on IBM i
DAI Data Connectors: JDBC Connection to DB2 for i
    How to configure a Db2 Data Connector:

    1.   Download a Db2 JDBC Driver first
          - Ex: Db2 for i type 4 jdbc driver jt400.jar : https://sourceforge.net/projects/jt400/
          - Db2 LUW : get the appropriate db2jcc4.jar
    2.   Get the original Driverless AI Config file aka /etc/dai/config.toml
          Many ways to do that, either from the image or a running container (docker cp etc.)
          - K8s example : kubectl cp -c dai-gpu-mop dai-demo-mop-6b75f6b5b9-kddd:/etc/dai .

    3. Edit this config.toml as shown below (refer to the Online doc for further information)
    4. Inject config.toml, and the driver jar file(s) , to be mounted in your DAI POD/Docker container.
          - Inject it in your DAI image or better, use a K8s ConfigMap / persistent volume to persist it
    5. Restart your DAI container to refresh the config.

/etc/dai/config.toml
  # jdbc: JDBC Connector, remember to configure JDBC below. (jdbc_app_configs)
  enabled_file_systems = "upload, file, hdfs, s3, jdbc "

  jdbc_app_configs = '{"db2fori": {"url": "jdbc:as400://;", "jarpath":
  "/etc/dai/jt400.jar", "classpath": "com.ibm.as400.access.AS400JDBCDriver"}}’

  # Note: here, the jtopen driver jtopen.jar is mounted in the DAI Container, in /etc/dai
DAI Data Connectors: JDBC Connection to DB2 for i

                                    Data Connectors - JDBC
DAI Data Connectors: JDBC Connection to DB2 for i

                       Configure your Data source:
                       Db2 Connection & Query with credentials
                       to your Db2 database.
Example:        Predictive Maintenance
Architecture Overview
Asset Management + IoT Solution for Predictive Maintenance

                                Watson IoT
                                Platform                                                         Asset Management

                                IoT MQTT Broker
                              On Premise or IoT Platforms on
                                       IBM Cloud

                                                               Model API Call

                                                                           Integration – Data Transformation
                                                                                     WebSphere IIB
                                                                                                                      Work Order
                                                                                                 Scoring - Decision
                                                                      Dashboad - Reports           SPSS Modeler
                                                                          Cognos BI                 SPSS C&DS

                                                                            Accelerated ML/DL
                                                                    - IBM Watson Studio
                                                                    and/or IBM Watson Machine Learning (WML)
                                                                    and/or H2O Driverless AI
                                                                    - OpenPOWER – AC922 WML-CE & GPU Accelerators

    07/05/2020
                                                               72
Example: Predictive Maintenance
with Watson Studio

      Model created with AutoAI on Studio. Can be deployed on WML (REST API…)
Computer Vision with PowerAI Vision & IBM i
                             [understands natural-
                          language and responds in
                           human-like conversation]
                                            CONVERSATION
            Camera
        [ Web Browser ]

                                                           Object Detection   API
                                              Node-RED                         Deployed
                                                                              Model APIs   FB Detectron
                                                                                           Faster RCNN
                                                                                             Yolo V2

     DEVICE

                                                                                           AC922
Computer Vision with PowerAI Vision & IBM i
Computer Vision with PowerAI Vision & IBM i
IBM i can now detection objects
using an object detection API on
premise powered by PowerAI Vision
On ppc64le with Nvidia GPUs for
inference.
Computer Vision with PowerAI Vision & IBM i
Accelerated embedded edge devices

Train on PowerAI Vision but infer on embedded
devices
Made with PowerAI Vision – Drone Inspection
Acceleration in training …. days become hours
              Performance…             Large AI Models Train
                                                                          faster training times
                                          ~4 Times Faster
              Faster Training
              and Inferencing
                                 POWER9 Servers with NVLink to GPUs
                                                 vs                          for data scientists
                                   x86 Servers with PCIe to GPUs

Distributed Deep Learning          Traditional Model Support   à         Large Model Support

                                  (Competitors)                       (PowerAI)
                                  Limited memory on GPU forces        Use system memory and GPU
                                  trade-off in model size / data      to support more complex models
                                  resolution                          and higher resolution data

                                         DDR4        CPU                     DDR4        POWER
                                                                                          CPU

                                                                                                   POWER NVLink

                                                                                          NVLink
                                                      PCIe
                                                                                                     Data Pipe

                                         Graphics
                                         Memory      GPU                      Graphics
                                                                                          GPU
                                                                              Memory
• IBM POWER SYSTEMS

AC922
        An Acceleration Superhighway
        Unleash state of the art IO and accelerated
        computing potential in
        the post “CPU-only” era

        Designed for the AI Era
        Architected for the modern analytics
        and AI workloads that fuel insights

        Delivering Enterprise-Class AI
        Flatten the time to AI value curve
        by accelerating the journey to build,
        train, and infer deep neural networks
OpenPower Foundation – After 6 years of existence…

§ Drivers: Innovation vs. Moore law

§ Collaborative Approach

§ Domains:
     –   Cloud & Scale Out Architecture
     –   Research/ High Perf Computing
     –   Analytics, Big Data
     –   Machine Learning / Deep Learning

§ IBM Sells OpenPower servers
     §   Power Systems POWER9…

                                            https://www.ibm.com/cloud/bare-metal-servers/power
                                            https://console.bluemix.net/docs/services/PowerAI-IBM/
OpenPower Summit 2018

                        Source Forbes
(Free) PowerAI 1.6.2 (WML-CE) Overview

5765-PAI - Linux Native install (conda) or docker image on RHEL 7.6+ / Ubuntu 18.04
https://hub.docker.com/r/ibmcom/powerai/
Works with CUDA 10 + Nvidia drivers
§ Distributed Deep Learning (DDL) 1.3.0                 Not Just About Hardware Design
§ TensorFlow 1.13.1
                                                                 It’s about co-optimized
§ IBM Caffe 1.0.0
§ Caffe2 1.0.1                                                                  software
§ PyTorch 1.0.1
§ Snap ML 1.2.0
                                                                      +
                                                                 hardware
§ Rapids cuDF /cuML 0.2.0

Available on x86-64 architecture, Optimized for
§ IBM AC922 POWER9 system with NVIDIA Tesla V100 GPUs
§ IBM S822LC POWER8 system with NVIDIA Tesla P100 GPUs
                                                                      Announcement Letter
WML-CE : Enterprise AI Platform
 Simplicity: Integrated        Ease of Use, Unique           Faster Model            Open AI Platform w/
Platform that Just Works           Capabilities              Training Time           Ecosystem Partners

                                      Power9                                         IBM     ISV     Solution
                                       CPU
                                                                                      SW     SW         SIs

                                                                                           PowerAI
                                       GPU

Curate, Test, and Support      Large data & model        Faster Training Times in   Platform that Partners
Fast Moving Open Source       support due to NVLink           Single Server              can build on

   Provide Enterprise        Acceleration of Analytics    Scalability to 100s of    Software Partners: H2O,
 Distribution on RedHat                & ML               Servers (Cluster level        IBM, Anaconda
Easy to deploy Enterprise    AutoML: PowerAI Vision           Integration)
       AI Platform                                                                   SIs, Solution Vendors
                              Elastic Training: Scale    Leads to Faster Insights   & Accelerator Partners
                                GPUs as Required          and Better Economics
Cognitive Systems are built with optimized HW & SW
                                                  Not Just About Hardware Design
                                                              It’s about co-optimized
                     Industry Alignment
 Dev Ecosystem

                                                                             software
                        Partnerships
                                                              hardware
                                                                         +
                     Open Frameworks

                        IBM Software               which just work for Machine Learning,
                                                           Deep Learning and AI

                                          POWER9 is the only processor with NVLink 2.0 from CPU to GPU
                                                  Delivering 5.6X Host-Device bandwidth vs Xeon E5-2640 v4
        Open Accelerator Interfaces               based systems with CUDA H2D Bandwidth Test
                                                  No code changes are required to leverage NVLink capability
                 Accelerator Roadmaps     “We’re excited to see accelerating progress as the Oak Ridge National
                                          Laboratory Summit supercomputer machine, which we expect will be
                                          among the world’s fastest supercomputers. The advanced capabilities
                                          of the IBM POWER9 CPUs coupled with the NVIDIA Volta GPUs will
                                          significantly advance DOE’s mission critical applications,” says Buddy
                                          Bland, Oak Ridge Leadership Computing Facility Director
Snap ML
     Distributed GPU-Accelerated Machine Learning Library

     Snap Machine Learning (ML) Library
                                                                APIs for Popular ML
                                                                   Frameworks
  Logistic Regression                Linear Regression

   Support Vector
                                     More Coming Soon
   Machines (SVM)

                                                                                 (coming
 libGLM (C++ / CUDA            Distributed Hyper-Parameter                        soon)
Optimized Primitive Lib)               Optimization

                   Distributed Training
                                                                        pai4sk API (Included in PAI 1.5.4)
                                                                        snap-ml-local API
                                                             4 APIs     snap-ml-mpi API
                                                                        snap-ml-spark API

                                                                                                             87
• Snap ML: Training Time Goes From                                                         Logistic Regression in Snap ML (with
         An Hour to Minutes                                                                   GPUs) vs TensorFlow (CPU-only)

                  46x faster than previous                                             80
                   record set by Google                                                           1.1 Hours       46x Faster
               Workload: Click-through rate
                prediction for advertising                                             60

                                                                   Runtime (Minutes)
              Logistic Regression Classifier in
                  Snap ML using GPUs vs                                                40
                TensorFlow using CPU-only

                                                                                       20

Dataset: Criteo Terabyte Click Logs                                                                                   1.53
(http://labs.criteo.com/2013/12/download-terabyte-click-logs/)                                                       Minutes
4 billion training examples, 1 million features                                         0
Model: Logistic Regression: TensorFlow vs Snap ML
Test LogLoss: 0.1293 (Google using Tensorflow), 0.1292 (Snap ML)
                                                                                                   Google            Snap ML
Platform: 89 CPU-only machines in Google using Tensorflow versus                                  CPU-only         Power + GPU
4 AC922 servers (each 2 Power9 CPUs + 4 V100 GPUs) for Snap ML
Google data from this Google blog                                                               90 x86 Servers    4 Power9 Servers
                                                                                                  (CPU-only)         With GPUs
                                                                                                                                     88
Deep Learning = Training Artificial Neural Networks
Based on biological neurons. Artificial neurons learn by recognizing patterns in data.

                                                          Neural net Framework
                                                          & Python Programming

    Each input value is a pixel                                                                Each output contains:
    RGB Value (3 Channels)
                                                                        outputs = f(f(f…(x))
                                                                                               detection_boxes
                                                                                               detection_scores
Training = Adjust the parameters (w,b)                                                         detection_classes
To get the best validation results                                                             num_detections
Vs. ground truth (Training set).
Inference = apply the model to the input (w,b                          y = f(x)
fixed) to get the output (classification, detection, …)
Demonstration
 Machine Learning techniques

• Classification: predict class from observations
     • - E.g. Spam Email Detection
• Clustering: group observations into “meaningful” groups
     • - E.g. Amazon Recommendations
• Regression (prediction): predict value from observations
     • - E.g. Energy consumption prediction

 and many different technologies and libraries are available:
Machine Learning Methods (examples)
K Nearest            Bayesian                     Decision trees      Support Vector Machines
Neighbours           Classifiers

Reinforcement learning         Cluster Analysis                    Neural Networks /
                                                                    Deep learning
Deep Learning = Training Artificial Neural Networks
Based on biological neurons. Artificial neurons learn by recognizing patterns in data.

                                                                                         92
Deep Learning = Training Artificial Neural Networks
   Example solving a non linear classification problem

                        boundary formation with 1 layer and non linear “relu” activation function

     mlp = MLPClassifier(hidden_layer_sizes=(1),                      1 layer and 2 neurons   1 layer and 3 neurons     Decision plane with 30 neurons
     max_iter=500000,activation=’identity’,learning_rate_init=0.01)

                                                                                                                      Rectified Linear Unit (ReLu)

Neural Nets Simulator:
http://playground.tensorflow.org

                                                                                                                          93
Vous pouvez aussi lire