Data scientists vs. Statisticians: Lessons learned after two years of practical experience
←
→
Transcription du contenu de la page
Si votre navigateur ne rend pas la page correctement, lisez s'il vous plaît le contenu de la page ci-dessous
Data scientists vs. Statisticians: Lessons learned after two years of practical experience Prof. Dr. Bertrand Loison, Swiss Federal Statistical Office, Vice-Director 5th International Conference on Big Data for Official Statistics Session “Data Science and Capacity Development in Official Statistics” Kigali, Rwanda, 3 Mai 2019 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 1
Two years of practical experiences (2017 – 2019) FEUILLE DE ROUTE POUR L’IMPLEMENTATION AU SEIN DE L’OFS DE LA STRATEGIE D’INNOVATION SUR LES DONNEES 01.01.2017 - 01.04.2017 - 01.07.2017 - 01.10.2017 - 01.01.2018 - 01.04.2018 - 01.07.2018 - 01.10.2018 - 01.01.2019 - 01.04.2019 - 01.07.2019 - 01.10.2019 - 31.03.2017 30.06.2017 30.09.2017 31.12.2017 31.03.2018 30.06.2018 30.09.2018 31.12.2018 31.03.2019 30.06.2019 30.09.2019 31.12.2019 Version 1.0 / 31.10.2017 Délivrables T1 T2 T3 T4 T1 T2 T3 T4 T1 T2 T3 T4 01.01.2017 01.04.2017 01.07.2017 01.10.2017 01.01.2018 01.04.2018 01.07.2018 01.10.2018 01.01.2019 01.04.2019 01.07.2019 01.10.2019 31.12.2019 Meilleures pratiques Rapports de voyages GWG Big Un Statistical Commission – UN Global Working Group on Big Data for Official Statistics (https:// unstats.un.org/bigdata/) internationales Data Cohérence avec le MJP 2020 - 2023 Définition de la Définition de la stratégie 2017-2019 (S1.1) Définition de la stratégie 2020-2023 (S1.3) Stratégie Data Innovation 2020 – + = stratégie (S1) Définition de la feuille de route 2017-2019 (S1.2) Définition de la feuille de route 2020-2023 (S1.4) 2023 (S1.1 – S1.4) + Workshop avec les cadres de chaque division métier Aquisition d’expérience pratique. Input pour l’élaboration du MJP 2020 - 2023 Stratégie (S) Phase pilote de mise Identification des projets Reporting devant la GL et rapport Projets pilotes (1 par division BB, WI, GS, RU, REG) (S2.2) en oeuvre (S2) pilotes (S2.1) de fin projet (S2.1 – S2.2) Evaluation de la Evaluation des projets Rapport d’évaluation (S3.1) phase pilote (S3) pilotes (S3.1) Soutien donné par Statoo Consulting et METH Soutien «On the job» Plan de soutien Soutien spécifique du personnel faisant partie des projets pilotes Plan de soutien «On the job» (C1) est prêt (C1.1) (C1.2) (C1.1 – C1.2) Compétences (C) Soutien donné par Statoo Consulting et METH Intégration dans la formation officielle EducStat Formation «Off the Soutien des membres du comité de direction concernant les Concept de formation «Off the Conception d’un module de formation (C2.2) job» (C2) approches relatives à la science des données. (C2.1) job» (C2.1 – C2.2) Evaluation de la Définition de la stratégie 2020-2023 (S1.3) Rapport d’évaluation (C3.1) phase pilote (C3) Définition de la feuille de route 2020-2023 (S1.4) Inventaire des données consolidées présentent ou pas dans la KD Accès aux données Inventaire de base des sources de Saisie des métadonnées dans Inventaire des sources intenes internes (DI1) données internes. (DI1.1) SMS (DI1.2) dans SMS (DI1.1 – DI1.2) Accès aux données Liste des data brokers Données et infrastructure (DI) Identification du potentiel des data brokers comme «livreurs/fournisseurs» de données (D2.1) externes DI2) intéressants pour l’OFS (DI2.1) Draft des nécessaires Bases légales (DI3) Identification du potentiel des data brokers comme «livreurs/fournisseurs» de données (D2.1) modifications des bases légales (DI3.1) Infrastructure Open Source A intégrer dans la SIP 2020-2023 Plateforme Mise en place de l’infrastructure Besoins métiers Infrastructure pilote + besoin informatique (DI4) pilot (DI4.1) formulés (DI4.2) métiers pour SIP (DI4.1 – DI4.2) Evaluation de la Evaluation de l’infrastructure Rapport d’évaluation (DI5.1) phase pilote (DI5) pilote (DI5.1) Communication Reporting, workshops et articles Communication interne (CO1) (CO1.1 – CO1.10) (CO) Présentation de la stratégie dans la domaine public 1er Datathon OFS 2019 Articles scientifiques Communication Préparation de la présentation Mobilisation de la communauté Communication concernant les 1er Datathon OFS 2019, Conception Datathon 2019 (CO2.1) externe (CO2) lors des JSS 2017 (CO1.1) (CO2.2) projets pilotes (CO3.1) communications (CO2.1 – 2.8) GL-Antrag GL-Reporting Kick-off interne Communication publique Assemblée des cadres Journal interne Regiostat Fedestat Sources: Data Innovation Strategy 1.0 : https://www.bfs.admin.ch/bfs/en/home/news/whats-new.gnpdetail.2017-0673.html Pilot Projects : https://www.experimental.bfs.admin.ch/en/ 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 2
Agenda 1. What did we do ? 2. What are our experiences ? 3. What will be the next major step ? 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 3
We have created an ad-hoc agile organization inside FSO - #1 Rules 1. Ad-hoc organization is under the final responsibility of a board member. 2. Members of the ad-hoc organization have been stayed subordinated in their line organization ! 3. Five pilot projects are managed by the ad-hoc 26 people (Statisticians, Methodologists, IT Specialists) organization. The final goal is aged between 26 and 58 years old ! to go into production at the end of 2019. 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 4
1. We have created an ad-hoc agile organization inside FSO - #2 Pilotproject X Methodolo Analysts gists IT Specialists • Classical hierarchy that needed to be forgotten… • High degree of autonomy was and is still required… • Choosing the right people at the beginning was not easy… Source: European Commission, Open Data Maturity in Europe 2016, Figure 32, October 2016 (goo.gl/WPHR54) 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 5
2. We have complemented their skills with off- and on-the-job teaching - #1 Off-the-job (9 x 1 day = 9 days dispatched over 3 months) • R for Data Science • Decision trees • Python for Data Science • Bayesian learning • Selected topics in data preparation/preprocessing • Neural networks • Dimensionality reduction (e.g. PCA) • Deep neural networks (e.g. CNN, • Clustering (e.g. k-means) RNN and LSTM) • General introduction to supervised methods • Model evaluation • Regression (e.g. linear, logistic) • Feature selection • Classification (e.g. k-NN, support vector machines) 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 6
2. We have complemented their skills with off- and on-the-job teaching - #2 On-the-job (over 2 years) Scientific advisors • Professor for Data Science (University of Geneva) and the FSO’s deputy head of methods have accompanied/supervised the five pilot projects and validate the results from the scientific point of view. Team of “Data Scientists” • Key idea is to share/transfer the skills and the knowledge inside the FSO Organization. • Trying to avoid an organizational split between “traditional statisticians” and “modern statisticians or data scientists”. 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 7
Agenda 1. What did we do ? 2. What are our experiences ? 3. What will be the next major step ? 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 8
What are our experiences ? - #1 Hard skills • Math & statistical skill, Technical skill and subject matter can be learned. These skills are not always easy to acquire but it seems not to be the hard part of the game! Up to 2020 we will integrate a standard curriculum “Data Scientist” in our internal teaching program. We will collaborate more with Universities to define the exact content. Soft Skills • Soft skills (e.g. collaboration, creativity, problem solving, communication, story telling) need a change in the mindset of all employees. These skills are not so easy to teach… they primary need to be practiced on a daily basis Up to 2020 we will open a new mandatory curriculum “Agile Organization”. 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 9
What are our experiences ? - #2 Academic curricula • Swiss Universities offer standards curricula in data science. The content of these curricula often does not match with the FSO’s expectations. FSO does not really need “Supermen or Superwomen” that can cover all the production process on his own. National Statistical Institute as employer • Recruiting Data Scientist for a NSI likes FSO is not easy. Salaries are not so attractive and the tasks that we can offer to a young data scientist do not offer the expected variety and freedom that they are looking for. FSO wants to empower its employees and not to replace them with Data Scientist! 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 10
Agenda 1. What did we do ? 2. What are our experiences ? 3. What will be the next major step ? 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 11
What will be the next major step ? Version 2.0 of our Data Innovation Strategy The FSO is currently defining a version 2.0 of its data innovation strategy for the years 2020 - 2023. The creation of an official Data Innovation Lab and a Competence Center for Data Science will (probably) be at the heart of the new strategy. Strategic objective 1: To create a central, cross- departmental and independent "Data Innovation Lab" (DIL) with a focus on data innovation services, i.e. the application of “complementary analytics methods” to data (and not to the type of data source and/or technology), for the FSO (I), the whole Swiss public statistics system (II) and the whole federal administration (III). 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 12
Questions & Answers 5th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019 13
Vous pouvez aussi lire