Space Systems Dependability Industrial Challenges - Jean-Paul Blanquart ASTRIUM Satellites

La page est créée Alexandra Marechal
 
CONTINUER À LIRE
Space Systems Dependability Industrial Challenges - Jean-Paul Blanquart ASTRIUM Satellites
Space Systems Dependability
Industrial Challenges

Jean-Paul Blanquart
ASTRIUM Satellites
Jean-paul.blanquart@astrium.eads.net

ETR - 2013
Toulouse, France, 26-30/8/2013
Space Systems Dependability Industrial Challenges - Jean-Paul Blanquart ASTRIUM Satellites
Outline
 Dependability in space
    Constraints, needs, solutions, achievements

 Dependability (FDIR) process
 Model Based Dependability
    Engineering, Assessment

 Why it may become so complicated?

                       ETR – 2013 – Toulouse, France, 26-30/8/2013
Space Systems Dependability Industrial Challenges - Jean-Paul Blanquart ASTRIUM Satellites
Space systems: Constraints
 Limited mass, power
 Limited ground-board link
 Limited maintenance
 Radiations
 Knowledge, mastering of the environment
 Phased missions, critical parts
 Long lifetime

                    ETR – 2013 – Toulouse, France, 26-30/8/2013
Space Systems Dependability Industrial Challenges - Jean-Paul Blanquart ASTRIUM Satellites
A large variety of dependability needs
 Reliability
      Lifetime (satellites, space probes)
      Continuity of service (launchers, rendez-vous, re-entry)

 Availability
      Instantaneous (satellite, probes critical phases, launchers)
      Average (Mission return)
      Outage duration, frequency (critical or expensive services)

 Maintainability, adaptation (reconfiguration, software, procedures)
 Safety (Launchers, manned flights, rendez-vous, end of life)
 Security

                             ETR – 2013 – Toulouse, France, 26-30/8/2013
Space Systems Dependability Industrial Challenges - Jean-Paul Blanquart ASTRIUM Satellites
Solutions, basic principles
 Selective redundancy, hot or more often cold
 duplex
    Some notable examples of comparison and vote

 Automatic detection and reconfiguration (FDIR)
 Safe mode
 “Favourite” adversaries
    Single point of failures, common cause failures, failure
     propagation
    Unpredicted situations, lack of observability, controllability

                         ETR – 2013 – Toulouse, France, 26-30/8/2013
Space Systems Dependability Industrial Challenges - Jean-Paul Blanquart ASTRIUM Satellites
Classical satellite architecture
                                 typically 1 or 2
                                                                                  Interface payload
                                     Remote                      Other
                                     Computer                    RIU                 PIU    PIU

      TM      SCU A
       TC
                                                              Ire                 Gyp
                                                               s
      TM
               SCU
       TC       B                                                        Mil-STD 1553B bus
                                                      GPS
                        Sun
                       Sensor                 PPU                   PSR                     MW-X
                                sensor4
                                ADE
                       CPS

                                      Ion thrusters
                                                                                  SADM
                                                  Batteries
       2*7 Thrusters                  Mechanisms
                                                                                  Solar
                                                                                  Arrays

                                    ETR – 2013 – Toulouse, France, 26-30/8/2013
Computer failure (cold duplex)
Senseur

                                    Mémoire

                                    contexte
                   Nœud                             Nœud
                                                        (off)

                       autotests     on/off              autotests

            Présence                  MRE
            Terre
                                   commutatio

Actuateur

                                                ETR – 2013 – Toulouse, France, 26-30/8/2013
Launcher (Ariane 5)
            OBC1       inter-computers     OBC2
            (Master)      alarm link       (Backup)

Remote                                                        Remote
 unit 1                                                        unit 1                               Hot-Duplex,
(nominal)                                                      (backup)
                                                                                                    semi-cross-strapped,
                            SdC                                                                     one-shot
                       (Mil. Std. 1553B)
Remote                                                        Remote
 unit n                                                        unit n
(nominal)                                                      (backup)

                                                      Umbilical to launchpad

                       + Safety
                                                      ETR – 2013 – Toulouse, France, 26-30/8/2013
Manned automatic vehicles (projects)
 Communication Busses

                                                       RT             RT
                                                                    MIOP           NAP
   RT      RT      RT      RT      RT      RT

   GNC1            GNC2             GNC3
                                                                                                                       Alimentation
                                                                   SIORP
  BC IPC          BC IPC          BC IPC                                           BAP
                                                       BC    IPC
                                                                                     GNC4
                                                                                                                          USR
                                                IPN
                                                                                                                         RT/OBS

                                                                                Reset / Alimentation      Bfin   TM1                  TM2 BFin   Reset / Alimentation

GNC1 Bus        GNC2 Bus        GNC3 Bus              GNC4 Bus

                                                                                                       OBC 1                              OBC 2
                                                                                                                   BFout1        BFout2
                                                                                Contrôle commande                                         RT/OBS Contexte / Reprise
                                                                                                         BC

                                                                                                                          1553

                                                                   ETR – 2013 – Toulouse, France, 26-30/8/2013
ATV (Automatic Transfer Vehicle)
     ATV CORE                                                                                                           ATV
                                                                                                                       CARGO
RUSSIAN SEGMENT BUSES

                                                                          Propulsion
         FTC 1         FTC 2        FTC 3                                    Drive              Gyros         Earth    Rendez-vous
                                                                          Electronics                        Sensors     Sensors

                       SYSTEM BUSES

            VTC          CMU         CMU            MSU                      S             UHF      Power       GPS       CMU
                                                                            Band                    Distr.

US SEGMENT BUSES                                                                               to CMUs
                                      Equipment                                                                         SENSORS
                               Measurement & Command                                        Sun                        ACTUATORS
                                                                                          Sensors
Launch Pad i/f BUSES

                                            ETR – 2013 – Toulouse, France, 26-30/8/2013
ATV Fault Tolerant Computer
                         Avionics System Bu s A

                        Avionics System Bus B
                           Avionics S ystem Bus C

                                      A vionic s System Bus D

                                                          ALB

                                                           FML

         DPU1   DPU2        DPU3
                                                           AVI       DPU4

                       ETR – 2013 – Toulouse, France, 26-30/8/2013
Indicative values, from public data (up to 2005)
                                                                            No pretention as strongly substantiated statistics
And it works…
 Launchers
                                                                                                                             10%
                                                                                                                                                        Propulsion

                100.0
                                                                                                                       13%                              Command
                 90.0                                                                                                                             39%
                 80.0                                                                                                                                   Structure
 Success rate

                 70.0                                                                                                 3%
                 60.0                                                                                                                                   Power
                 50.0                                                                                                  6%
                 40.0
                 30.0
                                                                                                                                                        Separation
                 20.0
                    1955   1960 1965   1970 1975   1980     1985 1990    1995 2000     2005                                                             Explosion
                                                                                                                                 29%
                                 Launches    10 year mean      Mean (90.7%)

 Satellites                                                                                                                     9%
                          “~10-6/h”
                                2xlifetime, 90%>                                                                            4%              20%         Propulsion

                     But:                                                                                                                              Command

                         Launch: 6-7%                                                                                                                  Mechanical
                         In-orbit installation: 4-5%                                                                 25%
                                                                                                                                                        Power
                                                                                                                                                  20%
                         Early phase: 1.5 10-6/h
                         Life: 0.5 10-6/h                                                                                                              Deployment

                                                                                                                                                        Environment
                                                                                                                                      22%

                                                                        ETR – 2013 – Toulouse, France, 26-30/8/2013
Outline
 Dependability in space
    Constraints, needs, solutions, achievements

 Dependability (FDIR) process
 Model Based Dependability
    Engineering, Assessment

 Why it may become so complicated?

                       ETR – 2013 – Toulouse, France, 26-30/8/2013
Dependability process (FDIR)
                                                                                                                        REQUIREMENTS
                                                                                                                        • One Failure tolerant design
                                                      Events                                                           • R(t) ≥ 0,8
                                                   HW ? SW ? GCC ?
                                                                                                                        • D(t) ≥ 0,99
                                                                                                                        • Autonomy …
                                                                                                                        •…

                               NORMAL MODE                                                         SAFE MODE
                                                                                                                                                               Events 
                                                                                              GS               DRU     Rech_pf_y                  TH/Rech_cu

Events 
                                                                                            Batterie                   Alimentation

                                                                                           TX_B                IF_1553 - CV_DRU_y                   MSI_y

                                                                                    TX_A
                                                                                                                                       1553_cu
                                                                                                                                                                HW ?
 SW ?                                                                               RX_A    DCU_A      TTR_A                                      OBMU

                                                                                                                                                                SW ?
                                                                                                                      PM_x
                                                                                    RX_B    DCU_B      TTR_B                           1553_int

 GCC?                                                                               TX_B
                                                                                                                      IOS_y                         IOT_y
                                                                                           TX_A

                                                                                              MAG_y    CSS_y          MTQ_y           RS_y         TH_pf_y
                                                                                                                                                                GCC?

    FDIR                                                                                                Failures Management
                                                                                                        - DETECTION
    • REQUIREMENTS                                                                                      How ? Which parameters ? Frequency ?...
    • ARCHITECTURE                                                                                      - ISOLATION
                                                       Events                                          Protections, Time to react ?...
    • Modes versus “failures”                       HW ? SW ? GCE ?                                     - RECOVERY
    • HIERARCHY                                                                                         Which actions ? Who executes ? …
    • FD/I/R Management
  GCC: Ground Control Centre

                                             ETR – 2013 – Toulouse, France, 26-30/8/2013
FDIR analysis
                                                                                                                                                                                                                                                                                            RAMS_001                         « Every failure likely to propagate shall be detected in appropriate time in order
  REQUIREMENTS                                                                                                                                                                                                                                                                              to avoid propagation to another reliability block (as defined in teh reliability block diagram) »

                                                                                                                                                                                                                                                                                                                                                             RAMS_051                                                                     « Every failure with criticality 1 shall trigger a safe mode »
                                                                                                                                                                                                                                                                                         FDIR
                                                                      NORMAL MODE                                                                                                                                                                                                                                         SAFE MODE
                                                                                                  T
                                           DTC XS     MVTHI XS
                                                                                REFT 8Hz                              REFT 8Hz

  STRATEGY
                                                                      BAS-VIS                                                       RA-E                 RA                                                                                                                                                      GS                DRU     Rech_pf_y                      TH/Rech_cu
                                           DTC PAN    MVTHI PAN                                                                                                                            OBC                                    FCL
                                                                                                                                                                                          (LEON 3)                                                   PPS
                                                                                                      Equerre I/F                                 OMUX
                                                                                                                                                                                                                                                                                                               Batterie                    Alimentation
                                                                                                                                 8PSK      8PSK          8PSK        8PSK

                                           Dét. IRM
                                                       MEP IRM
                                                                        BEV                                                                                                                                                                                                                                   TX_B                 IF_1553 - CV_DRU_y                       MSI_y
                                                                        BEV
                                            et IRT
                                                        MEP IRT
                                                                                                                                             TMI-CDC
                                                                                                                                                                                    TC-CDC
                                                                                                                                                                                                               TMS-
                                                                                                                                                                                                               CDC                                                                                                                                         1553_cu
                                                                                                                                                                                    FCL
                                                                                                                                                                                                                                                                                                       TX_A
                                                          LPTC
                                                         LPTC          EDR

                                                                                                                                                 COME                                                                             REFT                                                                 RX_A    DCU_A       TTR_A                                          OBMU
                                                                                                                                                                                           RIU
                                                       CTAinst.
                                                       CTA inst.
                                                                                                                                                                                                                                                                                                                                          PM_x
                                                                          MSI
                                                       Corps  noir
                                                        CTA inst.
                                                                                           REFT 8Hz                                                                                                                                                                                                    RX_B    DCU_B       TTR_B                           1553_int
                                                      (calibration)

                                                        Mecalib
                                                                                                                                                                                                                                                                            T
                                                                                                                                                                                                                                                                                                       TX_B
                                                                                                       T                                                                                                                                                                             T
                                                                                                                                                                                                                                                                                                                                          IOS_y                             IOT_y
                                                                                                                                                                                                                                                                                                              TX_A
                                                                                                                                                                                             REFT 32Hz

                                                                                                                                                                                                                      REFT 32Hz

                                                                                                                                                                                                                                         REFT 16Hz

                                                                                                                                                                                                                                                            PPS

                                                                                                                       FCL
                                                                                                                       (Rx)
                                                                                                                                                                                                                                                                             CMG-E
                                                                                                           TBS A/B                Thermistances                                                                                                                             CMG-E
                                                                                                                                                                                                                                                                           CMG-E
                                                                                                       (Rx en red. chaude)            (AD)                                                           FOG-E A                  FOG-E B        SST-E                NAV     CMG-E
                                                                                                                                                                               GS
                                                                                                           Coupleur
                                                                                                             3dB
                                                                                                                                  Réchauffeurs
                                                                                                                                   2 x 32 (AC)                    PCDU         GS              FOG-O                    FOG-O            SST-O
                                                                                                                                                                                                                                                                  ANAV                                           MAG_y     CSS_y          MTQ_y           RS_y             TH_pf_y
                                                                                                                                                                                                                                                                          RMU
                                                                                                        ABS          ABS
                                                                                                                                                                               GS                        SSG
                                                                                                        Nadir       Zenith
                                                                                                                                                                               GS                                                                                        CMG-M
                                                                                                                                                                                                                                  MAC

                                                                                                                                                                                                                                                        MAG
                                                                                                                                                                                                                                                      (3 axes)
                                                                                                                                                                BAT (Li-Ion)                                          Propulsion
                                                                                                                                                                                                                      (2 branches de
                                                                                                                                                                                                                       8 tuyères 1N)

                                                                                                                                                                                                                                                                                          TC
                                                                                                                                                                                                                                                                                                                                                                  Eqt BF FM                     Fonction               Dysfonctionnement               Effet Local                           Effets système                   N   A       Recommandations                 TM                       Maîtrise             G        C              Remarques
                                                                                                                                                                                                                                                                                                                                                                      3    1     1   CTA_BUS_RECH                     c/o réchauffeur         Absence activation d'un     Fonction contrôle thermique erronée                                                      T° ligne i         2= SURVEILLANCE              1        R   SUR C= la surveillance comporte un
                                                                                                                                                                                                                                                                                                                                                                                       -Activer les réchauffeurs du                           réchauffeur                 Poinst localements froids (6 lignes affectées)                                                              LVC#XX : Comparaison mesure-                  seuil haut et un seuil bas
                                                                                                                                                                                                                                                                                                                                                                                     contrôle thermique                                                                   LVC#XX : Comparaison mesure-commande                                                                        commande                                      C= 2 cartes RECH_N et RECH_R
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Procédure LVC : SYS_ARO                                                                                                                                   en redondance froide dans boîtier

                                 F / D / I / R
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Passage en mode SAT_SAF                                                                                     3= SOL                                        PCDU (x2)

  DESIGN
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Disponibilité affectée                                                                                      Possibilité d'utiliser la carte
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      RECH_B
                                                                                                                                                                                                                                                                                                                                                                      3    1     2   CTA_BUS_BUS_RECH                 c/c réchauffeur         MEO protection électrique   Fonction contrôle thermique erronée                                                      T° ligne i, j, k   2= SURVEILLANCE              1        R   SUR C= la surveillance comporte un
                                                                                                                                                                                                                                                                                                                                                                                       -Activer les réchauffeurs du                           PCDU LCL groupe de 6        Poinst localements froids (6 lignes affectées)                                           …                  LVC#49 : Comparaison mesure-                  seuil haut et un seuil bas
                                                                                                                                                                                                                                                                                                                                                                                     contrôle thermique                                       pertes de 6 réchauffeurs    LVC#XX : Comparaison mesure-commande                                                                        commande                                      C= 2 cartes RECH_N et RECH_R
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Procédure LVC : SYS_ARO                                                                                                                                   en redondance froide dans boîtier
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Passage en mode SAT_SAF                                                                                     3= SOL                                        PCDU (x2)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Disponibilité affectée                                                                                      Possibilité d'utiliser la carte
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      RECH_B
                                                                                                                                                                                                                                                                                                                                                                      4    1     1   CT_BUS_TH                        Informations capteurs   Paramètres Thermistances    Aucun effet au permier ordre                                R= Evaluer la possibilité de T° ligne i         1= PROTECTION                     1   R    P   C= ES_A reçoit TH_A, TH_C
                                                                                                                                                                                                                                                                                                                                                                                       -Fournir les TH canal 1 du     erronées (ES_A)         erronés voie A ou voie B    Vote majoritaire                                            poursuivre la mission quelle                    Vote majoritaire (2/3)                         C= ES_B reçoit TH_B
                                                                                                                                                                                                                                                                                                                                                                                     contrôle thermique                                                                                                                               que soit lles lignes TH                                                                        C= le vote majoritaire permet
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      affectées                                                                                      d'éviter une reconfiguration et
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      R= Statuer sur                                                                                 permet de tolérer des pannes
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      programmation du CTA en                                                                        croisées sur différentes lignes
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      mode survie
                                                                                                                                                                                                                                                                                                                                                                      4    1     2   CT_BUS_TH                        Informations capteurs   Paramètres Thermistances    Aucun effet au permier ordre                                R= Evaluer la possibilité de T° ligne i         1= PROTECTION                     1   R    P   C= ES_A reçoit TH_A, TH_C
                                                                                                                                                                                                                                                                                                                                                                                       -Fournir les TH canal 1 du     erronées (ES_B)         erronés voie A              Vote majoritaire                                            poursuivre la mission quelle                    Vote majoritaire (2/3)                         C= ES_B reçoit TH_B
                                                                                                                                                                                                                                                                                                                                                                                     contrôle thermique                                                                                                                               que soit lles lignes TH                                                                        C= le vote majoritaire permet
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      affectées                                                                                      d'éviter une reconfiguration et
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      R= Statuer sur                                                                                 permet de tolérer des pannes
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      programmation du CTA en                                                                        croisées sur différentes lignes
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      mode survie
                                                                                                                                                                                                                                                                                                                                                                      6    1     1   MAGNETOMETRE                   Acquisitions          Mesures champ magnétique        SAT_NOM: Magnétomètre n/u                               X                                MNL: Erreur     2= ERREUR REPORT                     3   R        C= Magnétomètre utilisé
                                                                                                                                                                                                                                                                                                                                                                                       -Mesurer le champ            magnétomètre bloquées inutilisables                   LVC#XX : Sorties MAG bloquées                                                            report          LVC#XX : Sorties MAG                              uniquement en mode ASH (survie
                                                                                                                                                                                                                                                                                                                                                                                     magnétique terrestre sur les 3 Bx, By, Bz (0, …)                                     Erreur report                                                                            ASH:            bloquées                                          et première acquisition)
                                                                                                                                                                                                                                                                                                                                                                                     axes Bx, By, Bz                                                                      ASH : Bx, By, Bz pour calcul loi Bpoint                                                  Reconfiguration                                                   C= Magnétomètre redondé
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          perte des fonctions : Mesures de champ magnétique                                        automatique     ACQUISITION/ASH                                   C= surveillance magnétomètre
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Mode ASH erroné                                                                          SURVIE          2= SYS_ARO                                        entraîne erreur report en MNL
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Non convergence STAB                                                                                     LVC#XX : Sorties MAG                              (information sol uniquement)

RAMS_101           « An electrical protection shall protect the spacecraft from any short-circuit down-stream »                                                                                                                                                                                                                                                                                                                                                           LVC#XX : Sorties MAG bloquées
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Procédure LVC : SYS_ARO
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Pass
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   bloquées

RAMS_151        « In order to detect ASH control failure during stabilization phase, the OBSW shall monitor
the duration of the stabilization phase. Triggering of this surveillance shall lead to ARO.
(URD.AOCS.ASH.FDIR.0100)

  V&V
                                                                                                                                                                                                                                                                                         ETR – 2013 – Toulouse, France, 26-30/8/2013
FDIR lifecycle
System Specification                     Functional                                                                                            FDIR
                                                                                                                                               FDIR REPORT
                                                                                                                                                    REPORT
• Lifetime                               Functional                                                                                            •• FDIR
                                                                                                                                                  FDIR design
                                                                                                                                                       design
• Reliability target                      Analysis
                                          Analysis                                                                                             •• …
                                                                                                                                                  …
• Availability target
• Failure events
• Autonomy
• Failure tolerance level                                 Risk
                                                           Risk                                                            Contingency
                                                                                                                           Contingency
• design rules                                           Analysis
                                                         Analysis                                                           Analysis
                                                                                                                             Analysis
•…
                                                                                                                                          FDIR
                                                                                                                                          FDIR TESTING
                                                                                                                                               TESTING
                                                                                                                                          •• Test
                                                                                                                                             Test Spécifications
                                                                                                                                                  Spécifications
                            FDIR
                            FDIR POLICY
                                 POLICY
                            •• Redundancy
                               Redundancy                                   FMECA_1                                FMECA_4
                            •• Architecture
                                                                            FMECA_1                                FMECA_4
                               Architecture                               (Functional)                          (consolidated)
                            •• FDIR
                               FDIR Management
                                     Management                            (Functional)                          (consolidated)
                            •• Operational
                               Operational Modes
                                            Modes
                            •• Feared
                               Feared Events
                                       Events
                                                                                                            FMECA_3
                                                                                                            FMECA_3                FDIR
                                                                                                                                   FDIR VERIFICATION
                                                                                                                                        VERIFICATION
                                                                                                             (HSIA)
                                                                                                              (HSIA)               •• FDIR
                                                                                                                                      FDIR Report
                                                                                                                                           Report 11
                                        FDIR                                                                                       •• REVIEWS
                                        FDIR DESIGN
                                             DESIGN                                                                                   REVIEWS
                                        •• FDIR
                                           FDIR Policy
                                                 Policy (detailed)
                                                         (detailed)
                                        •• Monitoring
                                           Monitoring                                                FMECA_2
                                                                                                     FMECA_2
                                        •• Recovery
                                           Recovery
                                        •• Requirements
                                                                                                     (Detailed)
                                                                                                      (Detailed)
                                           Requirements

                                                                      Sub’s
                                                                      Sub’s SPECIFICATION
                                                                            SPECIFICATION                  Sub’s
                                                                                                           Sub’s DESIGN
                                                                                                                 DESIGN
                                                                      •• Failure
                                                                         Failure tolerance
                                                                                  tolerance                •• Monitoring
                                                                                                              Monitoring
                                                                      •• …
                                                                         …                                 •• FMECA
                                                                                                              FMECA

                                                 MONITORING
                                                 MONITORING // RECOVERY
                                                               RECOVERY                            SW
                                                                                                   SW DESIGN
                                                                                                      DESIGN
                                                 •• SSS
                                                    SSS // SRS
                                                           SRS (SW)
                                                                (SW) inputs
                                                                      inputs                       •• Design
                                                                                                      Design Report
                                                                                                             Report
                                                 •• System
                                                    System database
                                                            database inputs
                                                                       inputs

                                                             ETR – 2013 – Toulouse, France, 26-30/8/2013
Outline
 Dependability in space
    Constraints, needs, solutions, achievements

 Dependability (FDIR) process
 Model Based Dependability
    Engineering, Assessment

 Why it may become so complicated?

                       ETR – 2013 – Toulouse, France, 26-30/8/2013
Types of models (for dependability)
determined by the objectives
 Quantitative (probabilistic) analysis of RAMS properties
     Modelling the impact of faults/failures on the level of performance
     Markov, RBD, …
 Qualitative analysis of faults/failures propagation
     Modelling the impact and propagation of faults/failures on on the
      architecture (physical, functional, …)
     Structural models, AADL, …
 Assessment (correctness, performance) of FDIR
     Modelling the behaviour of the FDIR
     State machines, temporised automata, model-checking, …
 Soundness, completeness of the safety, dependability arguments
     Modelling the dependability and safety argumentation
     Logical formulas, GSN, …

                         ETR – 2013 – Toulouse, France, 26-30/8/2013
Main types of dependability models
 Behaviour (in presence of faults)
    Needs for an appropriate abstraction of the behaviour
    Behaviour of the system (functional, « dys-functional »)
    Behaviour of the fault processing mechanisms
    Assembly of functional blocks (on, off, failed, …)

 Architecture and fault propagation
    Explicit representation of fault propagation, with an abstraction of
     the behaviour

 Coupling behavioural and structural models
    At least for FDIR mechanisms
    Impact of faults on behaviour
    Impact of reconfigurations on behaviour
                           ETR – 2013 – Toulouse, France, 26-30/8/2013
Quite an old story
                                                                                                                                                                                                                               Fault Trees
    with limitations and solutions
 node myComponent
                                                                                                                                                                                                   Command
                                                                                                                                                                                                      Loss
 flow
                                                                                                                                                                                                    2.8 1e-9
   output : {nominal , failed} : out ;
 state
   status : {nominal , failed} ;                                                                                               Prim. Command Sec. Command
 event                                                                                                                               Loss         Loss
   fail;
 init
   status := nominal ;
 trans
                                                                                                           gonio2 gonio1 gyro1                                                                                                                                  sun1                                                sun2
                                                                                                       Eqt BF FM              Fonction               Dysfonctionnement               Effet Local                           Effets système                   N   A       Recommandations                 TM                       Maîtrise             G        C              Remarques

   status=nominal |− fail −> status := failed ;
                                                                                                       3   1   1   CTA_BUS_RECH                     c/o réchauffeur         Absence activation d'un     Fonction contrôle thermique erronée                                                      T° ligne i         2= SURVEILLANCE              1        R   SUR C= la surveillance comporte un
                                                                                                                     -Activer les réchauffeurs du                           réchauffeur                 Poinst localements froids (6 lignes affectées)                                                              LVC#XX : Comparaison mesure-                  seuil haut et un seuil bas
                                                                                                                   contrôle thermique                                                                   LVC#XX : Comparaison mesure-commande                                                                        commande                                      C= 2 cartes RECH_N et RECH_R
                                                                                                                                                                                                        Procédure LVC : SYS_ARO                                                                                                                                   en redondance froide dans boîtier

                                                                                                            Loss   Loss  Loss                                                                                                                                   Loss                                                Loss
                                                                                                                                                                                                        Passage en mode SAT_SAF                                                                                     3= SOL                                        PCDU (x2)

 assert                                                                                                3   1   2   CTA_BUS_BUS_RECH
                                                                                                                     -Activer les réchauffeurs du
                                                                                                                                                    c/c réchauffeur         MEO protection électrique
                                                                                                                                                                            PCDU LCL groupe de 6
                                                                                                                                                                                                        Disponibilité affectée

                                                                                                                                                                                                        Fonction contrôle thermique erronée
                                                                                                                                                                                                        Poinst localements froids (6 lignes affectées)
                                                                                                                                                                                                                                                                                                 T° ligne i, j, k
                                                                                                                                                                                                                                                                                                 …
                                                                                                                                                                                                                                                                                                                    Possibilité d'utiliser la carte
                                                                                                                                                                                                                                                                                                                    RECH_B
                                                                                                                                                                                                                                                                                                                    2= SURVEILLANCE
                                                                                                                                                                                                                                                                                                                    LVC#49 : Comparaison mesure-
                                                                                                                                                                                                                                                                                                                                                 1        R   SUR C= la surveillance comporte un
                                                                                                                                                                                                                                                                                                                                                                  seuil haut et un seuil bas

   output=status ;                                                                                          1e-4   1e-4  1e-4
                                                                                                                   contrôle thermique                                       pertes de 6 réchauffeurs    LVC#XX : Comparaison mesure-commande
                                                                                                                                                                                                        Procédure LVC : SYS_ARO
                                                                                                                                                                                                        Passage en mode SAT_SAF
                                                                                                                                                                                                        Disponibilité affectée                                  1e-4                                                1e-4
                                                                                                                                                                                                                                                                                                                    commande

                                                                                                                                                                                                                                                                                                                    3= SOL
                                                                                                                                                                                                                                                                                                                    Possibilité d'utiliser la carte
                                                                                                                                                                                                                                                                                                                                                                  C= 2 cartes RECH_N et RECH_R
                                                                                                                                                                                                                                                                                                                                                                  en redondance froide dans boîtier
                                                                                                                                                                                                                                                                                                                                                                  PCDU (x2)

                                                                                                                                                                                                                                                                                                                    RECH_B
                                                                                                       4   1   1   CT_BUS_TH                        Informations capteurs   Paramètres Thermistances    Aucun effet au permier ordre                                R= Evaluer la possibilité de T° ligne i         1= PROTECTION                     1   R    P   C= ES_A reçoit TH_A, TH_C
                                                                                                                     -Fournir les TH canal 1 du     erronées (ES_A)         erronés voie A ou voie B    Vote majoritaire                                            poursuivre la mission quelle                    Vote majoritaire (2/3)                         C= ES_B reçoit TH_B
                                                                                                                   contrôle thermique                                                                                                                               que soit lles lignes TH                                                                        C= le vote majoritaire permet
                                                                                                                                                                                                                                                                    affectées                                                                                      d'éviter une reconfiguration et
                                                                                                                                                                                                                                                                    R= Statuer sur                                                                                 permet de tolérer des pannes
                                                                                                                                                                                                                                                                    programmation du CTA en                                                                        croisées sur différentes lignes
                                                                                                                                                                                                                                                                    mode survie
                                                                                                       4   1   2   CT_BUS_TH                        Informations capteurs   Paramètres Thermistances    Aucun effet au permier ordre                                R= Evaluer la possibilité de T° ligne i         1= PROTECTION                     1   R    P   C= ES_A reçoit TH_A, TH_C
                                                                                                                     -Fournir les TH canal 1 du     erronées (ES_B)         erronés voie A              Vote majoritaire                                            poursuivre la mission quelle                    Vote majoritaire (2/3)                         C= ES_B reçoit TH_B
                                                                                                                   contrôle thermique                                                                                                                               que soit lles lignes TH                                                                        C= le vote majoritaire permet
                                                                                                                                                                                                                                                                    affectées                                                                                      d'éviter une reconfiguration et
                                                                                                                                                                                                                                                                    R= Statuer sur                                                                                 permet de tolérer des pannes
                                                                                                                                                                                                                                                                    programmation du CTA en                                                                        croisées sur différentes lignes
                                                                                                                                                                                                                                                                    mode survie
                                                                                                       6   1   1   MAGNETOMETRE                   Acquisitions          Mesures champ magnétique        SAT_NOM: Magnétomètre n/u                               X                                MNL: Erreur        2= ERREUR REPORT                  3   R        C= Magnétomètre utilisé
                                                                                                                     -Mesurer le champ            magnétomètre bloquées inutilisables                   LVC#XX : Sorties MAG bloquées                                                            report             LVC#XX : Sorties MAG                           uniquement en mode ASH (survie
                                                                                                                   magnétique terrestre sur les 3 Bx, By, Bz (0, …)                                     Erreur report                                                                            ASH:               bloquées                                       et première acquisition)
                                                                                                                   axes Bx, By, Bz                                                                      ASH : Bx, By, Bz pour calcul loi Bpoint                                                  Reconfiguration                                                   C= Magnétomètre redondé
                                                                                                                                                                                                        perte des fonctions : Mesures de champ magnétique                                        automatique        ACQUISITION/ASH                                C= surveillance magnétomètre
                                                                                                                                                                                                        Mode ASH erroné                                                                          SURVIE             2= SYS_ARO                                     entraîne erreur report en MNL
                                                                                                                                                                                                        Non convergence STAB                                                                                        LVC#XX : Sorties MAG                           (information sol uniquement)
                                                                                                                                                                                                        LVC#XX : Sorties MAG bloquées                                                                               bloquées
                                                                                                                                                                                                        Procédure LVC : SYS_ARO
                                                                                                                                                                                                        Pass

                                                  AltaRica
                                                                                                                                                                                          Markov Chains
                                                                                                  Mu

                                                                                                                                                        Mu
                       AADL                                          GSPN                         Mu

Cf. Ana-Elena Rugina PhD, 2007

                                                    ETR – 2013 – Toulouse, France, 26-30/8/2013
Main advantages
 Support to model creation, discussion, validation
 Additional capabilities (property validation, …)
 Easier updates
 Coupling with other models

 Could/should we go further?
    What about correctness of
     fault tolerance mechanisms?

                       ETR – 2013 – Toulouse, France, 26-30/8/2013
Objectives & Constraints

Need: Model the system dynamics in the presence of faults

 Define a modelling technique
    Express faults and failures propagation
    Specify Fault, Detection, Isolation and Recovery mechanisms

 Demonstrate properties on Dependability/FDIR

 Appropriate modelling languages & tools

                         ETR – 2013 – Toulouse, France, 26-30/8/2013
Requirements on modelling techniques (1/2)

   Function

                                                                                    Deployment on resources
  Resource                           Function
                                                                                    Functional dependencies

   Function
                                     Resource                                       Spread FDIR functions
                           Recover                       Detect                     FDIR Hierarchy
  Resource
                                                                                   Functional & Dysfunctional
                                         FDIR                                           time constraints
 Detect       Recover

    FDIR
                        Delegate

                                     ETR – 2013 – Toulouse, France, 26-30/8/2013
Requirements on modelling techniques (2/2)
Design objectives                                                             Validation objectives
• Simple (compositional) modelling method                                     • Demonstration
• Close to the engineering model

     Architecture/Behaviour
     (AADL)
     Deployment on resources
                                         Transformation
     Functional dependencies                                                      Timed automata
                                                                                      (UPPAAL)
     Spread FDIR functions
     FDIR Hierarchy
     Time constraints

                                ETR – 2013 – Toulouse, France, 26-30/8/2013
Modelling method – Dependability pattern
                                FDIR MODELS
  Transient/
  Permanent
FAULT
  event MODELS                                      Confirmation

                                   idle                                                 recovery              Fault
    Fault

                FAULT
        alteration         PROPAGATION
                             detection MODELS                                       recovery

                                                                                                   firing
                 Degradation                  Error                                                          Failure

                      Nominal
                                      Recovery
                       mode                                              Failure
                                                                          Failure
                                                                           Failure
                                                                         mode
                                                                                                            FAILURE
                                                                          mode
            Nominal             Nominal
                                                                           mode                             MODELS
             mode                mode                                Recovery
                                               Degraded
                      Nominal                   mode
                       mode

                                          ETR – 2013 – Toulouse, France, 26-30/8/2013
Modelling method
                                                                                                        FDIR
F4 :Data1 absent
=> {Data/ absent/…}
                                                                       f4         fm4   F4         f5   fm5    F5

f4: => {resource/processor/computing/…}

                                                                      F12
                                                                     F11                                       F3
F1 : total loss of computing power
                                                                     fm2                     F0                fm3
=> {resource/processor/computing/…}
                                                                                             fm0
                                                                       f1                                      f3

                                                                                             f0
f1 : CPU internal fault
=> {resource/ processor/ computing/…}

                                          ETR – 2013 – Toulouse, France, 26-30/8/2013
Model transformation
                   
 AADL metamodel      AADL behavior annex
    (.ecore)          metamodel (.ecore)

                                                           AADL model (.xmi)

                                                                                  Kermeta program
  Transformation
                             Transformation                                    Generation
       rules

                                                                                         UPPAAL
UPPAAL metamodel                                                        configuration
                                                          UPPAAL model (.xmi)
    (.ecore)                                                                              (.xml)

                         ETR – 2013 – Toulouse, France, 26-30/8/2013
Model reduction – fault isolation
    FDIR                                                                       F61

                 F41                     f61                      fm6        f62              F51

    FDIR                     fm4                                                                          fm5
 
                     f41     f42     f43                                                           f51    f52      f53

    F01      F02       F03         F11         F12            F13                                F21       F31       F32

             fm0                              fm1                                                                    fm3
                                                                                                 fm2

              f0                                f1                                                f2                     f3
      Device 1                           Device 2                                              Device 4         Device 5
                 
                                           ETR – 2013 – Toulouse, France, 26-30/8/2013
Model reduction – Step by step validation
                   FDIR                         FDIR                               recovery

      f1            fm1             F1                            f2                   fm2                 F2
   {active}     {Nominal}          {idle}                      {active}             {Nominal}             {idle}        T0 = 0

  Find the earliest Find
                    t suchthe earliest t1 such Itthat
                            that                   is true at t
        f1         fm1             F1                                         f2                 fm2               F2   T0 = t

     {failed}   {failed}        {failed}                                 {active}             {Nominal}       {idle}
     {failed}   {failed}        {failed}                                 {failed}             {failed}       {failed}

           Slowest recovery time(t2) < Fastest propagation time(t1)

                            Find the latest t2 such that
        f1          fm1                                  FDIR                          FDIR
                            
     {failed}    {failed}                        {confirmed}                        {recovery}

                                    ETR – 2013 – Toulouse, France, 26-30/8/2013
To summarize…
 Systematic modelling method for dependability and FDIR
      Compositional (dependability pattern)
      Incremental (functional, dependability and FDIR layers)
      Extensible (enhancement of deduced propagation paths)
      Demonstrable (model simulation and model checking)
      Complex propagation, common mode faults, external events, …
 Method improvements
    Model reduction techniques, possibly resolution techniques
    Complete libraries (fault models, failure modes, propagation rules)
 Integration into company’s processes
    Articulation of FDIR, Dependability, Engineering processes
    Place of simulation, formal validation
    Tools, methods benchmarking, training

                             ETR – 2013 – Toulouse, France, 26-30/8/2013
Outline
 Dependability in space
    Constraints, needs, solutions, achievements

 Dependability (FDIR) process
 Model Based Dependability
    Engineering, Assessment

 Why it may become so complicated?

                       ETR – 2013 – Toulouse, France, 26-30/8/2013
Hybrid models
 Structure / Behaviour
 Behaviour: Discrete logic + continuous
    Decoupling possible but based on a priori hypotheses which
     must be validated, cannot be exhaustive, hardly systematic
     and anyway limited by the actual interactions

    Interest of coupling, integrating models
         Limitations of current tools and even theoretical
          framework to support comprehensive and accurate
          simulations… and even more proofs

                       ETR – 2013 – Toulouse, France, 26-30/8/2013
Why we need at least time(s)
 Explicitly used by FDIR mechanisms (implicit
 coupling, enforcement of the desired organisation,
 sequencing, hierarchy)

 Sort of “pivot” notion between the discrete logics
 and the continuous physical phenomena

 Explicit incorporation of the temporal
 characteristics of failures
    Transient, intermittent…
    Possibly some other interesting though speculative ideas
     about time and failures

                       ETR – 2013 – Toulouse, France, 26-30/8/2013
Time and failures
 Notion of resistance, during some time, to faults
     Explicit property in security
     Less usual though possibly interesting for accidental faults
     See also the built-in resistance to exceptional environmental
      conditions

 Notion of “trajectory of failures”
     “Mortal Byzantine”, cf. Josef Widder, Martin Biely, Günther
      Griddling, Bettina Weiss, TU Wien, DSN 2007

 Notion of safety margin for dynamic systems
     Recent and on-going PhD studies (Amina Mekki-Mokhtar,
      Mathilde Machin, LAAS-CNRS, Toulouse, Supervised by D.
      Powell, J. Guiochet)

                         ETR – 2013 – Toulouse, France, 26-30/8/2013
It is a long way to space
Factory,
      Road…

                                                         No source of failure
                                                        should be overlooked

                ETR – 2013 – Toulouse, France, 26-30/8/2013
Space Systems Dependability
Industrial Challenges

Jean-Paul Blanquart
ASTRIUM Satellites
Jean-paul.blanquart@astrium.eads.net

ETR - 2013
Toulouse, France, 26-30/8/2013
Vous pouvez aussi lire