UQpy v4.1: Uncertainty Quantification with Python

This paper presents the latest improvements introduced in Version 4 of UQpy (Uncertainty Quantification with Python). In the latest version, the code was restructured to conform with the latest Python coding conventions and refactored to decouple previously tightly coupled features and to improve its extensibility and modularity. To improve the robustness of UQpy, software engineering best practices were adopted. A new software development workflow significantly improved collaboration between team members, and continuous integration and automated testing ensured the robustness and reliability of software performance. Continuous deployment of UQpy allowed its automated packaging and distribution in a system-agnostic format via multiple channels, while a Docker image enables the use of the toolbox regardless of operating system limitations.


Motivation and significance
Uncertainty Quantification (UQ) is the science of characterizing, quantifying, managing, and reducing uncertainties in mathematical, computational and physical systems. Depending on the sources of uncertainty, UQ provides a multitude of methodologies to quantify their effects. For instance, given the probability distribution for the inputs to a computational model, forward uncertainty propagation methods aim to estimate the distributions or statistics of resulting quantities of interest. Inverse UQ, on the other hand, aims to infer uncertainties in input quantities given limitations and uncertainties in the observed system response, e.g. for model calibration from experimental data. Numerous related tasks fall under the broad classification of UQ, including sensitivity analysis, which aims to quantify the influence of multiple inputs to a system, and reliability analysis, which aims to estimate (and sometimes minimize) the probability of failure of the system.
A major challenge in UQ is to reduce the high computational expense associated with many repeated model evaluations. This can be achieved through advances in sampling, development of computationally inexpensive surrogate models (or metamodels), and by leveraging high performance computing. To address these challenges, multiple software packages and libraries have been developed. Some of the most comprehensive libraries for UQ include OpenTurns [5], Korali [6], MUQ [7], UQTk [8], Dakota [9], OpenCossan [10] and UQLab [11]. These packages are developed in C, C++, or Matlab, and although many provide bindings to Python (to differing extents), they are not generally suitable for direct extension in Python, which is one of the most widely used languages in the scientific community.
Apart from these general purpose UQ libraries, several packages that target specific applications or have a more limited scope are available. In R, the DiceDesign [12] package aids experimental design, while DiceKriging and DiceOptim [13] use Kriging for metamodeling and surrogate-based optimization, respectively. The Matlab code FERUM [16], developed by the Engineering Risk Analysis Group at the Technical University of Munich, serves as a general purpose finite element structural reliability code, while SUMOToolbox [15] is a framework for global surrogate modelling and adaptive sampling. Specifically in Python, several focused libraries have been developed. UncertaintyPy [17] was developed for UQ in computational neuroscience. The PyROM framework [18] provides a user-friendly way to implement model reduction techniques. The ChaosPy package provides UQ functionality centered around polynomial chaos expansions. Bayesian calibration algorithms are implemented in SPUX [20] and ABCpy [21], and sensitivity analyses in SALib [22]. PyMC [26] provides a simple Python interface that allows its users to create Bayesian models and fit them using Markov Chain Monte Carlo methods. The PyGPC [27] library is based on generalized polynomial chaos theory and provides capabilities for uncertainty and sensitivity analysis of computational models. Three of the latest additions are PyApprox [23], which provides wide-ranging functionality, NeuralUQ [24], which focuses on UQ in neural network models, and Fortuna, which provides uncertainty estimates, classification and prediction for production systems.
UQpy aims to provide a comprehensive UQ library with wide-ranging capabilities spanning the areas discussed above, as well as a development environment for creating new UQ methodologies. The UQpy package was originally introduced in [4], where the overall structure of v3 was described. Since then, the authors have reworked the UQpy architecture with the goal of simplifying its structure, enhancing its extensibility, and making it more robust. The updated architecture is not backwards compatible, as the strategy for constructing classes has changed. Nevertheless, porting older solutions to the new structure is straightforward. This restructuring resulted in the current version presented here, v4.1.
The first task carried out towards v4.1 was to restructure the file system. The previous structure, which maintained a single Python file per module, had reached size limitations and made it cumbersome for the team members to add new functionalities or update existing ones. In the reorganization, a directory was created for each module, which contains, in a hierarchical structure, subdirectories for specific functionalities, with one file dedicated to each class. Slight modifications were also made to the existing code to ensure compliance with PEP8 by renaming modules, classes, and function signatures. Instead of monolithic classes per functionality, each component was split into a separate class with a dedicated abstract baseclass, where applicable. This choice reduced code complexity, provided a standardized way of extending components, and enabled the construction of the final functionality using object composition and inheritance.
The second step, aimed at improving team collaboration, was to deprecate the "branch-per-developer" strategy and move to a feature-based branch structure using GitHub Flow. This removed unnecessary redundancies and complications when multiple people are working on related functionalities. At the same time, the workflow is now directly combined with testing automation and Continuous Integration/Continuous Delivery (CI/CD) workflows. Unit tests were implemented throughout the software, achieving code coverage greater than 80%. The CI pipeline includes linting, code quality checks, and automated semantic versioning, while the CD pipeline packages and distributes the code via multiple channels, such as PyPI, conda-forge, and Docker images. These CI/CD pipelines are explained in more detail in Section 3.
The documentation was revamped to reflect the new hierarchical structure of the code, with embedded examples serving as tutorials to quickly familiarize users with the code functionality. Specifically, for each class, a gallery of examples is created using the sphinx-gallery extension [25]. Users can now download the examples in both Jupyter notebook and Python format or directly interact with the example in a dedicated Binder environment. Finally, several new functionalities were introduced either by the development team or through external collaborations, thus boosting UQpy's capabilities.

Software Architecture
UQpy is a Python-based toolbox that provides a series of computational methodologies and algorithms for wide-ranging UQ problems. The core of UQpy is based on state-of-the-art Python libraries, specifically NumPy [1], which is the most fundamental package supporting array and linear algebra operations, SciPy [2], which provides algorithms for optimization, integration and basic statistics, and scikit-learn [3], which includes various tools for supervised and unsupervised learning. UQpy is split into eleven modules, nine of which address specific tasks in UQ and are discussed in detail in the following section. A module called run model, which enables the necessary simulations in all other modules, aids in the batch execution of both Python and third-party computational models and includes functionality for parallelization via MPI for high performance computing. Finally, a utilities module contains various functions that are common to multiple modules.

Software Modules
In this section, all existing modules of UQpy are briefly introduced, with emphasis on software updates compared to v3. The respective UML diagrams are included in the UQpy documentation, allowing visualization of the architecture.

distributions module
The distributions module serves as the basis for most probabilistic operations in UQpy. It is fully compatible with scipy distributions and enables users to create probabilistic distribution objects. Compared to the previous version, the baseclass hierarchy was simplified. An abstract baseclass Distribution serves as the interface for creating all subsequent distributions. Depending on the dimensionality of the distribution, this baseclass is further refined into Distribution1D and DistributionND for univariate and multivariate distributions, respectively, while Distribution1D is further subclassed into DistributionsContinuous1D and DistributionsDiscrete1D for continuous and discrete random variables. Within this structure, 23 distinct distributions are implemented. A Copula baseclass with two implementations enables users to add dependence between 1D distribution objects. All baseclasses can be easily extended by users to implement any distribution of their choice by simply creating a new child class for the distribution and implementing the requisite methods.
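The extension mechanism described above can be illustrated with a minimal, self-contained sketch. It does not use the UQpy API: the Distribution class below is a hypothetical stand-in for an abstract interface, and the method names pdf and rvs, as well as the Triangular child class, are assumptions chosen for illustration only.

```python
from abc import ABC, abstractmethod
from typing import List, Optional
import random


class Distribution(ABC):
    """Hypothetical abstract interface; not the UQpy class itself."""

    @abstractmethod
    def pdf(self, x: float) -> float:
        """Evaluate the probability density at x."""

    @abstractmethod
    def rvs(self, nsamples: int, seed: Optional[int] = None) -> List[float]:
        """Draw nsamples random values."""


class Triangular(Distribution):
    """User-defined child class: symmetric triangular density on [low, high]."""

    def __init__(self, low: float, high: float):
        if not low < high:
            raise ValueError("low must be smaller than high")
        self.low = low
        self.high = high
        self.mode = 0.5 * (low + high)

    def pdf(self, x: float) -> float:
        if x < self.low or x > self.high:
            return 0.0
        half_width = 0.5 * (self.high - self.low)
        # Peak value 1 / half_width at the mode, linear decay to the bounds.
        return (half_width - abs(x - self.mode)) / half_width ** 2

    def rvs(self, nsamples: int, seed: Optional[int] = None) -> List[float]:
        rng = random.Random(seed)  # local generator keeps draws reproducible
        return [rng.triangular(self.low, self.high, self.mode)
                for _ in range(nsamples)]


dist = Triangular(0.0, 2.0)
samples = dist.rvs(5, seed=1)
```

Any code written against the abstract interface can accept the new child class without modification, which is the benefit the baseclass hierarchy is meant to provide.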

sampling module
This module provides a wide range of methods to draw samples of random variables. The following classes enable Monte Carlo simulation and variance reduction methods: MonteCarloSampling, SimplexSampling, ImportanceSampling, and StratifiedSampling. The StratifiedSampling class has been refactored as a parent class for all stratified sampling approaches, with LatinHypercubeSampling, TrueStratifiedSampling, and RefinedStratifiedSampling as child classes, all of which utilize a common Strata class for geometric decomposition of the domain. Markov Chain Monte Carlo (MCMC) methods are included, with the MCMC abstract baseclass serving as the common interface and 7 different methodologies implemented as subclasses. The latest version includes two new implementations of parallel and sequential tempering MCMC algorithms. Additional MCMC methods can be implemented by the user by simply creating a new subclass with the requisite methods. The module also includes the AdaptiveKriging class for adaptive sample generation for Gaussian process surrogate modeling (see Section 2.2.9) using specified (and custom) learning functions. Compared to v3, all learning functions have been extracted as separate classes, with a common LearningFunction baseclass, allowing users to easily create custom implementations.
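To make the idea of stratified sampling concrete, the following is a minimal sketch of basic Latin hypercube sampling on the unit hypercube, written with the Python standard library only. It is not the UQpy LatinHypercubeSampling implementation; the function name and its arguments are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def latin_hypercube(nsamples: int, dimension: int,
                    seed: Optional[int] = None) -> List[Tuple[float, ...]]:
    """Draw nsamples points in [0, 1)^dimension, one point per stratum per axis."""
    rng = random.Random(seed)
    columns = []
    for _ in range(dimension):
        # Split [0, 1) into nsamples equal strata and draw one value in each.
        values = [(k + rng.random()) / nsamples for k in range(nsamples)]
        # Shuffle so that strata are paired randomly across dimensions.
        rng.shuffle(values)
        columns.append(values)
    return [tuple(column[i] for column in columns) for i in range(nsamples)]


points = latin_hypercube(nsamples=8, dimension=3, seed=0)
```

Each one-dimensional projection of the resulting points covers every stratum exactly once, which is the property that distinguishes this scheme from plain Monte Carlo sampling.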

transformations module
This module contains isoprobabilistic transformations of random variables. Except for updates in naming conventions, this module retained the previous functionality, with the Nataf, Correlate, and Decorrelate transformations being available.

stochastic process module
This module supports the simulation of univariate, multivariate, and multidimensional Gaussian and non-Gaussian stochastic processes, with the latest addition since v3 being the two-dimensional Karhunen-Loève Expansion in the KarhunenLoeveExpansion2D class. All pre-existing classes of SpectralRepresentation, BispectralRepresentation and KarhunenLoeveExpansion have been updated to conform with PEP8 Python coding standards.

run model module
This module is not directly related to any specific UQ operations, yet it is an integral part of the UQpy software. It lies at its core and supports the execution of either Python or third-party computational models at specified sampling points.
UQpy interfaces with Python models directly, by importing and executing the code. On the other hand, UQpy interfaces with third-party software models through ASCII text files to introduce uncertainties in their inputs and uses a standardized scripting format for model execution. In both cases, UQpy supports serial and parallel execution. Parallel execution allows the execution of different samples simultaneously, with options for local and cluster execution. Local parallel execution uses MPI and the mpi4py library to distribute the random samples among tasks that are processed independently. In this case, the model evaluation cannot invoke MPI internally. In cluster-enabled parallelization, with the aid of a bash script, a tiling of the jobs can be performed to include both shared and distributed memory parallelism, while enabling the user to work with different HPC schedulers.
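The pattern of evaluating a model independently at each sample, either serially or in parallel, can be sketched as follows. This example deliberately swaps MPI for a thread pool from the Python standard library so that it runs anywhere; it is not how UQpy distributes work, and the model and run_batch names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def model(sample: Sequence[float]) -> float:
    """Hypothetical computational model: sum of squares of the inputs."""
    return sum(x * x for x in sample)


def run_batch(model_function: Callable[[Sequence[float]], float],
              samples: List[Sequence[float]],
              ntasks: int = 1) -> List[float]:
    """Evaluate the model at every sample, serially or with ntasks workers."""
    if ntasks <= 1:
        return [model_function(sample) for sample in samples]
    # Samples are independent, so they can be evaluated concurrently.
    # Executor.map returns the results in the order of the inputs.
    with ThreadPoolExecutor(max_workers=ntasks) as pool:
        return list(pool.map(model_function, samples))


samples = [(1.0, 2.0), (0.0, 3.0), (2.0, 2.0)]
serial_results = run_batch(model, samples)
parallel_results = run_batch(model, samples, ntasks=3)
```

Because each evaluation depends only on its own sample, the serial and parallel paths return identical results in the same order.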

dimension reduction module
In the update from v3 to v4.1, the dimension reduction module was rewritten from scratch. The existing DirectPOD and SnapshotPOD methods were reworked to comply with the latest Python coding conventions, and the HigherOrderSVD class was added. To support Grassmann manifold projections and operations, a series of classes were added. The GrassmannProjection class serves as the parent for classes that project data arrays onto the manifold, with the SVDprojection subclass currently available. After the data have been projected, operations such as computing the Karcher mean or Frechet variance are available with the aid of the GrassmannOperations class. Interpolation can be performed on the manifold with the GrassmannInterpolation class. Special attention was given to the DiffusionMaps class, where the kernel computation was extracted and delegated to a hierarchy of kernel classes in the utilities module for broader use in future development of other kernel-based methods. More detail can be found in Section 2.2.11.

inference module
The functionality of the inference module was retained from v3 to v4.1 but restructured. The previous InferenceModel class, which defines the model on which inference is performed, has now been split into three separate classes depending on the specific model type, namely DistributionModel, LogLikelihoodModel and ComputationalModel, all under the revised InferenceModel baseclass. For information-theoretic model selection using the InformationModelSelection class, the information criteria have been extracted as separate classes, AIC, BIC, and AICc, under a new common InformationCriterion baseclass. The remaining functionality of MLE, BayesParameterEstimation, and BayesModelSelection was updated according to the newly adopted coding conventions. For Bayesian evidence computation, the EvidenceMethod baseclass has been established with the HarmonicMean subclass defined, which allows straightforward implementation of new Bayesian evidence methods as distinct subclasses.
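The idea of extracting each information criterion into its own class under a common baseclass can be sketched as below. The class names follow the text, but the evaluate method, its signature, and the select_model helper are hypothetical and do not reproduce the UQpy interface. The formulas are the standard definitions, with k parameters, n data points, and maximized log-likelihood ln L: AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L, and AICc = AIC + 2k(k + 1)/(n - k - 1).

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class InformationCriterion(ABC):
    """Hypothetical interface: one subclass per information criterion."""

    @abstractmethod
    def evaluate(self, log_likelihood: float, n_params: int, n_data: int) -> float:
        """Return the criterion value (lower is better)."""


class AIC(InformationCriterion):
    def evaluate(self, log_likelihood, n_params, n_data):
        return 2 * n_params - 2 * log_likelihood


class BIC(InformationCriterion):
    def evaluate(self, log_likelihood, n_params, n_data):
        return n_params * math.log(n_data) - 2 * log_likelihood


class AICc(InformationCriterion):
    def evaluate(self, log_likelihood, n_params, n_data):
        # Small-sample correction added to AIC; requires n_data > n_params + 1.
        correction = 2 * n_params * (n_params + 1) / (n_data - n_params - 1)
        return 2 * n_params - 2 * log_likelihood + correction


def select_model(candidates: Dict[str, Tuple[float, int]],
                 criterion: InformationCriterion,
                 n_data: int) -> Tuple[str, Dict[str, float]]:
    """candidates maps a model name to (maximized log-likelihood, n_params)."""
    scores = {name: criterion.evaluate(ll, k, n_data)
              for name, (ll, k) in candidates.items()}
    return min(scores, key=scores.get), scores


candidates = {"two_param": (-100.0, 2), "five_param": (-99.5, 5)}
best, scores = select_model(candidates, AIC(), n_data=50)
```

Swapping AIC() for BIC() or AICc() changes the selection rule without touching select_model, which is the kind of extensibility the refactoring aims for.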

reliability module
Modifications to the reliability module were made to ensure compliance with the latest Python coding guidelines. The first and second-order reliability methods, FORM and SORM, were restructured as subclasses under a common TaylorSeries baseclass to remove code redundancies. The existing SubsetSimulation class was retained and revised to match best practices.

surrogates module
One of the most heavily refactored modules in the latest version is surrogates. Generally, surrogate models are now developed under the abstract Surrogate baseclass. The previously existing Kriging class was removed entirely and is now replaced with the more general GaussianProcessRegression, which includes the functionality to perform regression or interpolation (Kriging). Kernels are extracted as separate classes, with the abstract baseclass Kernel (from the utilities module) serving as an interface. The RBF and Matern kernels have been implemented. For use with GaussianProcessRegression, multiple regression methods are implemented as subclasses under the Regression baseclass. The newest addition to GaussianProcessRegression is the ability to add constraints using the virtual point method. These constraints are implemented under the Constraints baseclass, which makes adding new constraints straightforward by implementing a new subclass with the requisite methods.
The PolynomialChaosExpansion class was rewritten from scratch (now as a subclass of Surrogate) to resolve performance issues. Two new baseclasses were introduced. The Polynomials baseclass defines sets of orthogonal polynomials as subclasses, including the Hermite and Legendre polynomial classes. The PolynomialBasis baseclass establishes a set of subclasses to define the polynomial basis, e.g. using a classical tensor product basis TensorProductBasis, or introducing new ways to reduce the basis computation such as the TotalDegreeBasis and HyperbolicBasis classes. This makes the code easily extensible to include new means of basis construction. All regression methods were united as subclasses under the Regression baseclass, again making it more easily extended for new methods, and the computationally efficient LeastAngleRegression was added.
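The difference between the basis types mentioned above comes down to which multi-indices (one polynomial degree per input dimension) are retained. The sketch below enumerates them under the common textbook definitions: the tensor product basis keeps every index with each degree at most p, the total degree basis keeps indices whose degrees sum to at most p, and the hyperbolic basis keeps indices whose q-quasi-norm is at most p for 0 < q <= 1. The function names are hypothetical and the code is not taken from UQpy.

```python
from itertools import product
from typing import List, Tuple

MultiIndex = Tuple[int, ...]


def tensor_product_indices(dimension: int, max_degree: int) -> List[MultiIndex]:
    """All multi-indices with every component between 0 and max_degree."""
    return list(product(range(max_degree + 1), repeat=dimension))


def total_degree_indices(dimension: int, max_degree: int) -> List[MultiIndex]:
    """Multi-indices whose components sum to at most max_degree."""
    return [a for a in tensor_product_indices(dimension, max_degree)
            if sum(a) <= max_degree]


def hyperbolic_indices(dimension: int, max_degree: int, q: float) -> List[MultiIndex]:
    """Multi-indices with (sum_i a_i**q)**(1/q) <= max_degree, for 0 < q <= 1."""
    tolerance = 1e-12  # guards against floating point round-off at the boundary
    return [a for a in tensor_product_indices(dimension, max_degree)
            if sum(d ** q for d in a) ** (1.0 / q) <= max_degree + tolerance]


full = tensor_product_indices(dimension=3, max_degree=3)
reduced = total_degree_indices(dimension=3, max_degree=3)
sparse = hyperbolic_indices(dimension=3, max_degree=3, q=0.5)
```

Filtering the full tensor product is the simplest way to show the truncation rules, although it is not efficient in high dimensions. With q = 1 the hyperbolic rule coincides with the total degree rule, and smaller values of q progressively discard interaction terms.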
Lastly, the SROM method was retained and updated to conform with the latest Python software development practices.

sensitivity module
In v3, the sensitivity module only contained the MorrisSensitivity method. This module significantly benefited from the extensibility introduced in UQpy with v4.1.
The Sensitivity abstract baseclass now contains the first major contribution from external collaborators, introduced in a set of subclasses that include SobolSensitivity, GeneralizedSobolSensitivity, ChatterjeeSensitivity, and CramerVonMisesSensitivity. Additionally, the updated polynomial chaos expansion code in the surrogates module (see Section 2.2.9) allows the computation of first and total order sensitivity indices with reduced computational cost through the PceSensitivity class, which takes advantage of a fitted PolynomialChaosExpansion object.
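The reduced cost of sensitivity analysis from a polynomial chaos expansion follows from a standard result: for an orthonormal basis, the output variance is the sum of squared coefficients of all non-constant terms, so Sobol indices are ratios of partial sums of squared coefficients and require no further model evaluations. The following sketch assumes an orthonormal basis and uses a hypothetical function name; it is not the PceSensitivity implementation.

```python
from typing import List, Sequence, Tuple


def pce_sobol_indices(multi_indices: Sequence[Tuple[int, ...]],
                      coefficients: Sequence[float]) -> Tuple[List[float], List[float]]:
    """First and total order Sobol indices from orthonormal PCE coefficients."""
    dimension = len(multi_indices[0])
    terms = list(zip(multi_indices, coefficients))
    # The constant term (all degrees zero) carries the mean, not the variance.
    variance = sum(c * c for a, c in terms if any(a))
    if variance == 0.0:
        raise ValueError("the expansion has zero variance")
    first_order, total_order = [], []
    for i in range(dimension):
        # First order: terms that depend on input i only.
        only_i = sum(c * c for a, c in terms
                     if a[i] > 0 and all(a[j] == 0 for j in range(dimension) if j != i))
        # Total order: every term in which input i appears.
        with_i = sum(c * c for a, c in terms if a[i] > 0)
        first_order.append(only_i / variance)
        total_order.append(with_i / variance)
    return first_order, total_order


indices = [(0, 0), (1, 0), (0, 1), (1, 1)]
coeffs = [1.0, 2.0, 1.0, 1.0]
first, total = pce_sobol_indices(indices, coeffs)
```

In this two-input example the variance is 6, so the first order indices are 4/6 and 1/6, while the interaction term raises the total order indices to 5/6 and 2/6.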

utilities module
The new utilities module contains code that may be used in multiple modules. This currently contains two abstract baseclasses, the Kernel baseclass and the Distance baseclass for computing kernels and measures of distance, respectively. Within each baseclass, there are two additional baseclasses for Euclidean and Grassmannian kernels/distances. Several kernels and distances have been added as subclasses, and new ones can be easily developed by writing a new subclass with the requisite methods.
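As an illustration of how a kernel hierarchy of this kind can be organized, the sketch below defines an abstract kernel with a single required method and a shared routine that assembles the kernel matrix. The method names, the SquaredExponential subclass, and its length_scale parameter are assumptions for illustration and do not mirror the UQpy classes.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Sequence

Point = Sequence[float]


class Kernel(ABC):
    """Hypothetical kernel interface: subclasses define one pairwise entry."""

    @abstractmethod
    def kernel_entry(self, x: Point, y: Point) -> float:
        """Return the kernel value k(x, y)."""

    def matrix(self, points_x: Sequence[Point], points_y: Sequence[Point]) -> List[List[float]]:
        # Shared by all subclasses, so new kernels only implement kernel_entry.
        return [[self.kernel_entry(x, y) for y in points_y] for x in points_x]


class SquaredExponential(Kernel):
    """Euclidean kernel k(x, y) = exp(-||x - y||^2 / (2 * length_scale^2))."""

    def __init__(self, length_scale: float = 1.0):
        if length_scale <= 0:
            raise ValueError("length_scale must be positive")
        self.length_scale = length_scale

    def kernel_entry(self, x: Point, y: Point) -> float:
        squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-squared_distance / (2.0 * self.length_scale ** 2))


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
gram = SquaredExponential(length_scale=1.0).matrix(points, points)
```

Keeping the matrix assembly in the baseclass means that a method such as diffusion maps or Gaussian process regression can depend on the interface alone and accept any kernel subclass.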

Continuous Integration
UQpy v3 [4] was developed using the flexible standards of academic software, which challenged the ability of the team to collaborate and develop new features using a streamlined workflow. To this end, the latest version was fully restructured to enhance its extensibility, while modern software development practices were introduced to support collaboration and ensure code robustness and quality. GitHub Flow was adopted as the development strategy. The master branch of the GitHub repository always contains the latest stable version. A Development branch is now used for merging all newly developed functionality and bug fixes. New versions of the software are released when a pull request is merged from the Development branch to master. For developing new features, a feature-{functionality} branch is created from the latest Development state and merged back once complete. The case is similar for bug fixes, with branches following the bugfix-{bug} naming convention. This workflow enables a consistent way of treating new functionality or addressing errors arising during development.
To ensure the code quality of all previously implemented features, the development team enforced unit testing practices. Since the functionality implemented in UQpy is inherently stochastic, and its randomness stems from random number generators, the seed is set explicitly to ensure test reproducibility. All previous functionalities are tested against benchmark problems to achieve a minimum of 80% line coverage. To ensure that the code coverage directive is enforced, Azure Pipelines were used to automatically run all tests and compute coverage when a commit is pushed to the GitHub repository. The static code analyzer Pylint is also used to enforce coding standards and to ensure that no syntax errors are introduced. In addition to these checks, a code quality tool named SonarCloud is used to eliminate code vulnerabilities. This tool is triggered when creating a pull request, automatically detects any code smells, bugs or code duplications introduced, and fails when a predefined threshold is exceeded. For a pull request to be acceptable, all tests, linting, and code quality checks must satisfy minimum acceptance criteria, and the request must pass a detailed code review from the code owners. Only then will the additions be merged to Development and subsequently to the master branch.
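The practice of fixing the seed so that tests of stochastic code are reproducible can be sketched as follows. The function under test and the test names are hypothetical, and the example uses only the Python standard library rather than the UQpy test suite; the test functions follow the naming convention that a runner such as pytest would discover.

```python
import random
import statistics


def monte_carlo_mean(nsamples: int, seed: int) -> float:
    """Estimate the mean of a standard normal variable by plain Monte Carlo."""
    rng = random.Random(seed)  # a local, seeded generator avoids global state
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(nsamples))


def test_same_seed_gives_identical_result():
    # With a fixed seed the estimate is deterministic, so exact equality holds.
    assert monte_carlo_mean(1000, seed=123) == monte_carlo_mean(1000, seed=123)


def test_estimate_is_close_to_exact_mean():
    # The standard error is 1 / sqrt(10000) = 0.01, so 0.1 is a loose bound.
    assert abs(monte_carlo_mean(10_000, seed=123)) < 0.1
```

Passing the seed explicitly, instead of relying on a global generator, keeps each test independent of the order in which the tests are executed.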
Apart from the Continuous Integration process mentioned above, which ensures the robustness of UQpy, a set of Continuous Deployment (CD) actions is triggered. The first action is to invoke the GitVersion tool, which traverses the Git history of the code and determines the version of the code automatically, as a sequence of numbers v{Major}.{Minor}.{Patch}. Using the computed version, the code is packaged and automatically distributed to the Python Package Index (PyPI), to a GitHub release inside the repository, and as a Docker image that contains the latest UQpy version.
Finally, a structured logging framework was established to replace the print commands, triggered by if statements, that were previously used to indicate errors or faults. It allows users to select the required level of severity tracked during code execution. Six different levels of severity are available in Python, namely NOTSET, DEBUG, INFO, WARNING, ERROR, and CRITICAL, with ERROR being the default in UQpy. Users can choose a more verbose setting by opting for the INFO severity level. Logging output is then directed to their sinks of choice, e.g. terminal, log file, HTTP streams, etc.
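Selecting a severity level and a sink uses the standard Python logging module. The sketch below shows one way to do this; the logger name "UQpy" is an assumption about how the library names its logger, and the configure_logging helper is hypothetical.

```python
import logging


def configure_logging(name: str = "UQpy", level: int = logging.INFO) -> logging.Logger:
    """Set the severity threshold of a named logger and attach a terminal sink."""
    logger = logging.getLogger(name)  # "UQpy" is an assumed logger name
    logger.setLevel(level)
    # StreamHandler writes to the terminal (stderr by default); a
    # logging.FileHandler could be attached in the same way for a log file.
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


logger = configure_logging(level=logging.INFO)
logger.info("messages at INFO severity and above are now emitted")
logger.debug("this message is filtered out at the INFO threshold")
```

Because Python loggers are hierarchical, a level set on a parent logger also applies to child loggers created in submodules, unless a child sets its own level.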

Impact
The latest version of UQpy modernizes the software to meet best practices in scientific software development, while also updating and improving functionality. This makes the package easier to use and more robust, broadens the classes of problems that it can solve, and greatly enhances the development experience. These points are critical to the widespread adoption of UQ in scientific applications. This robust Python library is both user- and developer-friendly and provides core functionality to casual users, state-of-the-art methods for advanced users, and a carefully designed environment for developers of UQ methods. With the advent of version 4, we have seen the user base increase as the library has been adopted by external UQ teams, and have now successfully integrated updates from third-party developers, both of which serve to advance the field of UQ.
To summarize, the entire package has been restructured from a single file per area to a module hierarchy. Wherever possible, suboptions inside algorithms were extracted using the Strategy design pattern to enhance encapsulation and allow users to select their functionality in a clearer and more straightforward manner. Baseclasses are now used throughout the code, providing interfaces for the implementation of new algorithmic alternatives. To enhance team collaboration, the existing version control and GitHub repository were supported with a CI/CD pipeline that automates software testing and code quality checks to ensure the best scientific output, while each new merge to the master branch is followed by package releases to PyPI, conda-forge, and the Docker Hub image repository.
Compared to the other existing UQ packages, many of which have been listed above, the aim of UQpy is twofold. First of all, we aim to provide an extensive UQ library that addresses the wide-ranging needs of the scientific community. At the same time, we want to provide a toolbox that allows its straightforward extension with new functionalities and its use in real-world UQ applications. The developments outlined here represent significant advancements toward these two objectives.

Conclusions
In this work, the open-source library for uncertainty quantification UQpy, and specifically its latest version v4.1, was introduced. All changes and updates to the modules of the library were explained in detail, with one of the most significant being the new software development and continuous integration workflow. The latest version enables users and external collaborators to expedite the development of new features using UQpy as a platform. This is demonstrated by the new functionalities introduced by both the development team and external collaborators.

Conflict of Interest
We confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Table 1: Code metadata