Bayesian Objective Functions for Estimating ... - ACS Publications


Bayesian Objective Functions for Estimating...

0 downloads 123 Views 850KB Size

Subscriber access provided by Lancaster University Library

Process Systems Engineering

Bayesian Objective Functions for Estimating Parameters in Nonlinear Stochastic Differential Equation Models with Limited Data Hadiseh Karimi, and Kimberley B. McAuley Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.8b00293 • Publication Date (Web): 12 Jun 2018 Downloaded from http://pubs.acs.org on June 12, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Bayesian Objective Functions for Estimating Parameters in Nonlinear Stochastic Differential Equation Models with Limited Data Hadiseh Karimi,a Kimberley B. McAuleyb*

*

Corresponding author: Tel.: (613) 533-6000 ext. 77937. E-mail: [email protected]

1 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

a,b

Department of Chemical Engineering, Queen’s University, Kingston, K7L 3N6, Canada

ABSTRACT

An Approximate Bayesian Expectation Maximization (ABEM) methodology and a Laplace Approximation Bayesian (LAB) methodology are developed for estimating parameters in nonlinear stochastic differential equation (SDE) models of chemical processes. These new methodologies are more powerful than previous maximum-likelihood methodologies for SDEs because they enable modelers to account for prior information about unknown parameters and initial conditions. The ABEM methodology is suitable for situations where the modeler can assume that measurement noise variances are well known, whereas LAB includes measurement noise variances among the parameters that require estimation. Both techniques estimate the magnitude of stochastic terms included in the differential equations to account for model mismatch and unknown process disturbances. The proposed ABEM and LAB methodologies are illustrated using a nonlinear continuous stirred tank reactor (CSTR) case study, with simulated data sets generated using a variety of scenarios. The ABEM and LAB objective functions used in the case study result in improved estimates of model parameters and noise parameters compared to previous maximumlikelihood objective functions, especially in situations where data available for parameter estimation are sparse. Because the proposed ABEM and LAB methodologies rely on B-spline basis functions rather than Markov Chain Monte Carlo techniques, they are straightforward to implement using available optimizers and modeling software and require only modest computational effort.

Keywords: Parameter estimation, stochastic modeling, likelihood, Bayesian, prior distribution, nonlinear differential equation

2 ACS Paragon Plus Environment

Page 2 of 58

Page 3 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

INTRODUCTION Continuous nonlinear dynamic models derived based on mass and energy balances are widely used to describe chemical processes. However, these models can provide imperfect representations of process behavior due to simplifying assumptions, poor parameter estimates and unknown/unmodeled disturbances that influence process behavior.1,2 To account for the effects of unmeasured disturbances and model imperfections, stochastic terms are sometimes added to the right-hand sides of ordinary differential equations (ODEs). The resulting stochastic differential equations (SDEs) can provide improved model predictions.1,2 SDEs have been used for simulation, design and optimization of chemical processes and for model predictive control.1–7 Parameter estimation in SDE models is complicated by the two types of uncertainties that are encountered: i) measurement uncertainty and ii) model uncertainty related to the stochastic terms in the SDEs. In some situations, modelers may assume that measurement uncertainties (i.e., measurement noise variances) and the magnitudes of stochastic terms (i.e., disturbance intensities, also called the power spectral density) are known. In many cases, however, the magnitudes of one or both types of uncertainty are estimated along with the model parameters.1–5 A short review of the parameter estimation techniques in ODE and SDE models is presented below. Parameters in ODEs are commonly estimated based on a constrained nonlinear optimization problem that requires iterative numerical solution of the ODEs .8–11 The optimization algorithms is based on leastsquares regression, which minimizes the sum of squared differences between the measurements and the model predictions. Several least-squares algorithms have been developed to improve parameter estimates and to reduce computational effort in ODE models.10,12–15 Robust and efficient methods that simultaneously solve ODEs and estimate the parameters have also been developed.16–18 Collocation discretization techniques have been developed to approximate the solution of the ODEs. In collocation3 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

based techniques, the solution of the ODEs are expressed as a linear combination of basis functions.18,19 The coefficients of the basis functions are determined by forcing the response trajectories ( i.e., the measured states, y) obtained from basis functions to satisfy the ODEs. The representations of the states of ODEs using basis functions transform the ODEs into algebraic equations and the ODE constraints become algebraic constraints. The optimization algorithm simultaneously finds the basis-function coefficients and the optimal parameter estimates. Thus, collocation-based methods require the solution of a large-scale nonlinear minimization problem. Fortunately, current advanced optimization algorithms such as interior point methods can provide a fast and efficient solution to these large nonlinear optimization problems quickly and efficiently.20 Numerous types of basis functions have been used for parameter estimation in ODEs.19 For example, Lagrange polynomials were used by Biegler because they facilitate providing bounds and starting points for coefficients.18 B-spline functions were selected since they are non-zero only over a finite interval and they provide compact support for the empirical curve.21,22 The compact support property leads to bounded matrices that are numerically well-functioned and stable for smoothing and inversing.23–25 Traditional nonlinear least-squares and collocation-based methods enforce the model equations (in ODE and algebraic form, respectively) as hard constraints, consistent with the assumption that the structure of the model is perfect. When it is not appropriate to assume that the model structure is perfect (e.g., when the stochastic terms are included in the model to account for the model mismatch and unmeasured disturbances), implementing the model equations as hard constraints results in biased or inconsistent parameter estimates.26 Thus, using these methods for parameter estimation in SDE models may lead to poor estimation results.27 Numerous techniques are developed for parameter estimation in SDE models.28 A popular approach is Kalman-filter algorithm that uses the time-varying covariance matrix for states. When the discretized SDE model is linear in the parameters, the time-varying state covariance matrix can be easily defined by 4 ACS Paragon Plus Environment

Page 4 of 58

Page 5 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

system of ODEs.29 However, when the model is nonlinear, the state covariance matrix becomes a set of partial differential equations that is difficult to solve. To use Kalman filter in nonlinear models, extended Kalman filters (EKFs) are used. In an EKF, the nonlinear SDE model is discretized and linearized and then treated as a linear parameter estimation problem.29 If the model is highly nonlinear in the parameters and in the input and output behavior, linearization-based methods may lead to biased parameter and state estimates. In highly nonlinear cases, it is recommended to estimate the state covariance matrix based on particle filtering techniques.30 These methods, however, remain computationally intensive.3,31,32 Other common techniques for parameter estimation in SDE models rely on maximum likelihood estimation (MLE) methods because of their desirable MLE properties, which include consistency and asymptotic efficiency.33,34 MLE methods become minimum variance unbiased estimators as the sample size increases. In MLE methods, a likelihood function (i.e., the probability density function of measurements given the parameters) is maximized to estimate the unknown parameters.33 In nonlinear SDE models with unmeasured states, evaluation of the likelihood function is a major challenge because the likelihood function does not have a closed form (i.e., an analytical expression).33 Several techniques have been proposed for approximating the likelihood function, including: i) Kalman-based linearization techniques,29 ii) Markov Chain Monte Carlo (MCMC) techniques also known as particle filter methods,3,4,41,31,32,35–40 iii) techniques that use polynomial chaos basis functions,42,43,52–56,44–51 and iv) techniques that use B-spline basis functions.27,57–64 Benefits and drawbacks of some of these approximation techniques are summarized by Lindstrom.65 Particle filter methods are currently the most popular techniques for approximating the likelihood function because they provide asymptotically unbiased parameter estimates and they do not require assumptions about the form of the density functions.4,39,41,66 A major drawback of particle filter methods, compared to polynomial chaos and B-spline methods, is that they are computationally demanding, especially for models with a relatively large number of states and parameters.4,31,32 Imtiaz et al.41 and 5 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Kantas et al.39 presented an overview of a variety of particle filter methods and discussed their advantages and disadvantages. Table 1 provides a summary of the SDE parameter estimation studies that have used basis-function methods, focusing on either single nonlinear SDEs or systems of SDEs. The methods developed in these studies all assume that the random measurement errors and stochastic disturbances are Gaussian. In the SDE models of mechanical systems studied by Pence,53,54 Blanchard,42 and Madankan,51 all parameters were assumed to be stochastic in nature and were represented using polynomial chaos functions. However, in the chemical process models and financial models considered in the other studies in Table 1, the unknown model parameters requiring estimation are deterministic, with additive stochastic terms appearing in the SDEs to account for model uncertainty and unknown disturbances. All research groups used either polynomial chaos or B-spline basis functions to develop closed-form approximations for the likelihood function, leading to a variety of different objective functions for parameter estimation. Information about these objective functions is summarized in the 6th column of Table 1. The complexity of the objective functions that is appropriate for parameter estimation depends on whether measurement noise variances and disturbance intensities are assumed to be known or unknown (and requires estimation along with the model parameters or whether the modeler has prior knowledge about some of the parameters). Mandur and Budman used Polynomial Chaos for parameter estimation and showed that their Bayesian representation of the parametric uncertainty provides more accurate parameter estimates than those obtained from a normal representation of parametric uncertainty.67 In two studies by Varziri et al. (second row of Table 1), both the measurement noise variances and the process disturbance intensities were assumed to be known, even though the authors acknowledged that prior information about disturbance intensities would generally not be available to a model developer.57,59 The objective functions proposed by Aït-Sahalia,68 Madankan51 and Pence53,54 have only a Weighted Least Squares (WLS) term involving deviations between the measured outputs and model predictions 6 ACS Paragon Plus Environment

Page 6 of 58

Page 7 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

because of simplifying assumptions. For example, Pence53,54 assumed that the only uncertainty in their model is only due to the unknown parameters (i.e, no additive stochastic terms appear in Pence’s SDE models). The objective functions developed by Varziri et al.57,59 have a WLS term and a Model-based penalty (MP) term. This MP term, which arises from the B-spline approximation for the likelihood function, is a measure of how well the empirical basis functions satisfy the corresponding ODE model when stochastic terms are neglected.59 To address the problem of unknown disturbance intensities, Varziri et al. developed a somewhat arbitrary additional objective function that attempts to match the known measurement noise variance with its estimate, using a two-step approach.27 More recently, our research group used an Expectation Maximization (EM) approach to develop an overall objective function that provides estimates of disturbance intensities and model parameters, which are more accurate than those obtained using Varziri’s approach.62

7 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Page 8 of 58

Table 1. Objective functions for estimating parameters in SDEs using basis functions

Model Type

Parameter Basis Disturbance Measurement Noise Objective Type Functions Intensity Variance Structure

Function References 68

Single Nonlinear SDE

D, S

CP

U

K

WLS

Nonlinear SDEs

D, S

B

K

K

WLS+MP

57,59

Nonlinear SDEs

D, S

B

U

K

WLS+MP

57,58

Nonlinear SDEs

S

CP

U

K

WLS+PP

42

Nonlinear SDEs

S

CP

U

K

WLS

53

Single Nonlinear SDE

S

CP

U

K

WLS

51

Nonlinear SDEs

D, S

B

U

U

WLS+MP+DP+NVP+HP

60

Nonlinear SDEs

S

CP

U

K

WLS

54

Nonlinear SDEs

D, S

B

U

K

WLS+MP+DP

62

Nonlinear SDEs

D, S

B

U

U

WLS+MP+DP+NVP+HP

61

Nonlinear SDEs

D, S

B

U

K

WLS+MP+DP+PP

63

Nonlinear SDEs

D, S

B

U

U

WLS+MP+DP+NVP+HP+PP

64

S: Stochastic, D:Deterministic, K:Known, U:Unknown, CP: Chaos polynomials, B:B-splines, WLS: Weighted Least Squares, PP: Parameter penalty to incorporate prior information, MP: Model-based penalty, HP: Hessian Penalty, DP: Disturbance Penalty, NVP: Noise Variance Penalty 8

ACS Paragon Plus Environment

Page 9 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

As shown in 9th row of Table 1, this EM objective function contains a disturbance penalty (DP) term, which will be described in the Preliminaries. Additional objective functions have been derived to account for situations where the measurement noise variance is also unknown (e.g., shown in rows 7 and 10 in Table 1 ).60,61 These objective functions contain noise variance penalty (NVP) terms and also a Hessianmatrix penalty (HP) term, making the objective functions suitable for simultaneous estimation of deterministic model parameters, process disturbance intensities and measurement noise variances. An approximate Kalman filter-based method was proposed by Kristensen et al.69 In Kristensen’s method, the mean and variance of the likelihood function are approximated using an Extended Kalman Filter (EKF). They also developed a software called CTSM (continuous-time stochastic modeling) for parameter estimation based on their methods. Previously, we showed that the MLE objective functions developed in our research group (objective functions corresponds to rows 7 and 9 in Table 1) provide more accurate and robust parameter estimates than CTSM, particularly when the number of measurements is relatively small or when initial guesses of parameters are relatively poor.70 Using prior knowledge available to the modeler, Bayesian estimation methods can provide less-biased parameter estimates compared to MLE methods, especially when available datasets are small.56,71 Blanchard’s42 objective function (row 4 of Table 1) and the two most recent objective functions developed by our research group63,64 ( rows 11 and 12 of Table1) also contain Bayesian parameter penalty (PP) terms that can account for prior information about deterministic parameters, poorly known initial conditions, and/or noise variance). These terms are similar to PP terms in objective functions developed by Box and Draper72 who estimated parameters in algebraic equation models. When using basis functions for approximating the likelihood function, differential equations become algebraic expressions, because the basis function expressions can be readily differentiated with respect to time. As a result, there is no need for numerical solution of differential equations when using basis functions to approximate the state trajectories.27 Use of B-splines as collation functions for simultaneous 9 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

parameter estimation and solution of ODEs is reminiscent of the work of Biegler.18 The main difference between the Biegler approach and the methodology introduce in the current paper is that Biegler and coworkers estimate parameters in ODE models. Whereas the current paper focuses on the more difficult case of parameter estimation in SDE models that account for model mismatch and stochastic process disturbances. In this paper, complete derivations of the two proposed Bayesian objective functions summarized in Table 1 (rows 11 and 12) are presented and their effectiveness is tested using a two-state continuousstirred-tank reactor (CSTR) simulation study. We explore advantages of these novel objective functions compared to the existing objective functions for parameter estimation in SDE models. These advantages include improved parameter estimates with less bias and variability, compared to related spline-based SDE parameter estimation techniques. Our proposed Bayesian algorithms can provide accurate parameter estimates when the size of datasets are small and they can be more easily implemented. Furthermore, we anticipate that the proposed algorithms will be much more computationally attractive than particle-filter methods which are the current state of the art methods for parameter estimation in SDEs. Additionally, the proposed objective functions should be attractive to practitioners who develop dynamic models for online monitoring and control using the proposed objective functions. We anticipate that they will be able to easily adopt the proposed objective function structure to their modeling and parameter estimation problems without needing to develop detailed expertise in SDE theories. First, we introduce necessary notation and show derivations for the two proposed Bayesian objective functions for estimating deterministic parameters and stochastic parameters in SDE models. We then demonstrate the effectiveness of the proposed SDE parameter estimation methodologies using CSTR simulation studies with limited data, where concentration measurements are available at a lower sampling frequency than temperature measurements. Benefits of including prior information about reasonable 10 ACS Paragon Plus Environment

Page 10 of 58

Page 11 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

initial guesses for unknown parameters and unknown noise variances are illustrated and advice is provided regarding which objective function should be used for parameter estimation in different situations. PRELIMINARIES Model and Notation The dynamics of many chemical and biochemical processes can be represented by Multi-Input MultiOutput (MIMO) nonlinear SDE models of the form:

x& ( t ) = f ( x (t ), u (t ), θ ) + η ( t )

(1.a)

x(t0 ) = x0

(1.b)

y(tm r, j ) = g(x(tm r, j ), u(tm r, j ),θ) + ε(tm r, j )

(1.c)

where x ∈ R X is the vector of state variables, t is time, f : R X × R U × R P → R X is a vector of U nonlinear functions, u ∈ R is the vector of input variables and θ ∈ R P is the vector of unknown model

parameters, η(t ) ∈ R X is a continuous zero-mean stationary white-noise process with covariance matrix E{η(t1)ηT(t2)}= Qδ(t2-t1), where Q is a diagonal power spectral density matrix with dimension X ×X :

Q1 L 0  Q = M O M  0 L QX 

(2)

δ(.) is the Dirac delta function.6,29 The diagonal elements of Q are referred to as disturbance intensities Y (i.e., Qd=[Q1,…,QX]T). In Eq. (1.c), y ∈ R is vector of measured output variables. x0 ∈ R is a vector of X

initial conditions for the state variables. Some of these initial conditions may be known to the modeler and others may be unknown values that require estimation along with the model parameters. tm,r,j (j = 1…Nr) are measurement times for the rth response (r=1…Y) and Nr is the number of measurements for rth response variable. g ∈ R is a vector of nonlinear functions and ε ∈ R Y is the zero-mean random Y

11 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

measurement error. Assume that errors in measurements made at sampling time tm

Page 12 of 58

r,j

(j = 1…Nr) are

independent so that their variance-covariance matrix is: σ12 L 0    Σ = M O M  0 L σ 2  Y 

(3)

For simplicity of presentation, the objective functions in this paper are developed assuming that the variance-covariance matrixes of measurement noise Σ and disturbance intensities Q are diagonal. However, the proposed objective functions can also be used for cases where Σ and Q are not diagonal since the derivations of our objective functions do not require matrixes Σ and Q to be diagonal. We have not performed any simulations to test the accuracy of estimates of off-diagonal variance and disturbance intensity parameters. Because we anticipate that many modelers will choose to make the diagonally assumption. The best representation of uncertainty for different cases should be chosen carefully by the modeler. The modeling error (uncertainty) can be described by either uncertainty in parameters or an additive disturbance in the state equation. These disturbances can be either Gaussian or non-Gaussian. In this work, the modelling error is described by an additive disturbance in the state equation (see Equation (1.a)). Consider Ym = [ y1 (t m 1,1 )K y1 (t m 1, N1 )K yY (t m Y ,1 )K yY (t m Y , NY )]T a vector that contains all of the measured values. The corresponding vector of state values at the measurement times is

Xm = [ x1 (t m 1,1 )K x1 (t m 1, N1 )K xY (t m Y ,1 )K xY (t m Y , NY )]T . Um and εm are corresponding vectors for the input variables and random errors:

Ym = g( Xm , U m , θ) + ε m

(4)

To define the vector of unknown parameters, denote Σd as the diagonal elements of the covariance 2

2

matrix (i.e., Σd=[ σ1 ,…, σY ]T). Also denote Qd as the vector of diagonal elements of Q (i.e., 12 ACS Paragon Plus Environment

Page 13 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Qd=[Q1,…,QX]T) and denote ζ = [θ , x0u , Qd , Σd ] as the vector of unknown parameters in the model, T

T

T

T T

where x0uis a vector of the unknown initial conditions. In this paper, continuous time models are used instead of discrete time models to represent stochastic disturbances because chemical engineering models are usually developed and appear in continuous form.73 When deriving the algorithms in this paper, we discretized SDE in Eq. (1) using an Euler scheme: x(t i −1 + ∆t ) = x(t i ) = x(t i −1 ) + f (x(t i −1 ), u(t i −1 ), θ)∆t + ηd (t i −1 )∆t

(5)

x(t 0 ) = x 0

(6)

where x(ti) is a vector of state values at q uniformly-spaced time points ti , i=0,..,q and ηd is a discrete white noise vector. Consider X q = [x T (t 0 )K x T (t q )]T as the stacked vector of state values at the discrete times where the random errors are imagined to enter the process. The stochastic disturbances η(t) is implemented using discrete-time white-noise sequences that are a sequences of random step functions with a sampling interval ∆t and covariance:27,29 Q  E{ η d ( j1 ∆t ) η d ( j 2 ∆t )} =  ∆t 0

j1 = j 2

(7)

j1 ≠ j 2

where j1 and j2 are positive integers. Large values of the diagonal elements of Q correspond to large disturbances or considerable model mismatch. Please note that the Euler discretization of the SDE model (Eq. (7)) is only an intermediate step for deriving the proposed objective functions. In the final step, after developing a closed form for the likelihood function, the objective functions are converted back to the continuous form by taking the limit as ∆t approaches to zero.61,62 Rather than showing all of these steps in the current article, we refer the interested reader to our previous paper where non-Bayesian objective functions where derived. 13 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 58

The existence of a solution of an SDE is ensured when globally Lipschitz, linear growth and boundedness conditions are satisfied. These assumptions are described in detail by Liptser74 and Bishwal.75

B-Spline Basis Functions Basis functions can be used to smooth discrete dynamic data and to provide continuous-time empirical models to represent the data.25 B-spline functions are commonly chosen as basis functions because of their compact support and piecewise definitions, which help to avoid numerical problems when fitting the data. 22,24 B-splines basis functions consist of Mth order piecewise polynomials that are positive within M intervals and zero elsewhere.22,76 Throughout this paper the subscript ~ is used to indicate states approximated by B-spline basis functions27,57–64,76 and the states without subscript ~ are not calculated from B-splines and are normal non-smoothed states. The sth state of the multivariate stochastic system in Eq. (1) can be approximated using a linear combination of cs B-spline basis functions:21,25 cs

x~ s (t ) = ∑ β s ,l φs ,l (t )

for s=1,…,X

(8)

l =1

where β s ,l is a the lth B-spline coefficient for the sth state and φs ,l (t ) is the corresponding B-spline basis function. In matrix form Eq. (8) is:

x ~ (t ) = Φ(t )Β

(9)

where Φ(t ) is a matrix of spline functions: φ1T (t ) 0 K 0    φ T2 (t ) K 0  0 Φ (t ) =  M O M  M 0 0 K φ TX (t ) 

(10)

14 ACS Paragon Plus Environment

Page 15 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

and

β 1  Β = M  β X 

(11)

where βs is the vector containing cs B-spline coefficients for the sth state: β s = [ β s ,1 , K , β s , c s ] T

for s=1,…,X

(12)

Maximum Likelihood Notation In MLE methods, the probability density function of measurements given the parameters p(Ym | ζ) ) is maximized to estimate the unknown parameters.33,34

ζˆ = arg max p( Ym | ζ)

(13)

ζ

The likelihood function (of the parameters given the measurements) is defined as: L( ζ | Ym ) = p ( Ym | ζ)

(14a)

The optimal values of the unknown parameters ζ can be found by maximizing the likelihood function:

ζˆ = arg max L( ζ | Ym )

(14b)

ζ

Bayesian Estimation Notation In Bayesian methods, the posterior density function p(ζ | Ym ) is maximized to estimate the unknown parameters:

p(Ym | ζ) p(ζ) ζˆ = arg max p(ζ | Ym ) = p(Ym ) ζ

(15)

15 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 58

where p ( Ym | ζ ) is the probability density function of the measurements given the parameters and

p (ζ ) is the prior distribution of the parameters, which contains the modeler’s knowledge about reasonable values of ζ . The numerator on the right-hand side is the product of the likelihood function and the prior distribution of the parameters. If the number of data points for parameter estimation is large, the posterior density function is dominated by the likelihood function so that Bayesian estimates converge to the ML estimates. But, when the data are limited and/or when precise prior information is provided by the modeller, Bayesian parameter estimates tend to be more accurate than ML estimates. Note that the denominator, p ( Y m ) in Eq. (15) only depends on the data and not on the parameters, so it is constant during parameter estimation and does influence the parameter estimates.

Objective Functions for Estimating Parameters in SDEs Table 2, provides a series of objective functions that rely on B-spline basis functions when estimating parameters in nonlinear SDE models. These objective functions are presented here using the notation outlined above. Details on how B-splines basis functions facilitate approximation of the likelihood function are presented in our previous work.60,61,77

Table 2. Selected objective functions for estimating parameters in SDEs using B-splines Objective Function

Number

J AMLE = [ Ym − g ( X m ~ , U m , θ)]T Σ −1 [ Ym − g ( X m ~ , U m , θ)]

(2.1)

tq

+ ∫ [ x& ~ (t ) − f ( x ~ (t ), u (t ), θ)]T Q −1 [ x& ~ (t ) − f ( x ~ (t ), u (t ), θ )]dt t0

J AEM = [ Ym − g ( X ~ m , U , θ) ]T Σ −1 [ Ym − g ( X ~ m , U m , θ)] tq

+ ∫ [ x& ~ (t ) − f ( x ~ (t ), u (t ), θ )]T Q −1 [ x& ~ (t ) − f ( x ~ (t ), u (t ), θ )]dt + q ln[det( Q )] t0

16 ACS Paragon Plus Environment

(2.2)

Page 17 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

J LAMLE = [ Ym − g ( X ~ m , U m , θ)]T Σ −1 [ Ym − g ( X ~ m , U m , θ)]

(2.3)

tq

+ ∫ [x& ~ (t ) − f ( x ~ (t ), u(t ), θ)]T Q −1 [ x& ~ (t ) − f ( x ~ (t ), u(t ), θ)] d t t0

+ q ln(det(Q)) + n[ln(det( Σ)] + ln(det(H x ~ )) J ABEM = [ Ym − g ( X ~ m , U m , θ)] T Σ −1 [ Ym − g ( X ~ m , U m , θ)]

(2.4)

tq

+ ∫ [x& ~ (t ) − f ( x ~ (t ), u(t ), θ)]T Q −1 [ x& ~ (t ) − f ( x ~ (t ), u(t ), θ)] d t t0

P

(θ i − θ i ) 2

i =1

σ θ2,i

+ q ln[det(Q )] + 2 ln( 2π σ θ ,i ) P + ∑ + 2 ln( 2π σ x , s )

Nu

Nu

( x0u ,s − x s ) 2

s =1

σ x2, s

+∑

J LAB = q ln[det(Q)] + n[det(Σ)] + [Ym − g( X ~ m , U m , θ)]T Σ −1 [Ym − g( X ~ m , U m , θ)]

(2.5)

tq

+ ∫ [x& ~ (t ) − f (x ~ (t ), u(t ), θ)]T Q −1 [x& ~ (t ) − f (x ~ (t ), u(t ), θ)] d t + ln[det(H X ~ )] t0

p

+∑ i =1

(θ i − θ i ) 2

σ

2 0 ,i

Nu

+ + ln(σ x , s ) N u + ∑ s =1

( x0u , s − x s ) 2

σ

2 x,s

+ (n + µ Σ + υ + 1) ln[det(Σ)] + trace(ΩΣ −1

Objective function (2.1) is the Approximate Maximum Likelihood Estimation (AMLE) objective function developed by Varziri et al.59 This objective function was derived by using B-spline basis functions to find a closed-form expression for the probability density function of the complete data given parameters ( p(Xq , Ym | ζ) ). This AMLE objective function is only suitable for estimating parameters in SDEs when the process disturbance intensities and measurement noise variances are known, making it impractical for most situations encountered by modelers of chemical processes. The model-based penalty in objective function (2.1) and the other objective functions in Table 2 (i.e, the second term in the right hand side) prevents the fitted state trajectories from having unrealistic behaviour that is not consistent with the SDE model.

17 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Although modelers often have knowledge about the typical size of measurement errors (from replicate experiments or measurement equipment suppliers), they do not usually have good information about the magnitude of the mismatch in the models that they have developed, prior to estimating the model parameters. Because of this issue, Varziri et al. developed an iterative method to update the values of disturbance intensities by ensuring that estimates of the measurement noise variances are close to the known values assumed by the modelers.27 Concerns about this somewhat arbitrary approach for treating unknown disturbance intensities motivated the development of the Approximate Expectation Maximization (AEM) objective function in Eq. (2.2), which can be used for simultaneous estimation of disturbance intensities along with model parameters.62 The AEM algorithm was derived by approximating the expectation step of the EM algorithm78 using B-splines.62 The AEM objective function requires known measurement variances, which are not always available. Use of objective functions (2.2) to (2.5) requires the modeler to specify ∆t which is the imagined sampling time for discrete white noise disturbances that influence the process behaviour (i.e. q=(tq-t0)/∆t is the number of discrete shocks that are assumed to occur during a dynamic experiment). Objective function (2.3) is called the Laplace Approximation Maximum Likelihood Estimation (LAMLE) objective function because it was derived by using the Laplace approximation to integrate the likelihood function of the complete data p(Xq , Ym | ζ) over the vector of possible values for the state variables Xq.61 This multidimensional integration relies on B-splines to approximate the state trajectories. The LAMLE objective function is an important contribution to the SDE parameter estimation literature because it provides reliable estimates of unknown parameters, disturbance intensities and noise variances with relatively low computational effort. The main difficulty that occurs when implementing LAMLE is that an analytical expression for the Hessian matrix in Eq. (2.3) is difficult to obtain. Thus, we developed an iterative two-step approach that does not require analytical calculation of the Hessian.61 The proposed methods in the current article use a similar iterative approach, described in the next section, which shows 18 ACS Paragon Plus Environment

Page 18 of 58

Page 19 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

the development and implementation of objective functions (2.4) and (2.5). These new objective functions are more flexible that LAMLE because they can account for prior information about parameter values, measurement noise variance and initial conditions for state variables. The Fully-LaplaceApproximation Expectation Maximization (FLAEM) algorithm60 is another algorithm for estimating measurement noise variances along with process disturbance intensities and model parameters in SDE models. FLAEM is an iterative two-step algorithm that was derived using the fully Laplace approximation and B-spline basis functions. In a previous study, we showed that LAMLE parameter estimates were less biased and more accurate than corresponding estimates obtained using FLAEM. Furthermore, the LAMLE method was easier to implement and converged faster than the FLAEM method.61 Because of the superior performance of LAMLE compare to FLAEM, the use of LAMLE is recommended instead of FLAEM and equations related to FLAEM algorithm are not shown in Table 2. Prior information for disturbance intensities is not considered because we cannot envision practical situations where a modeler of a chemical process would have this type of knowledge.

DEVELOPMENT OF PROPOSED BAYESIAN ALGORITHMSD Development of Approximate Bayesian Expectation Maximization (ABEM) Algorithm Appendix A shows the derivation for the Approximate Bayesian Expectation Maximization (ABEM) objective function shown in Eq. (2.4), which can be used for simultaneous estimation of model parameters and process disturbance intensities. The derivation for the ABEM objective function builds on the derivation of our previous AEM objective function,62 with the main difference being that ABEM accounts for the modeler having prior information about reasonable distributions for some or all of the model parameters. The ABEM derivation assumes that the measurement noise variances are perfectly known. In situations where the noise variances are unknown, the modeler should use the LAB (Laplace Approximation Bayesian objective function Eq. (2.5) in Table 2, which is derived in Appendix B). When 19 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 58

using the ABEM methodology, estimates of the SDE parameters (i.e., the deterministic parameters in the DEs, the unknown initial states and the disturbance intensities) are determined by minimizing objective function (2.4) with respect to these parameters and the empirical B-spline parameters B:

ζˆ, Βˆ = arg min J ABEM

(16)

ζ,Β

The differences between the ABEM and AEM objective function are the four last terms in Eq. (2.4), which are related to prior information about the parameters and initial states. In the derivation of Eq. (2.4), Gaussian distributions are assumed for the prior knowledge about the model parameters θ and unknown initial states

x 0u . Since the modeler would usually have no prior

information about the model mismatch and stochastic disturbances accounted for by Q, a uniform distribution (between 0 and ∞) is assigned to describe prior knowledge about the diagonal elements of Q indicating that all positive values are equally likely. As a result no term related to prior information about Q appears in Eq. (2.4).

Development of Laplace Approximation Bayesian (LAB) Algorithm When modelers have no prior knowledge about measurement noise variances and no prior information about model parameters, they should use the LAMLE objective function (Eq. (2.3) in Table 2). The LAB algorithm has been developed for more complicated situations where modelers may have some prior information about model parameters and/or initial conditions and where the measurement noise variance is either poorly known or unknown. Optimal values of the model parameters θ, the unknown initial conditions x0u, the disturbance intensities Q, the noise variances Σ and the B-spline coefficients B can be found by minimizing JLAB:

ˆ = arg min J ζˆ, B LA B

(17)

ζ,B

20 ACS Paragon Plus Environment

Page 21 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

In cases where prior information is not available about some of the parameters (or initial conditions or noise variances), the corresponding terms are eliminated from the objective function. Note that the LAB objective function simplifies to the LAMLE objective function when no prior information is included. In the LAB objective function (2.5), HX~ is the Hessian matrix of − ln p(Xq , Ym | ζ) with respect to Xq evaluated at Xq~: H X~ = −

∂ 2 ln p ( X q , Ym | ζ ) ∂X q ∂X qT

(18) Xq =Xq ~

where Xq~ contains the smoothed state trajectories evaluated using the current values of the B-spline coefficients. The main difficulty when implementing the LAB methodology is that an analytical expression for Hx~ would be difficult to determine. When developing the LAMLE algorithm, we proposed an iterative method for evaluating Hx~ that does not require analytical calculation of this Hessian matrix.61 A similar iterative procedure can be used for the proposed LAB methodology and is shown in Appendixes C and D. In the first step of the proposed algorithm involves minimizing the following objective function to estimate θ and Β, using initial guesses for Q and Σ: 1 J = [Ym − g( X ~ m , U m , θ)]T Σ −1 [Ym − g( X ~ m , U m , θ)] 2

(19)

t

q p (θ i − θ i ) 2 1 + ∫ [x& ~ (t ) − f (x ~ (t ), u(t ), θ)]T Q −1[x& ~ (t ) − f (x ~ (t ), u(t ), θ)] d t + ∑ 2 t0 2σ 02,i i =1

The derivation for Eq. (19) is shown in Appendix C. In the second step, the estimated values of θ and Β from the first step are used to calculate updated values of Q and Σ : tq −1 −1

Q k +1 = ( tr(ΨH Ψ ) Σ ) -1 B

T

∫ [x& (t ) − f (x(t ), u(t ), θ)]

T

[x& (t ) − f (x(t ), u(t ), θ)]dt

t0

(20) k

{

Σ k +1 = ( 2 n + µ Σ + υ + 1) −1 [ Ym − g ( X ~ m , U m , θ) ] [ Ym − g ( X ~ m , U m , θ) ] T + tr ( ΨH B-1 Ψ T ) I + Ω T

21 ACS Paragon Plus Environment

}

k

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 58

(21)

[

]

T

where Ψ = Φ(t1 ) KΦ(t q ) and H B = ∂J ∂B∂B T . Symbols that appear in Eqs. (20) and (21) are defined in Appendix D. Iteration between these two steps (i.e., minimizing Eq. (19) and performing updates using Eqs. (20) and (21)) continues until convergence is obtained. It is common for optimization software to automatically report numerical values of the Hessian matrix with respect to the decision variables (e.g., HB in Eqs. (20) and (21)). Thus, there is no need for analytical calculation of the Hessian matrix Hx~ using the proposed LAB methodology assuming that convergence is obtained.

Illustrative Example: CSTR Model In this section, parameters in a relatively simple two-state nonlinear CSTR model59,79 are estimated to illustrate the application of the proposed ABEM and LAB algorithms. This example with two states, four model parameters and four uncertainty parameters was selected because it is sufficiently rich for the initial testing of the proposed algorithms yet simple enough to be readily understood by readers. Parameter estimation results obtained using ABEM and LAB methods are also compared with those obtained from AEM and LAMLE, respectively, which do not account for prior information about parameters and unknown initial conditions. The two SDEs describing dynamic changes in CA, the concentration of reactant A, and T, the reactor temperature are:

d CA (t) F(t) = (CA0 (t) − CA (t)) − kr CA (t) +ηC (t) dt V

(22a)

d T (t ) F (t ) = (T0 (t ) − T (t )) + UA (T (t ) − Tin (t )) + γ k r C A (t ) + η T (t ) dt V

(22b)

y C ( t m 1, j ) = C A ( t m 1, j ) + ε C (t m 1, j )

(22c)

y T ( t m 1, j ) = T (t m 1, j ) + ε T (t m 1, j )

(22d) 22 ACS Paragon Plus Environment

Page 23 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

(

(

))

where kr (T ) = kref exp − E R 1 T −1 Tref , γ is related to the enthalpy of reaction and defined as γ = (−∆H rxn )

ρc p and UA is a heat transfer coefficient that depends on the cooling water flow rate and

(

)

b +1 b heat capacity UA(Fc ) = aFc Vρc p Fc + aFc 2ρ c c pc .

The stochastic terms η C (t ) and η T (t ) in SDEs (22a) and (22b) are continuous Gaussian white noise sequences with unknown power spectral densities QC and QT , respectiveley, defined so that E{η C (t i )η C (t j )} = QC δ (t i − t j ) and E{η T (t i )η T (t j )} = Q T δ (t i − t j ) . In Eqs. (22.c) and (22.d), ε C (t mj )

j = 1KnC and ε T (t mj ) j = 1KnT are Gaussian white noise sequences with variances σ C2 and σ T2 , respectively. Note that the number of available concentration measurements nC in the data set may be considerably smaller than the number of temperature measurements nT depending on the type of sensors used. The model inputs are: the feed flow rate F, the inlet concentration CA0, the inlet temperature T0, the coolant inlet temperature Tcin and the flow rate of coolant to the cooling coil, Fc. The known constants for this CSTR model are shown in Table 3.

Table 3. Values of known constants in CSTR model79

Model

Value

Units

cp

4186.8

J·kg-1·K-1

cpc

4186.8

J·kg-1·K-1

Tref

350

K

V

1.0

m3

ρ

1000

kg·m-3

∆Hrxn

-

J·kmol-1

23 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 58

For simplicity, CA and T are assumed to be measured at equally-spaced sampling intervals in the current example, but the ABEM, AEM, LAB and LAMLE parameter estimation methodologies can readily handle irregularly sampled data. The initial steady-state values of the states are CA(0)= 1.569 kmol·m-3 and T(0)=341.37 K. The initial temperature T(0) is assumed to be unknown and will be approximated using T~(0), obtained using the unknown B-spline coefficients, which will be estimated along with the model parameters. In the analysis below, the AEM and ABEM methodologies are first used to estimate the deterministic model parameters

θ = [k ref , E / R, a, b]T and disturbance intensities QC and QT, as well as the B-spline coefficients that provide estimates for T(0) and the state trajectories. Although the estimated spline-coefficients are only applicable to the particular state trajectories, estimates of θ, QC and QT provide information about the underlying

process

behaviour

that

can

be

extrapolated

to

other

operating

conditions.

The ABEM results are obtained using user-supplied initial guesses for kref , E / R, a, b and T(0), whereas AEM does not make use of this prior information. A variety of initial guesses and uncertainty levels are considered in this example to explore the effectiveness of the algorithms. Measurement variances σ C2 and

σ T2 are assumed to be perfectly known by the modeler. Similarly, the effectiveness of LAMLE and LAB methodologies are illustrated and compared below for situations where σ C2 and σ T2 are unknown. The LAB implementation relies on initial guesses and uncertainty ranges for σ C2 and σ T2 provided by the modeler (along with the prior information provided to ABEM), whereas LAMLE is not capable of using this prior information.

ABEM and AEM Objective Functions and Methodology for CSTR Case Study The ABEM objective function (JABEM, CSTR) for estimating parameters in the CSTR model is:

24 ACS Paragon Plus Environment

Page 25 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

J ABEM ,CSTR =

nC

1

σ C2 +

∑ (C

A (t m 1, j ) − C A ~ (t m 1, j ) ) +

2

j =1

1 QC

tq



[

t0

∑ (T (t nT

1

σ T2

) − T~ (t m 2, j ) )

2

m 2, j

j =1

d C A~ (t ) F (t ) − (C A 0 (t ) − C A~ (t )) + k r C A ~ (t )] 2 d t dt V

tq

+

d T (t ) F (t ) 1 [ ~ − (T0 (t ) − T~ (t )) − UAT~ (t ) − Tin (t )] − γ k r C A~ (t )] 2 d t ∫ QT t 0 d t V

(23)

+ q C ln(QC ) + qT ln(QT ) +

( k ref − k ref ) 2

σ k2

ref

+

( E / R − E / R) 2 2 σ ER

+

(a − a ) 2

σ a2

+

(b − b ) 2

σ b2

+

(T (0) − T (0)) 2

σ T20

2 where k ref , σ k2ref , E / R , σ ER , σ a2 , a , b , σ b2 , T (0) and σ T20 are modeler-specified means and variances

that indicate prior knowledge about k ref , E / R , a, b and T (0) , respectively. Note that specifying an initial guess for T (0) corresponds to also specifying an initial guess for the spline coefficient β T , 0 so that

T (0) = T~ (0) . The value of β T , 0 is then updated after each iteration when new values of T (0) and the model parameters become available. Since the initial concentration C A (0) = 1.569 kmol m-3 is assumed to be perfectly known in this example, the intial B-spline parameter for the concentration trajectory is set at

β CA,0 =1.569 and is not updated. The parameter vector ζ = [k ref , E / R, a, b, QC , QT , T (0)]T is estimated along with the B-spline coefficients for the concentration and temperature trajectories by minimizing objective function (23). The corresponding AEM objective function is the same as Eq. (23), except that the last five terms related to the prior information are not included.

LAB and LAMLE Objective Functions and Methodology for CSTR Case Study Estimation of the unknown parameter vector ζ = [ k ref , E / R, a, b, QC , QT , σ C2 , σ T2 , T (0)]T using the LAB methodology requires an iterative two-step approach. In the first step, kref, E/R, a, b and the spline

25 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 58

coefficients B are estimated by optimizing the following objective function, using the most recent values of QC , QT, σ C2 and σ T2 : tq

J LAB ,CSTR =

d C A ~ (t ) F (t ) 1 [ − (C A 0 (t ) − C A~ (t )) + k r C A~ (t )] 2 d t ∫ QC t 0 dt V t

q d T (t ) F (t ) 1 + [ ~ − (T0 (t ) − T~ (t )) − UA [T~ (t ) − Tin (t )] − γ k r C A~ (t )] 2 d t ∫ QT t 0 d t V

+ +

1

σ C2

nC

∑ (C

(t m 1, j ) − C A~ (t m 1, j ) ) + 2

A

j =1

( k ref − k ref )

σ k2

2

+

( E / R − E / R)

2

2 σ ER

ref

where k ref , E / R , a , b ,

+

nT

∑ (T (t

1

σ T2

(24)

) − T~ (t m 2 , j ) )

2

m 2, j

j =1

(a − a ) 2

σ a2

+

(b − b ) 2

σ b2

+

(T (0) − T (0)) 2

σ T2 0

2 2 2 2 T (0) , σ k2 , σ ER , σa , σb and σ T 0 are initial parameter guesses and the ref

corresponding variances used by the modeler to summarize prior knowledge. The values QC, QT, σ C2 and σ T2 are then updated using: nC

σ C2 , k +1 =

∑ (C

(t m C , j ) − C A~ (t m C , j ) ) + tr (Ψ C H C Ψ CT ) + ωσ ,C −1

2

A

j =1

(25)

nC + 1 tq

QC , k +1 =

2

 dC A~ (t ) F (t )  ∫t  dt − V (C A0 (t ) − C A~ (t )) + k r C A~ (t )  dt 0 1+

nC

σ T2,k +1 =

∑ (T (t

1

σ C2 , k

−1

tr ( Ψ C H C Ψ )

) − T~ (t m T , j ) ) + tr ( Ψ T H T Ψ TT ) + ω σ ,T 2

m T,j

(26)

T C

−1

j =1

(27)

nT + 1 tq

2

 dT~ (t ) F (t )  ∫t  dt − V (T0 (t ) − T~ (t )) − UA(T~ (t ) − Tcin (t )) − γ k r C A (t )  dt QT , k +1 = 0 1 −1 1 + 2 tr (Ψ T H T Ψ TT )

σ T ,k

26 ACS Paragon Plus Environment

(28)

Page 27 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

where ωσ,C and

ωσ,T are prior guesses for the unknown measurement variances σC2 and σT2 . HC and HT

2 2 2 2 are Hessian matrixes: HC = ∂ J LAB,CSTR ∂βC and HT = ∂ J LAB,CSTR ∂βT where βC is a vector containing cC

spline coefficients and βT contains cT spline coefficients. Note that the number and location (in time) for the spline coefficients for the two trajectories must be specified. A adequate number of spline knots must be selected by the modeler to obtain an accurate approximation for the state trajectory. One knot at every sampling interval used to generate the disturbances. This knot placement frequency was chosen based on results from a preliminary study involving several single-state SDE models.62 Using too many knots can lead to long computational times. Overall, the objective function in this problem should be optimized over B-spline coefficients as well as parameters. Therefore, the number of data cannot be very limited. It is essential for the proposed algorithms to take advantage of fast and efficient state-of-the-art minimization routines. IPOPT20, which is a nonlinear solver that can be used with AMPL™ (a Mathematical Programming Language)80, provides an excellent tool for solving nonlinear optimization problems. Further information about selection of spline knots and other implementation details using AMPL™80 and the IPOPT optimization package20 are provided in the section below. Iteration between the optimization step and update step are repeated until convergence is obtained. The required Hessians are computed using the “gjh” function in IPOPT. It should be noted that the B-spline representation of states creates smoothed state trajectories and does not convert the parameter estimation problem into a discrete one. The discretization has been only used for the numerical integration in the solver. The LAMLE objective function and algorithm are similar to the LAB objective function and algorithm except that the last 5 terms in objective function (24) and the

ωσ,C and ωσ,T terms in Eqs. (25) and (27) do

not appear in the corresponding LAMLE equations because the development of LAMLE did not consider that the modeler would be able to specify prior information.

27 ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

SIMULATION RESULTS The ODE45 solver in MATLAB™ was used to generate simulated data (with measurement noise included in the data points and random Gaussian disturbances added at discrete times tq on the right hand sides of the differential equations). The size of these random measurement errors and disturbances were determined using the true values of QC , QT, σ C2 and σ T2 provided at the top of Table 4. The simulations were conducted using the input variable trajectories shown in Figure 1, wherein each of the inputs (i.e., the feed flow rate F, the inlet concentration CA0, the inlet temperature T0, the coolant inlet temperature Tin and the flow rate of coolant to the cooling coil Fc) is perturbed. True values of the model parameters used to generate the simulated data are provided in bold at the top of Table 4.79 The duration of each simulated dynamic experiment is 64 min. In the simulations, the stochastic process disturbances were generated using discrete white noise in MATLABTM using a zero-order hold and a sampling interval of ∆t=0.5 min, which is approximately 10 times smaller than the residence time of the CSTR. The same discretization interval was assumed when discretizing the states so that qC and qT in objective function (23) are set at 128. The ABEM, AEM, LAB and LAMLE objective functions were optimized using the IPOPT solver.20 Model information was provided to the optimizer using AMPL™.80 The maximum numbers of iterations and function evaluations were set to 1000 and 2000, respectively. When optimizing the objective functions, default values of optimization settings in IPOPT were used. Cubic (fourth order) B-splines were used in all parameter estimation studies with 384 equally-spaced knots, which was shown to be effective in our previous work using the LAMLE objective function and the CSTR model.61 When implementing the LAB algorithm, estimates of the disturbance and noise parameters were believed to have converged when the change in the scaled relative error e(k) was less than 10-3 where e(k) is defined as:

28 ACS Paragon Plus Environment

Page 28 of 58

Page 29 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

e(k ) = (

Q C , k − Q C , k −1 QC ,k

)2 + (

QT , k − QT , k −1 QT , k

)2 + (

σ C2 , k − σ C2 , k −1 2 σ T2 , k − σ T2 , k −1 2 ) +( ) σ C2 , k σ T2 , k

29 ACS Paragon Plus Environment

(29)

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Page 30 of 58

Table 4. True parameter values, median values and IQRs for the estimates based on 100 runs kref

(E/R)/ 103

Units

min-1

K

True Value Scenario

0.461

8.3301

1.678

0.5

Median

0.432

8.2308

1.502

IQR

0.022

0.2062

Median

0.452

IQR

Parameters to Estimate

1

ABEM, AEM

LAB, LAMLE 2 3

LAB, LAMLE LAMLE LAB

4

LAMLE LAB

5

LAMLE LAB

a/106

b

T(0)

QC

QT

341.38

kmol2·m6 ·min-1 0.01

K2·mi n-1 4

0.50

342.32

0.009

4.9

0.465

0.10

1.21

0.004

1.4

8.3383

1.661

0.49

341.32

0.01

4.5

0.02

0.1914

0.421

0.08

1.08

0.002

1.3

Median

0.448

8.3273

1.652

0.49

341.32

0.01

IQR Median IQR Median IQR Median IQR Median IQR Median IQR Median IQR

0.021 0.448 0.021 0.441 0.042 0.401 0.0472 0.449 0.051 0.681 0.082 0.492 0.066

0.2403 8.3273 0.2403 8.2317 0.1423 7.5140 0.4206 8.3502 0.1639 9.8068 0.6432 8.4739 0.1947

0.493 1.652 0.493 1.676 0.421 1.457 0.648 1.644 0.472 1.878 0.965 1.716 0.491

0.11 0.49 0.11 0.50 0.12 0.47 0.14 0.49 0.10 0.45 0.27 0.49 0.14 30

1.07 341.32 1.07 342.02 0.81 342.05 1.09 341.27 1.08 340.27 1.15 341.29 1.17

0.003 0.01 0.003 0.012 0.0017 0.015 0.018 0.008 0.021 0.002 0.002 0.003 0.002

K

ACS Paragon Plus Environment

σ C2

σ T2

kmol2·m6

K2

4×10 -4

0.64

5.1

0.00039

0.64

1.1 5.1 1.1 4.8 1.31 4.9 1.1 5.2 1.01 4.8 1.7 4.7 1.9

0.00011 0.00039 0.00011 0.00040 0.00015 0.00035 0.00014 0.00037 0.00011

0.16 0.64 0.16 0.64 0.20 0.59 0.17 0.61 0.11 0.73 0.16 0.68 0.15

Page 31 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

The parameters in the vector ζ were estimated using the AEM, ABEM, LAMLE and LAB objective functions (see Table 2) under five different scenarios summarized in Table 5.

Table 5. Scenarios used to illustrate SDE parameter estimation algorithms Scena rio

Noise Variances

nT

nC

Prior Information Specified for Parameters

1

Known

128

128

None

2

Unknown

128

128

None

3

Unknown

128

128

kref, E/R, a, b, σ

2 C



4

Unknown

128

22

kref, E/R, a, b, σ

2 C

, T(0) and σ

5

Unknown

128

0

kref, E/R, a, b, σ

2 C

, T(0) and σ

2 T

and T(0) 2 T

2 T

In Scenario 1, frequent measurements for both CA and T are available for parameter estimation (i.e., both are measured every 0.5 min, resulting in 128 concentration measurements and 128 temperature measurements). No prior information about the unknown parameters is available (except for the initial guesses used to initiate the parameter estimation) and measurement noise variances are perfectly known. In Scenario 2, the same data are available for parameter estimation, but the measurement noise variances for concentration and temperature are unknown. No prior parameter information (except for initial guesses) is specified by the modeler. Scenario 3 is the same as Scenario 2 except that the modeler specifies prior information about the model parameters and unknown noise variances (i.e., initial guesses and standard deviations for

k ref , E / R, a, b, T (0) and initial guesses for σ C2 and σ T2 ). Scenario 4 is the same as Scenario 3, except that concentration measurements are available less often (i.e., only 22 equally-spaced

ACS Paragon Plus Environment

31

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

measurements rather than 128).

Page 32 of 58

In Scenario 5, the parameter estimation must rely on

temperature measurements alone. In all of the scenarios, the initial concentration is assumed to be perfectly known, but the initial temperature is poorly known and requires estimation. These scenarios were selected because they help to show that, when prior information about parameters is available, the Bayesian methods (ABEM and LAB) are better able to handle situations involving infrequent or missing data than the corresponding maximum likelihood methods (AEM and LAMLE). Table 6 summarizes the criteria that a model developer might use when selecting from among the four parameter-estimation algorithms.

Table 6. Objective function selection criteria Measurement Method

Noise

Variance

Prior

Parameter

Information

Iteration Required due to Hessian

AEM

Known

None

No

ABEM

Known

If available

No

LAMLE

Known or Unknown

None

Yes

LAB

Known or Unknown

If available

Yes

Both AEM and ABEM require the measurement noise variances to be known a priori, whereas LAMLE and LAB are able to estimate unknown noise variances. The Bayesian methods (ABEM and LAB) are able to account for prior information about the parameters, whereas AEM and LAMLE are not. The LAMLE and LAB algorithms are slightly more complicated to implement than AEM and ABEM because they require additional update steps to determine the disturbance intensities and measurement noise variances (see Eqs. (25) to (28)) due to the Hessian that

ACS Paragon Plus Environment

32

Page 33 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

appears in the corresponding objective functions. No difficulties with convergence of any of the algorithms was encountered in the current simulation study. Typical computation times required to generate the a set of simulated data and then estimate the parameters in Scenario 1 using MATLABTM on a laptop computer with Intel® Core™ i75600U CPU and 2.6 GHz processor were 10 seconds for ABEM and AEM and 40 seconds for LAB and LAMLE. Simulated data sets for each scenario were generated 100 times using true values of the parameters, measurement variances and disturbance intensities (shown in bold at the top of Table 4). In addition, 100 sets of random initial guesses for the parameters were selected (i.e., from uniform distributions between 50% and 150% of the corresponding true values) along with prior parameter uncertainty levels (i.e., values of σ kref , σ ER ,

σa and σb ), which were set at 60% of the

corresponding initial guesses. This information was used to specify the modeler’s prior beliefs in Scenarios 3, 4 and 5 when using objective functions (23 ) and (24). Similarly, 100 initial guesses for the initial temperature T(0) were selected randomly between 324 K and 358 K. The initial uncertainty in T(0) was set at

σ T 0 = 5.0 K in Scenarios 3, 4 and 5.

Comparison of ABEM, AEM, LAMLE and LAB Simulation Results Table 4 shows medians and interquartile ranges (IQRs) for the replicated parameter estimates obtained using the five scenarios, so that the effectiveness of the AEM, ABEM, LAMLE and LAB parameter estimation approaches can be compared. For Scenario 1, since there is no prior information about the unknown parameters, there are no Bayesian parameter penalty terms in the ABEM objective function (Eq. (23)) and it becomes identical to the AEM objective function. As a result, there is only one row in Table 4 for both ABEM and AEM for this scenario. Similarly,

ACS Paragon Plus Environment

33

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 58

the LAB objective function (Eq. (24)) becomes identical to the LAMLE objective function. Because noise variances are assumed to be perfectly known, the noise variances are not included in the list of decision variables for optimization of objective function (23) using ABEM/AEM in Scenario 1. When using LAB/LAMLE, no updating of variance estimates via Eqs. (25) and (27) is required for this scenario. Because a rich data set is available in Scenario 1 estimates of all of T the unknown parameters contained in ζ = [k ref , E / R, a, b, T (0), QC , QT ] are quite close to the

true values and the corresponding IQRs are relatively small, indicating that both ABEM/AEM and LAB/LAMLE are effective for parameter estimation. As expected, the LAB/LAMLE estimates tend to be better on average than the ABEM/AEM estimates because of the additional Hessian term in objective function (2.5), which results in the update step for disturbance intensities QC and QT via Eqs. (26) and (28). For example, the median estimate of 0.452 min-1 obtained via LAB/LAMLE is closer to the true value of 0.461 min

-1

than the value of 0.432

obtained via ABEM/AEM. Estimated trajectories for concentration and temperature (i.e., the smoothed B-spline trajectories CA~ and T~ obtained using ABEM) for one of the 100 simulated data sets in Scenario 1 are shown in Figure 2, along with the true state trajectories and the corresponding data values. The estimated state trajectories follow the true trajectories closely because of the accurate parameter estimates and appropriate B-spline knot settings.61 Table 7 shows the parameter estimates and approximate 95% confidence intervals obtained using the simulated data in Figure 2. These confidence intervals were obtained in a similar fashion to the AEM confidence intervals in our previous work:62

ζˆ ± zα / 2 diag(∂ 2 J ABEM / ∂ζ 2 ) −1

(30)

ζˆ

ACS Paragon Plus Environment

34

Page 35 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

where

zα/ 2 is

(1−α/2)th quartile value of the standard Gaussian distribution. Similar

confidence intervals for the model parameters and unknown initial condition T(0) can be obtained using the LAB and LAMLE methodologies. However, we have not been able to easily construct confidence intervals for the disturbance intensities and measurement noise variances using LAB and LAMLE, because of the difficulty in obtaining the Hessian in objective functions (2.4) and (2.5). Confidence interval information for the remaining scenarios in Table 4 could be readily be obtained for individual simulated data sets but will not be presented here.

Table 7. Estimates and 95% Confidence intervals for ABEM parameter estimates from one of the 100 Monte Carlo simulations Estimate ±95% Paramete Unit

True Value

Initial Guess Approximate

r Confidence Interval kref

min-1

0.461

0.570

0.431 ±0.046

(E/R)/ 103

K

8.3301

6.9171

8.151 ±0.345

a/106

1.678

1.321

1.488 ±0.124

b

0.50

0.548

0.53 ±0.17

QC

kmol2·m-6·min-1

0.010

0.008

0.009 ±0.001

QT

K2·min-1

4.0

4.1

4.7 ±0.50

T(0)

K

341.38

343.71

342.08 ±1.43

ACS Paragon Plus Environment

35

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 58

In Scenario 2, ABEM and AEM are not appropriate for estimating the model parameters because these methods require that the modeler provide noise variance values that are assumed to be true. The LAB/LAMLE estimates for Scenario 2 are quite good because of the rich data set. As expected, the estimates for Scenario 2 in Table 4 are not as good as those for LAB/LAMLE in Scenario 1, because the unknown noise variances, which require estimation, resulted in additional uncertainty. In Scenario 3, where prior knowledge is specified for the unknown parameters and noise variances, it is inappropriate to use the LAMLE and AEM methodologies because they do not account for prior information.

Also, ABEM cannot be used because measurement noise

variances are not unknown in this scenario. As expected, the LAB parameter estimates in Scenario 3 are similar in quality to the LAB/LAMLE estimates in Scenario 2, which also had unknown noise measurement variances. Some of the median parameter estimates are close to the true values (i.e., a, b, QT and

σT2 ).

σC2 ), whereas other are further away (i.e., kref, E/R, T(0), QC and

The reason that the prior parameter information did not lead to significant improvements in

the parameter estimates, on average, is because of the large number of data values available for parameter estimation in this scenario and also the relatively modest prior information provided by the modeler. If we had opted to use LAMLE for this scenario, while ignoring prior information about the unknown model parameters and noise variances, the LAMLE results would be identical to those shown for LAB/LAMLE in Scenario 2. In Scenario 4, the number of temperature data points remains large, but the number of concentration measurements is reduced to only 22. In this case it is best to use the LAB methodology because it can accommodate the prior information available for the model parameters and noise variances. In Table 4, we also show Scenario 4 results for LAMLE, in

ACS Paragon Plus Environment

36

Page 37 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

which the prior information cannot be accommodated and is ignored. As expected, the LAB results for Scenario 4 are worse than those for Scenario 3 because fewer data points are available. The LAB results are noticeably better than those obtained using LAMLE for Scenario 4 because of the importance of the prior knowledge when data are sparse. All of median model parameter and noise parameter estimates obtained using LAB are closer to the corresponding true values than those obtained using LAMLE, except for the estimate of QT. In addition, the IQRs tend to be narrower for the LAB estimates than for the LAMLE estimates. In Scenario 5, no concentration measurements are available. The estimation must rely only temperature measurements along with the known initial concentration and the prior information about model parameters, T(0) and

σT2 .

No convergence problems were obtained for either

LAMLE (with the prior information ignored) or LAB. The resulting estimates and IQRs are shown at the bottom of Table 4. As expected, the LAB estimates are superior to the LAMLE estimates because of the importance of the prior information in this limited-data situation. Note that no estimates are shown for

σC2 , which was not estimated because there were no

concentration data. Results in Table 4 indicate that LAB and ABEM can provide accurate parameter estimates over a range of scenarios, including those where there are small datasets but reasonable prior parameter information is available. As expected, the proposed Bayesian methods, LAB and ABEM, provide similar parameter estimates to LAMLE and AEM when rich data sets are available for parameter estimation. Noticeably better estimates are obtained by LAB and ABEM (compared with LAMLE and AEM) when data sets are small and prior parameter information is available for parameters. LAB provides better results than ABEM because a more accurate approximation was used for the likelihood function when deriving the LAB objective function.

ACS Paragon Plus Environment

37

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 38 of 58

Because of this better approximation, the LAB objective function has more terms than the ABEM objective function. For the same reason, LAMLE parameter estimation results are better than AEM parameter estimates. LAB and ABEM provide more accurate parameter estimates than LAMLE and AEM when the size of data set is small because their objective functions have extra terms that can incorporate to prior knowledge about parameter values. Although the final parameter estimates in Table 4 give a good fit to the data, these estimation results may also correspond to a local minimum. One shortcoming of the proposed methods compared to particle filter31,32,37 methods is that the proposed methods introduce additional approximations when computing the likelihood function. The main approximations correspond to using B-spline approximations to find a closed form for the likelihood function rather than employing many particles. These approximations reduce the computational cost, but may cause some of the bias observed in Table 4. However, the main bias is due to very small number of data points used for parameter estimation. In future, it would be interesting to test the models with the estimated parameters at new operating conditions that were not used for parameter estimation. This investigation can show whether the model predictions are valid in different operating conditions or not. It is predicted that the model predictions will be adequate accuracy in various operating conditions because of the fundamental nature of the model. Although the parameter estimation technique developed in this paper assumed that the measurement noise variances and stochastic disturbances have Gaussian distributions, it can be extended to the cases where these distributions are non-Gaussian. In future, it will be advantageous to extend the proposed techniques to the cases where the random measurement error and stochastic disturbances have non-Gaussian distributions.

ACS Paragon Plus Environment

38

Page 39 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

It is recommended that the proposed algorithms should be tested on larger-scale SDE parameter estimation problems and that the performance of these algorithm should be compared with particle filter-based methods and PC-based methods. These types of simulations can determine whether the B-spline and other approximations used to develop the proposed methods result in any significant degradation in the quality of parameter estimates when compared with particle filter methods.The computation time for the proposed ABEM and LAB methods is expected to be significantly lower than for particle filter techniques.4,31,32 In future, it will be desirable to test the proposed methodologies using larger-scale models with a larger number of states and parameters.81 Furthermore, it will be advantageous to develop techniques for determining inference regions for process disturbance intensities and the measurement noise variances. It will also be important to investigate the convergence of the methods. Similar to AMLE the proposed algorithms can be used for parameter estimation in dynamic models driven by non-stationary process disturbances, because non-stationary disturbances can be considered to be unmeasured states.82

CONCLUSIONS This article presents detailed derivations for an Approximate Bayesian Expectation Maximization (ABEM) objective function and a Laplace Approximation Bayesian (LAB) objective function for estimating parameters in nonlinear SDE models. The proposed ABEM method is an important advance over the approximate expectation maximization (AEM) algorithm previously developed by our research group62 because it can use prior information about model parameters and unknown initial conditions. The ABEM method permits modelers to

ACS Paragon Plus Environment

39

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 40 of 58

estimate model parameters and the magnitudes of process disturbances in SDE models even when the data sets are relatively small. The ABEM methodology assumes that measurement noise variances are perfectly known by the modeler. The proposed LAB methodology builds on the Laplace Approximation Maximum Likelihood Estimation (LAMLE) method and uses prior information about model parameters and unknown initial conditions. The LAB methodology permits estimating unknown measurement noise variances along with model parameters and process disturbance intensities, even when the sets are small. A nonlinear Continuous Stirred Tank Reactor (CSTR) model with additive stochastic disturbances was used to test the ABEM and LAB methodologies and to compare their performance to AEM and LAMLE methodologies, respectively. For the CSTR example studied, the simulation results showed that LAB provides the most accurate parameter estimates for the scenarios studied, especially where the size of dataset is small and prior parameter information is available. When a large dataset is available for parameter estimation, the accuracy of parameter estimates obtained from the proposed LAB and ABEM methodologies is similar to that obtained from LAMLE and AEM methodologies, respectively. The proposed parameter estimation methodologies are straightforward to implement using available optimization and modeling packages.20,80

ACKNOWLEDGEMENTS Financial support provided by NSERC is gratefully acknowledged.

Nomenclature

ACS Paragon Plus Environment

40

Page 41 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Abbreviations ABEM

approximate Bayesian expectation maximization

AEM

approximate expectation maximization

AMLE

approximate maximum likelihood estimation

B

B-splines

CP

Chaos polynomials

CSTR

continuous stirred tank reactor

D

deterministic

DE

differential equation

DP

Disturbance penalty

EM

expectation maximization

FLAEM

Fully-Laplace-Approximation Expectation Maximization

HP

Hessian penalty

IQR

interquartile range

K

known

LA

Laplace approximation

LAB

Laplace approximation Bayesian

LAMLE

Laplace approximation maximum likelihood estimation

MCMC

Markov chain Monte Carlo

MIMO

multi-input multi-output

MLE

maximum likelihood estimation

MP

model-based penalty

NVP

noise variance penalty

PP

prior penalty

SDE

stochastic differential equation

S

stochastic

ODE

ordinary differential equation

U

unknown

WLS

weighted least squares

Roman letters a

CSTR model parameter relating heat-transfer coefficient to coolant flow rate

b

CSTR model exponent relating heat-transfer coefficient to coolant flow rate

cs

number of B-spline coefficients for sth state trajectory

CA

concentration of reactant A (kmol·m-3)

ACS Paragon Plus Environment

41

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 42 of 58

CA0

feed concentration of reactant A (kmol·m-3)

cp

heat capacity of reactor contents (J·kg-1·K-1)

cpc

coolant heat capacity (J·kg-1·K-1)

C1

constant in Eq. (D.1)

D

function of f and its derivatives shown in Eq. (D.13)

det

determinant

e(k)

scaled relative error

E/R

activation energy divided by the ideal gas constant (K)

f

X-dimensional nonlinear function on the right-hand side of the SDE model

F

reactant volumetric flow rate (m3·min-1)

Fc

coolant volumetric flow rate (m3·min-1)

g

Y-dimensional vector of nonlinear functions on the right-hand side of Eq.

∆Hrxn

enthalpy of reaction (J·kg-1·K-1)

HX~

Hessian matrix of − ln p(Xq , Ym | ζ) with respect to Xq evaluated at Xq~

HB I

Hessian matrix of identity matrix

J

Objective function in Eq. (19)

JAMLE

AMLE objective function defined in Eq. (2.1)

JAEM

AEM objective function defined in Eq. (2.2)

JABEM

ABEM objective function defined in Eq. (2.4)

JABEM,CSTR

ABEM objective function for CSTR model defined in Eq. (23)

Jd

objective function defined in Eq. (D.3)

JLAB

LAB objective function defined in Eq. (2.5)

JLAB,CSTR

AEM objective function for CSTR model defined in Eq. (24)

JLAMLE

LAMLE objective function defined in Eq. (2.3)

kref

kinetic rate constant at temperature Tref (min-1)

kr

reaction rate constant defined

M

order of B-spline basis functions

n

number of measurements

nC

number of measurements for concentration of reactant A

Nr

number of measurements for rth response

nT

number of measurements for temperature

Nu

number of unknown initial state values

P

number of unknown model parameters

p(.)

probability density function

− ln p(Xq , Ym | ζ) with respect to B-spline basis

ACS Paragon Plus Environment

42

Page 43 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

q

number of discretization points for SDE model

Q

diagonal power spectral density function

Qd

vector of disturbance intensities

QC

disturbance intensity for concentration (kmol2·m-6·min-1)

QT

disturbance intensity for temperature (K2·min-1)

Qs

disturbance intensity for state s

σ T20

measurement noise variance for T(0)

t

time (min)

t0

initial time (min)

ti

times used for discretizing SDEs (min)

tm r,j

jth measurement time at for the rth response (min)

tq

final time (min)

T

temperature of reactor contents (K)

Tcin

inlet temperature of coolant (K)

tr

trace

Tref

reference temperature (K)

∆t

sampling time used for discretizing SDEs and disturbances (min)

u

U-dimensional vector of input variables for SDE model

U

dimension of the vector of the input variables

UA

heat transfer coefficient

Um

stacked vector of input values at measurement times with different sampling

V

volume of the reactor (m3)

w

function defined in Eq. (D.5)

W(t)

Wiener process

x

state vector

X

dimension of state vector

x0

state vector at the initial time t0

x 0u

vector of the unknown initial conditions

x 0u Xm

T vector of selected means of the initial state values x0u = [x1,...,xNu ] stacked vector of state values at measurement times

Xq

stacked vector of state values at discrete times

xs

sth state variable

xs

mean for sth initial state value x0u,s

x~

B-spline approximation of the vector of state trajectories x

x& ~ ( t )

time derivative for vector x~

ACS Paragon Plus Environment

43

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 44 of 58

yC

concentration measurements

yT

temperature measurements

y

Y-dimensional output vector

Y

dimension of output vector

Ym

vector of stacked measured values at the measurement times

yr

rth measured output

zα/ 2

(1−α/2)th quantile value of the standard Gaussian distribution

Greek letters α

significance level for confidence intervals

B

stacked vector of B-spline coefficients

βs,l

lth B-spline coefficient for sth state trajectory

βc,1

first B-spline coefficient for concentration state trajectory

βs

vector of spline coefficients corresponding to the sth state trajectory

γ

constant defined as γ = (−∆H rxn ) multivariate gamma function

Γν(·) δ (.)

ρc p

Dirac delta function

ε

Y-dimensional vector of zero-mean random variables

εC

measurement noise for concentration(kmol·m-3)

εT

measurement noise for temperature (K)

εm

stacked vector of measurement noise values at measurement times

ζ

vector of unknown parameters defined as ζ = [θ , x0u , Qd , Σd ]

η(t)

X-dimensional continuous zero-mean stationary white-noise process

ηd(t)

X-dimensional discrete zero-mean stationary white-noise process

ηC(t)

continuous zero-mean stationary white-noise process for concentration SDE

ηT(t)

continuous zero-mean stationary white-noise process for temperature SDE

θ

vector of model parameters

θ

T

i

θi

T

T

T T

ith model parameter user specified mean for the ith model parameter

ρ

density of reactor contents (kg· m-3)

ρc

coolant density (kg· m-3)

Σ

covariance matrix for measurement errors defined in Eq. (3)

Σd

diagonal elements of the covariance matrix of measurements

σ0i2,

variance for the ith parameter

ACS Paragon Plus Environment

44

Page 45 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

σr2 σC2 σT2

measurement noise variance for rth response

σ x2,s

variance for sth initial state value x0u,s

σθ2 σ0i2, 2 σ0u

vector of assigned prior parameter variances defined as

ν φ s (t )

dimension of matrix Ω vector of B-spline basis function for sth state trajectory

φ s ,l ( t )

lth B-spline basis function for sth state trajectory

Φ (t )

matrix of B-spline functions defined in Eq. (10)



ν ×ν

Ψ

matrix of spline functions defined as Ψ = Φ (t1 ) Φ (t 2 ) K Φ (t q ) T

measurement noise variance for concentration measurement noise variance for temperature σ θ2 = [σ θ2 ,1 ,..., σ θ2 , P ] T

variance for the ith parameter vector of variances selected for the unknown initial states

positive-definite matrix of initial guesses for Σ

[

]

Subscripts i

index of times used for discretizing SDE

j

index of number of measurements

k

index of iterations

l

index for B-spline coefficients

r

index for response variable

s

index for state variables

~

subscript used to indicate smoothed state trajectories estimated using B-

Superscripts T

transpose

REFERENCES (1)

Érdi, P.; Tóth, J. (János). Mathematical Models of Chemical Reactions : Theory and Applications of Deterministic and Stochastic Models; Manchester University Press, 1989.

(2)

KING, R. P. Applications of Stochastic Differential Equations to Chemical-Engineering

ACS Paragon Plus Environment

45

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 46 of 58

Problems- An Introductory Review. Chem. Eng. Commun. 1974, 1 , 221. (3)

Chitralekha, S. B.; Prakash, J.; Raghavan, H.; Gopaluni, R. B.; Shah, S. L. A Comparison of Simultaneous State and Parameter Estimation Schemes for a Continuous Fermentor Reactor. J. Process Control 2010, 20 , 934.

(4)

Jang, S. S.; Gopaluni, R. B. Parameter Estimation in Nonlinear Chemical and Biological Processes with Unmeasured Variables from Small Data Sets. Chem. Eng. Sci. 2011, 66 , 2774.

(5)

Lente, G. A Novel Method to Compute the Time Dependence of State Distributions in the Stochastic Kinetic Description of an Autocatalytic System. Comput. Chem. Eng. 2016,1 , 1.

(6)

Tronci, S.; Grosso, M.; Alvarez, J.; Baratti, R. On the Global Nonlinear Stochastic Dynamical Behavior of a Class of Exothermic CSTRs. J. Process Control 2011, 21, 1250.

(7)

Tronci, S.; Grosso, M.; Alvarez, J.; Baratti, R. Stochastic Dynamical Nonlinear Behavior Analysis of a Class of Single-State CSTRs; IFAC, 2009; Vol. 42.

(8)

Bard, Y. Nonlinear Parameter Estimation; Academic Press Inc.: New York, 1974.

(9)

Seber, G. A. F. , Wild, C. J. Nonlinear Regression; John Wiley and Sons Inc.: New York, 1989.

(10)

Bates, D.M., Watts, D. G. Nonlinear Regression Analysis and Its Applications; John Wiley & Sons Inc.: New York, 1988.

(11)

Ogunnaike, B. A., Ray, W. H. Process Dynamics, Modeling and Control; Oxford

ACS Paragon Plus Environment

46

Page 47 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

University Press: New York, 1994. (12)

Biegler, L. T.; Damiano, J. J.; Blau, G. E. Nonlinear Parameter Estimation: A Case Study Comparison. AIChE J. 1986, 32 , 29.

(13)

Dov`; G., V.; Arato, E.; Maga, L. A More General Formulation of Separable Least Squares. Math. Model. 1985, 18, 20.

(14)

Englezos, P., Kalogerakis, N. Applied Parameter Estimation for Chemical Engineering; Marcel Dekker Inc.: New York, 2001.

(15)

Mansouri, N.; Kernévez, J. . Estimation of Parameters in Nonlinear Problems. Numer. Algorithms 1998, 17 , 333.

(16)

Bock, H. G. Numerical Treatment of Inverse Problems in Chemical Reaction Kinetics; Springer: Berlin, 1981.

(17)

Bock, H. G. Recent Advances in Parameteridentification Techniques for O.D.E. In Numerical Treatment of Inverse Problems in Differential and Integral Equations; Birkhäuser Boston: Boston, 1983.

(18)

Biegler, L. T. Solution of Dynamic Optimization Problems by Successive Quadratic Programming and Orthogonal Collocation. Comput. Chem. Eng. 1984, 8 , 243.

(19)

Biegler, L. T.; Grossmann, I. E. Retrospective on Optimization. Comput. Chem. Eng. 2004, 28, 1169.

(20)

Wächter, A.; Biegler, L. T. On the Implementation of an Interior-Point Filter Line-Search Algorithm for Large-Scale Nonlinear Programming. Math. Program. 2006, 106, 25.

ACS Paragon Plus Environment

47

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(21)

Page 48 of 58

Poyton, A. A.; Varziri, M. S.; McAuley, K. B.; McLellan, P. J.; Ramsay, J. O. Parameter Estimation in Continuous-Time Dynamic Models Using Principal Differential Analysis. Comput. Chem. Eng. 2006, 30, 698.

(22)

De Boor, C. A Practical Guide to Splines : With 32 Figures; Springer: New York, 2001.

(23)

O’Sullivan, F. A Statistical Perspective on Ill-Posed Inverse Problems. Statistical Science. 1986, 1, 502.

(24)

Eilers, P. H. C.; Marx, B. D. Flexible Smoothing with B -Splines and Penalties. Stat. Sci. 1996, 11, 89.

(25)

Ramsay, J. O. (James O. .; Silverman, B. W. Functional Data Analysis; Springer: New York, 2005.

(26)

Voss, H. U.; Timmer, J.; Kurths, J. Nonlinear Dynamical System Identification from Uncertain and Indirect Measurements. Int. J. Bifurc. Chaos 2004, 14, 1905.

(27)

Varziri, M. S.; McAuley, K. B.; McLellan, P. J. Parameter and State Estimation in Nonlinear Stochastic Continuous-Time Dynamic Models with Unknown Disturbance Intensity. Can. J. Chem. Eng. 2008, 86, 828.

(28)

Nielsen, J. N.; Madsen, H.; Young, P. C. Parameter Estimation in Stochastic Differential Equations: An Overview. Annu. Rev. Control 2000, 24, 83.

(29)

Maybeck, P. S. Stochastic Models, Estimation and Control; Academic Press: New York, 1982.

(30)

Julier, S. J.; Uhlmann, J. K. Unscented Filtering and Nonlinear Estimation. Proc. IEEE

ACS Paragon Plus Environment

48

Page 49 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

2004, 92, 401. (31)

Gopaluni, R. B. A Particle Filter Approach to Identification of Nonlinear Processes under Missing Observations. Can. J. Chem. Eng. 2008, 86, 1081.

(32)

Gopaluni, R. B. Nonlinear System Identification under Missing Observations: The Case of Unknown Model Structure. J. Process Control 2010, 20, 314.

(33)

Ljung, L.; Ljung; Lennart. System Identification. In Wiley Encyclopedia of Electrical and Electronics Engineering; John Wiley & Sons, Inc.: Hoboken, 1999.

(34)

Deistler; M. System Identification T. Söderström and P. Stoica Prentice Hall International, 1989. Econom. Theory 1994, 10, 813.

(35)

Chen, W. S.; Bakshi, B. R.; Goel, P. K.; Ungarala, S. Bayesian Estimation via Sequential Monte Carlo Sampling: Unconstrained Nonlinear Dynamic Systems. Ind. Eng. Chem. Res. 2004, 43, 4012.

(36)

Poyiadjis, G.; Doucet, A.; Singh, S. S. Maximum Likelihood Parameter Estimation in General State-Space Models Using Particle Methods. Proc of the American Stat. Assoc. 2005. 2,1.

(37)

Doucet, A.; Tadić, V. Parameter Estimation in General State-Space Models Using Particle Methods. Ann. Inst. Stat. Math. 2003, 55, 409.

(38) Kantas, N.; Doucet, A.; Singh, S.S.; Maciejowski, J.M. An overview of sequential Monte Carlo methods for parameter estimation in general state-space models, Jt. Stat. Meet. 2009, 42, 774.

ACS Paragon Plus Environment

49

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(39)

Page 50 of 58

Kantas, N.; Doucet, A.; Singh, S. S.; Maciejowski, J. M. An Overview of Sequential Monte Carlo Methods for Parameter Estimation in General State-Space Models. In IFAC Proceedings; 2009, 42, 774.

(40)

Zhao, Z.; Huang, B.; Liu, F. Parameter Estimation for Batch Processes with Measurements of Large Sampling Intervals. IFAC-Papers OnLine. 2015, 48 , 799.

(41)

Imtiaz, S. A.; Roy, K.; Huang, B.; Shah, S. L.; Jampana, P. Estimation of States of Nonlinear Systems Using a Particle Filter. In IEEE International Conference on Industrial Technology, 2006, 1, 2432.

(42)

Blanchard, E.; Sandu, A.; Sandu, C. Parameter Estimation for Mechanical Systems via an Explicit Representation of Uncertainty. Eng. Comput. 2009, 26, 541.

(43)

Marzouk, Y. M.; Najm, H. N.; Rahn, L. A. Stochastic Spectral Methods for Efficient Bayesian Solution of Inverse Problems. J. Comput. Phys. 2007, 224, 560.

(44)

Chen-Charpentier, B.; Stanescu, D. Parameter Estimation Using Polynomial Chaos and Maximum Likelihood. Int. J. Comput. Math. 2013, 91, 336.

(45)

Blanchard, E. D. Polynomial Chaos Approaches to Parameter Estimation and Control Design for Mechanical Systems with Uncertain Parameters. Ph.D. Dissertation, Ph.D. Dissertation, Massachusetts Institute of Technology, Blacksburg, Virginia, 2010.

(46) Blanchard, E. D.; Sandu, A.; Sandu, C. A Polynomial Chaos Based Bayesian Approach for Estimating Uncertain Parameters of Mechanical Systems Part I: Theoretical Approach. 2007, Multibody Syst. Dyn. 1, 540. (47)

Desceliers, C.; Ghanem, R.; Soize, C. Maximum Likelihood Estimation of Stochastic

ACS Paragon Plus Environment

50

Page 51 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Chaos Representations from Experimental Data. Int. J. Numer. Methods Eng. 2006, 66, 978. (48)

Li, J.; Xiu, D. A Generalized Polynomial Chaos Based Ensemble Kalman Filter with High Accuracy. J. Comput. Phys. 2009, 228, 5454.

(49)

Dutta, P.; Bhattacharya, R. Nonlinear Estimation of Hypersonic State Trajectories in Bayesian Framework with Polynomial Chaos. J. Guid. Control. Dyn. 2010, 33, 1765.

(50)

Eldred, M. S.; Burkardt, J. Comparison of Non-Intrusive Polynomial Chaos and Stochastic Collocation Methods for Uncertainty Quantification. 47th AIAA Aerosp. Sci. Meet. Incl. New Horizons Forum Aerosp. Expo. 2009, 1, 976.

(51)

Madankan, R.; Singla, P.; Singh, T.; Scott, P. Polynomial-Chaos-Based Bayesian Approach for State and Parameter Estimations. J. Guid. Control. Dyn. 2013, 36 , 1.

(52)

Marzouk, Y. M.; Najm, H. N. Dimensionality Reduction and Polynomial Chaos Acceleration of Bayesian Inference in Inverse Problems. J. Comput. Phys. 2009, 228, 1862.

(53)

Pence, B. L.; Fathy, H. K.; Stein, J. L. A Maximum Likelihood Approach To Recursive Polynomial Chaos Parameter Estimation. Control Conf. Marriott Waterfr. 2010, 1, 2144.

(54)

Pence, B. L.; Fathy, H. K.; Stein, J. L. Recursive Estimation for Reduced-Order StateSpace Models Using Polynomial Chaos Theory Applied to Vehicle Mass Estimation. Control Syst. Technol. IEEE Trans. 2013, 99 , 1.

(55)

Soize, C. Identification of High-Dimension Polynomial Chaos Expansions with Random Coefficients for Non-Gaussian Tensor-Valued Random Fields Using Partial and Limited

ACS Paragon Plus Environment

51

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 52 of 58

Experimental Data. Comput. Methods Appl. Mech. Eng. 2010, 199, 2150. (56)

Dutra, D. A. A.; Teixeira, B. O. S.; Aguirre, L. A. Joint Maximum a Posteriori State Path and Parameter Estimation in Stochastic Differential Equations. Automatica. 2017, 81, 1

(57)

Varziri, M. S.; McAuley, K. B.; McLellan, P. J. Parameter Estimation in Continuous-Time Dynamic Models in the Presence of Unmeasured States and Nonstationary Disturbances. Ind. Eng. Chem. Res. 2008, 47, 380.

(58)

Varziri, M.; McAuley, K.; McLellan, P. Approximate Maximum Likelihood Parameter Estimation for Nonlinear Dynamic Models: Application to a Laboratory-Scale Nylon Reactor Model. Ind. Eng. Chem. Res. 2008, 1, 7274.

(59)

Varziri, M. S.; Poyton, A. A.; McAuley, K. B.; McLellan, P. J.; Ramsay, J. O. Selecting Optimal Weighting Factors in IPDA for Parameter Estimation in Continuous-Time Dynamic Models. Comput. Chem. Eng. 2008, 32, 3011.

(60)

Karimi, H.; McAuley, K. B. An Approximate Expectation Maximization Algorithm for Estimating Parameters, Noise Variances, and Stochastic Disturbance Intensities in Nonlinear Dynamic Models. Ind. Eng. Chem. Res. 2013, 52, 18303.

(61)

Karimi, H.; McAuley, K. B. A Maximum-Likelihood Method for Estimating Parameters, Stochastic Disturbance Intensities and Measurement Noise Variances in Nonlinear Dynamic Models with Process Disturbances. Comput. Chem. Eng. 2014, 67, 178.

(62)

Karimi, H.; McAuley, K. B. An Approximate Expectation Maximisation Algorithm for Estimating Parameters in Nonlinear Dynamic Models with Process Disturbances. Can. J. Chem. Eng. 2014, 92, 835.

ACS Paragon Plus Environment

52

Page 53 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

(63)

Karimi, H.; McAuley, K. B. A Bayesian Method for Estimating Parameters in Stochastic Differential. IFAC-Papers OnLine 2015, 48, 147.

(64)

Karimi, H.; McAuley, K. B. Bayesian Estimation in Stochastic Differential Equation Models via Laplace Approximation. IFAC-Papers OnLine 2016, 49, 1109.

(65)

Lindstrom, E. Estimating Parameters in Diffusion Processes Using an Approximate Maximum Likelihood Approach. Ann. Oper. Res. 2007, 151, 269.

(66)

Casella, G.; Berger, R. L. Statistical Inference; Thomson Learning: Duxbury, 2002.

(67)

Mandur, J.; Budman, H. A Polynomial-Chaos Based Algorithm for Robust Optimization in the Presence of Bayesian Uncertainty. IFAC Proc. 2012, 45, 549.

(68)

Ait-Sahalia, Y. Maximum Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approximation Approach. Econometrica 2002, 70, 223.

(69)

Kristensen, N. R.; Madsen, H.; Jørgensen, S. B. Parameter Estimation in Stochastic GreyBox Models. Automatica 2004, 40, 225.

(70)

Kristensen, N. R.; Madsen, H. Continuous Time Stochastic Modelling, CTSM 2.2. User Guide, Technical University of Denmark:Lyngby, 2003.

(71)

Robert, C. P.; Casella, G. Monte Carlo Statistical Methods; Springer Texts in Statistics; Springer: New York, NY, 2004.

(72)

Box, G. E. P.; Draper, N. R. The Bayesian Estimation of Common Parameters from Several Responses. Biometrika 1965, 52, 355.

(73)

Cogoni, G.; Tronci, S.; Mistretta, G.; Baratti, R.; Romagnoli, J. A. Stochastic Global

ACS Paragon Plus Environment

53

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 54 of 58

Model for the Prediction of the Asymptotic CSDs Using Antisolvent Crystallization Processes. ACOS AIDIC. 2013, 11, 2036. (74)

Liptser, R. S.; Shiryaev, A. N. Statistics of Random Processes: II. Applications; Springer: Berlin, Heidelberg, 2001.

(75)

Bishwal, J. Parameter Estimation in Stochastic Differential Equations.; Springer: New York, 2008.

(76)

Poyton, A. A.; Varziri, M. S.; McAuley, K. B.; McLellan, P. J.; Ramsay, J. O. Parameter Estimation in Continuous-Time Dynamic Models Using Principal Differential Analysis. Comput. Chem. Eng. 2006, 30, 698.

(77)

Karimi, H.; Cowperthwaite, E. V.; Olayiwola, B.; Farag, H.; McAuley, K. B. Modelling of Heat Transfer and Pyrolysis Reactions in an Industrial Ethylene Cracking Furnace. Can. J. Chem. Eng. 2017, 9999, 1.

(78)

Dempster, A. P.; Laird, N. M.; Rubin, D. B. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc. Ser. B. 1977, 39 , 1.

(79)

Marlin, T. E. Process Control : Designing Processes and Control Systems for Dynamic Perfomance; McGraw-Hill, 2000.

(80)

Fourer, R.; Gay, D. M.; Kernighan, B. W. A Modeling Language for Mathematical Programming. Manage. Sci. 1990, 36, 519.

(81)

Chaffart, D.; Ricardez-Sandoval, L. A. Robust Optimization of a Multiscale Heterogeneous Catalytic Reactor System with Spatially-Varying Uncertainty Descriptions Using Polynomial Chaos Expansions. Can. J. Chem. Eng. 2018, 96, 113.

ACS Paragon Plus Environment

54

Page 55 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

(82)

Gagnon, L.; Macgregor, J. F. State Estimation for Continuous Emulsion Polymerization. Can. J. Chem. Eng. 1991, 69, 648.

(83)

Karimi, H.; McAuley, K. B. A Maximum-Likelihood Method for Estimating Parameters, Stochastic Disturbance Intensities and Measurement Noise Variances in Nonlinear Dynamic Models with Process Disturbances. Comput. Chem. Eng. 2014, 67, 178.

(84)

Bouriga, M.; Féron, O. Estimation of Covariance Matrices Based on Hierarchical InverseWishart Priors. J. Stat. Plan. Inference 2013, 143, 795.

(85)

Petersen, K. B.; Pedersen, M. S. The Matrix Cookbook. Citeseer 2007, 16, 1.

ACS Paragon Plus Environment

55

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 1. Input trajectories for nonlinear CSTR (Varziri, Poyton, McAuley, McLellan, & Ramsay, 2008)

116x118mm (120 x 120 DPI)

ACS Paragon Plus Environment

Page 56 of 58

Page 57 of 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 2. Measured, true, and predicted concentration and temperature responses for the ABEM method for one dataset in scenario A (• simulated data, ----- response with true parameters and true stochastic noise, ___ABEM response). 128x66mm (120 x 120 DPI)

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

TOC 75x56mm (120 x 120 DPI)

ACS Paragon Plus Environment

Page 58 of 58