# Increasing the Accuracy of Equilibrium Constants With Nonlinear Curve Fitting

*By Dr. Ernö Keszei, Professor of Physical Chemistry, Eötvös University, Budapest, Hungary*

**Table of Contents**

- Introduction
- Spectrophotometric Analysis
- Beer's Law
- Nonlinear Method
- Method of Least Squares
- Origin
- Results

The accuracy with which equilibrium constants can be measured by spectrophotometry can be substantially increased by analyzing the results with nonlinear curve fitting. Chemists have used spectrophotometry to infer equilibrium constants for decades, but they have nearly always evaluated the experimental data with simplified, linearized treatments. Thanks to the widespread availability of sophisticated mathematical software packages, nonlinear parameter estimation has become an easy task. A software package that can fit curves to multiple functions simultaneously, Origin 5.0 from Microcal Software, Inc., Northampton, Massachusetts, makes it possible to evaluate an entire spectrum and thereby dramatically improve accuracy relative to current methods. This technique is particularly useful in determining the activity of enzymes and other biologically active compounds and in elucidating the environmental role of many chemical substances.

Spectrophotometric analysis is based on the absorption (attenuation) by matter of electromagnetic radiation of a specified wavelength or frequency. The radiation interacts with specific features of the molecular species being determined, such as the vibrational or rotational motions of the chemical bonds. It can also interact with specific atoms or with the whole molecule, causing the molecule to change its electronic energy state. The amount of radiation absorbed by the sample is a measure of the concentration of the chemical species being probed. The spectral region most useful for chemical analysis lies between 200 nanometers and 300 micrometers.

Chemists have long used spectrophotometry to obtain the equilibrium concentrations in liquid reaction mixtures. Paradoxically, they usually did not aim at extracting the maximum information about the equilibrium constant; rather, they tried to meet special experimental conditions under which one of the numerous simplified treatments could be used to evaluate the data. What all these simplifications have in common is that they select and manipulate the experimental data so as to remove the essentially nonlinear nature of the underlying model equations, so that only linear models need to be treated to calculate the equilibrium constant. Several researchers have proposed nonlinear parameter estimation methods for chemical equilibrium applications; however, these required the user to be able to program in a general-purpose language such as FORTRAN.
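To make the idea of linearization concrete, here is a sketch of one classic treatment, the Benesi-Hildebrand double-reciprocal method. It is offered only as an illustration and is not necessarily one of the treatments the author had in mind. It assumes that only DX absorbs and that X is present in large excess, so that $[\mathrm{X}] \approx [\mathrm{X}]_0$; under these conditions $[\mathrm{D}]_0 l / A = 1/(\varepsilon_{\mathrm{DX}} Q [\mathrm{X}]_0) + 1/\varepsilon_{\mathrm{DX}}$, a straight line in $1/[\mathrm{X}]_0$. The function names and example concentrations are hypothetical.

```python
# Sketch of the Benesi-Hildebrand linearization for D + X <=> DX.
# Assumptions (hypothetical setup): only DX absorbs, and [X]0 >> [D]0 so [X] ~ [X]0.


def linear_fit(x, y):
    """Ordinary (unweighted) least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def benesi_hildebrand(d0, x0s, absorbances, path=1.0):
    """Estimate (Q, eps_DX) from d0*l/A = 1/(eps_DX*Q*[X]0) + 1/eps_DX."""
    xs = [1.0 / x0 for x0 in x0s]
    ys = [d0 * path / a for a in absorbances]
    slope, intercept = linear_fit(xs, ys)
    # intercept = 1/eps_DX and slope = 1/(eps_DX*Q), so Q = intercept/slope
    return intercept / slope, 1.0 / intercept


# Hypothetical noise-free data generated under the same excess-X approximation.
d0, q_true, eps_true = 1.0e-4, 131.0, 4700.0
x0s = [0.01, 0.02, 0.05, 0.10, 0.20]
absorbances = [eps_true * d0 * q_true * x0 / (1.0 + q_true * x0) for x0 in x0s]
q_est, eps_est = benesi_hildebrand(d0, x0s, absorbances)
print(f"Q = {q_est:.1f}, eps_DX = {eps_est:.0f}")
```

The method only works under the excess-X condition, and because the straight line is fitted to $[\mathrm{D}]_0 l / A$ rather than to $A$, equal weights in `linear_fit` implicitly over-weight the noisiest, low-absorbance points.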

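Spectrophotometric determination of the equilibrium constant is based on the fact that the absorbance of an equilibrium mixture undergoing the reaction D + X ⇌ DX at a wavelength $\lambda$ can be written in terms of Beer's law:

$$A_\lambda = l\left(\varepsilon_{\mathrm{D},\lambda}[\mathrm{D}] + \varepsilon_{\mathrm{X},\lambda}[\mathrm{X}] + \varepsilon_{\mathrm{DX},\lambda}[\mathrm{DX}]\right)$$

The brackets stand for the equilibrium concentrations of compounds D and X and of the complex DX, $A_\lambda$ is the spectrophotometric absorbance, $l$ is the optical path length and $\varepsilon_{\lambda}$ is the molar absorptivity of the respective absorbing species. The equilibrium concentrations follow from the equilibrium quotient $Q = [\mathrm{DX}]/([\mathrm{D}][\mathrm{X}])$ and the mass balances $[\mathrm{D}]_0 = [\mathrm{D}] + [\mathrm{DX}]$ and $[\mathrm{X}]_0 = [\mathrm{X}] + [\mathrm{DX}]$, where $[\mathrm{D}]_0$ and $[\mathrm{X}]_0$ are the total concentrations in each mixture. Taking the physically meaningful root of the resulting quadratic equation gives

$$[\mathrm{DX}] = \frac{1}{2}\left(S - \sqrt{S^2 - 4[\mathrm{D}]_0[\mathrm{X}]_0}\right), \qquad S = [\mathrm{D}]_0 + [\mathrm{X}]_0 + \frac{1}{Q}$$

so that

$$\frac{A_\lambda}{l} = \varepsilon_{\mathrm{D},\lambda}[\mathrm{D}]_0 + \varepsilon_{\mathrm{X},\lambda}[\mathrm{X}]_0 + \left(\varepsilon_{\mathrm{DX},\lambda} - \varepsilon_{\mathrm{D},\lambda} - \varepsilon_{\mathrm{X},\lambda}\right)[\mathrm{DX}]$$

When applying a straightforward method to determine the equilibrium quotient $Q$, we use a nonlinear parameter estimation algorithm to estimate the molar absorptivities and $Q$ as parameters of this function.

While $A_\lambda/l$ is a linear function of the $\varepsilon$'s, it is a nonlinear function of $Q$ (through $[\mathrm{DX}]$), hence the need for nonlinear parameter estimation.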
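The absorbance model for D + X ⇌ DX can be written as a short routine. It combines Beer's law with the equilibrium quotient $Q = [\mathrm{DX}]/([\mathrm{D}][\mathrm{X}])$ and the mass balances for the total concentrations $[\mathrm{D}]_0$ and $[\mathrm{X}]_0$, which lead to a quadratic equation for $[\mathrm{DX}]$. This is a minimal sketch, assuming concentrations in mol/L and $Q$ in L/mol; the function names are hypothetical. The smaller root of the quadratic is evaluated in an algebraically equivalent form that avoids round-off when $[\mathrm{DX}]$ is small.

```python
import math


def complex_conc(d0, x0, q):
    """Equilibrium [DX] for D + X <=> DX from total concentrations d0, x0 and quotient q.

    Physical (smaller) root of [DX]**2 - S*[DX] + d0*x0 = 0 with S = d0 + x0 + 1/q,
    written as 2*d0*x0 / (S + sqrt(S**2 - 4*d0*x0)) to avoid cancellation.
    """
    s = d0 + x0 + 1.0 / q
    return 2.0 * d0 * x0 / (s + math.sqrt(s * s - 4.0 * d0 * x0))


def absorbance_per_length(d0, x0, q, eps_d, eps_x, eps_dx):
    """A/l from Beer's law: linear in the absorptivities, nonlinear in q."""
    dx = complex_conc(d0, x0, q)
    return eps_d * (d0 - dx) + eps_x * (x0 - dx) + eps_dx * dx


# Hypothetical check: the computed concentrations reproduce the quotient q.
d0, x0, q = 0.005, 0.003, 131.0
dx = complex_conc(d0, x0, q)
print(dx / ((d0 - dx) * (x0 - dx)))  # equals q up to round-off
```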

In the above case there are four parameters to determine (three molar absorptivities and $Q$), so parameter estimation becomes efficient only when the number of data points exceeds 20. Since it is difficult to prepare so many mixtures, the usual practice is to measure at several wavelengths. If we measure the absorbances of 10 mixtures at four different wavelengths, there are 13 parameters (three absorptivities at each wavelength plus $Q$) but 40 measured data points. The number of degrees of freedom (data points minus parameters) is then 27 for 10 mixtures at 4 wavelengths, compared with only 16 for 20 mixtures at a single wavelength. In general, more degrees of freedom mean greater precision in parameter estimation.
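The multi-wavelength idea can be sketched as a global fit in which $Q$ is shared by all wavelengths while each wavelength has its own three absorptivities. This standard-library sketch uses separable least squares: for a trial $Q$ the model is linear in the absorptivities, which are therefore solved exactly, and $Q$ is then found by a one-dimensional search over $\log_{10} Q$. That algorithm was chosen for brevity and is not necessarily what Origin does internally. The mixture design, absorptivities and function names are hypothetical, the path length is taken as 1 cm, and degrees of freedom are counted as data points minus parameters.

```python
import math


def complex_conc(d0, x0, q):
    """Equilibrium [DX] for D + X <=> DX (numerically stable root of the quadratic)."""
    s = d0 + x0 + 1.0 / q
    return 2.0 * d0 * x0 / (s + math.sqrt(s * s - 4.0 * d0 * x0))


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a x = b by Cramer's rule (adequate for a sketch)."""
    d = det3(a)
    sol = []
    for k in range(3):
        mk = [row[:] for row in a]
        for r in range(3):
            mk[r][k] = b[r]
        sol.append(det3(mk) / d)
    return sol


def profile_rss(q, d0s, x0s, spectra):
    """Best absorptivities at every wavelength for a trial q, plus the total RSS."""
    # A/l = eps_D*[D]0 + eps_X*[X]0 + (eps_DX - eps_D - eps_X)*[DX]
    rows = [(d, x, complex_conc(d, x, q)) for d, x in zip(d0s, x0s)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    total, eps = 0.0, []
    for y in spectra:  # one list of A/l values per wavelength
        aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
        b = solve3(ata, aty)  # normal equations of the linear sub-problem
        total += sum((yi - sum(bk * rk for bk, rk in zip(b, r))) ** 2
                     for r, yi in zip(rows, y))
        eps.append((b[0], b[1], b[2] + b[0] + b[1]))  # eps_D, eps_X, eps_DX
    return total, eps


def fit_global(d0s, x0s, spectra, lo=0.0, hi=4.0, steps=40, tol=1e-9):
    """Shared Q: coarse scan of log10(Q), then golden-section refinement."""
    def f(lq):
        return profile_rss(10.0 ** lq, d0s, x0s, spectra)[0]

    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    k = min(range(steps + 1), key=lambda i: f(grid[i]))
    a, b = grid[max(k - 1, 0)], grid[min(k + 1, steps)]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    q = 10.0 ** ((a + b) / 2.0)
    rss, eps = profile_rss(q, d0s, x0s, spectra)
    dof = len(d0s) * len(spectra) - (3 * len(spectra) + 1)  # m - p
    return q, eps, rss, dof


# Hypothetical noise-free data: 10 mixtures measured at 4 wavelengths.
d0s = [0.001 * (i + 1) for i in range(10)]
x0s = [0.001 * (1 + (3 * i) % 10) for i in range(10)]
eps_true = [(50.0, 10.0, 4700.0), (40.0, 8.0, 4000.0), (30.0, 5.0, 3000.0), (20.0, 2.0, 1800.0)]
conc = [complex_conc(d, x, 131.0) for d, x in zip(d0s, x0s)]
spectra = [[ed * (d - c) + ex * (x - c) + edx * c for d, x, c in zip(d0s, x0s, conc)]
           for ed, ex, edx in eps_true]
q_fit, eps_fit, rss, dof = fit_global(d0s, x0s, spectra)
print(f"Q = {q_fit:.2f}, degrees of freedom = {dof}")
```

With real, noisy data the scan range and grid would need to suit the system, and a general-purpose nonlinear optimizer could replace the golden-section step.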

However, there is another way to increase the precision of estimation. If we assume an absorption band profile, such as Gaussian or Lorentzian, for the three species involved, then each absorbing species is described by three parameters: the position of the absorption maximum, the maximum absorptivity or the area of the band, and the width of the band. Together with the equilibrium quotient, this gives ten parameters altogether, and the number of parameters does not increase if we measure absorbances at more wavelengths. In the above example, measuring 10 mixtures at four different wavelengths gives 40 data points and ten parameters, i.e., 30 degrees of freedom, compared with 27 when the band shape is ignored. In principle, the number of test wavelengths can be increased without limit. In doing so, we not only determine the equilibrium quotient with high precision but also obtain the complete absorption spectra of all three species involved, as long as they have nonzero absorptivities in the observed wavelength region. Such band profiles cannot be fitted with linearized models.
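A band-shape model replaces the per-wavelength absorptivities with a few band parameters per species. The sketch below assumes a Gaussian profile parametrized by its maximum absorptivity, position and full width at half maximum (one of several possible parametrizations; the function names are hypothetical). It shows the profile and the resulting parameter bookkeeping, with degrees of freedom counted as data points minus parameters.

```python
import math


def gaussian_band(lam, eps_max, lam_max, fwhm):
    """Molar absorptivity at wavelength lam for a Gaussian band.

    eps_max: absorptivity at the band maximum; lam_max: band position;
    fwhm: full width at half maximum (same units as lam).
    """
    return eps_max * math.exp(-4.0 * math.log(2.0) * ((lam - lam_max) / fwhm) ** 2)


def degrees_of_freedom(n_mixtures, n_wavelengths, n_species=3, band_model=False):
    """m - p for the global fit.

    Without a band model each species needs one absorptivity per wavelength;
    with one Gaussian band per species it needs three parameters in total.
    Q adds one parameter in both cases.
    """
    m = n_mixtures * n_wavelengths
    per_species = 3 if band_model else n_wavelengths
    return m - (n_species * per_species + 1)


print(degrees_of_freedom(10, 4), degrees_of_freedom(10, 4, band_model=True))
print(degrees_of_freedom(10, 50, band_model=True))  # many wavelengths, still 10 parameters
```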

This straightforward nonlinear method, using absorption band parameters and a large number of data points at many test wavelengths, has another advantage. If additional species with additional equilibria may be present, we can easily extend the model to include the absorption parameters of the additional species, solve for the corresponding equilibrium concentrations, and check whether the more complicated model improves the fit.
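One common way to judge whether an extended model improves the fit significantly (a standard statistical tool, not a procedure prescribed in the text) is the extra-sum-of-squares F test for nested models. The sketch assumes both residual sums of squares come from fits to the same $m$ data points; for nonlinear models the test is only approximate. The parameter counts and residual sums in the example are hypothetical.

```python
def extra_ss_f(rss_simple, p_simple, rss_ext, p_ext, m):
    """F statistic comparing a simpler model (p_simple parameters) with an
    extended one (p_ext > p_simple parameters) fitted to the same m points."""
    if not p_simple < p_ext < m:
        raise ValueError("need p_simple < p_ext < m")
    gain_per_param = (rss_simple - rss_ext) / (p_ext - p_simple)
    noise_per_dof = rss_ext / (m - p_ext)
    return gain_per_param / noise_per_dof


# Hypothetical: one more band (3 parameters) plus one more equilibrium quotient
# raises the parameter count from 10 to 14 for 40 data points.
f_stat = extra_ss_f(rss_simple=2.0, p_simple=10, rss_ext=1.0, p_ext=14, m=40)
print(f_stat)
```

The statistic is compared with the upper quantile of the F distribution with $(p_{\mathrm{ext}} - p_{\mathrm{simple}},\, m - p_{\mathrm{ext}})$ degrees of freedom, available for example from SciPy's `scipy.stats.f.ppf`; a value well above it favors the extended model.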

The most widely used method to fit a curve to data, linear or nonlinear, is the method of least squares. According to this method, if we have data points $A_{\mathrm{obs},i}$ and a model function that gives the fitted value $A_{\mathrm{calc},i}$ at each data point, we get the optimal fit of the curve, and hence optimal values of the parameters involved in the model function, if we minimize the sum

$$S = \sum_{i=1}^{m} w_i \left(A_{\mathrm{obs},i} - A_{\mathrm{calc},i}\right)^2$$

which is called the weighted residual sum of squares. The summation runs over all $m$ measured points, and $w_i$ is the so-called weighting factor. If we choose $w_i$ proportional to $1/s^2(A_{\mathrm{obs},i})$, the inverse of the variance at each measured point, we obtain the best kind of estimates, called minimum-variance unbiased (strictly so for models linear in the parameters, approximately so for nonlinear ones). Most commercial software packages do not have the option to enter individual weighting factors, so they use equal weights at each data point (i.e., $w_i = 1$ for all $i$), which amounts to assuming that the experimental error is the same for every measurement. Now, if the experimental error really is constant at each point, then whenever we transform the measured data nonlinearly (e.g., to obtain a linearized function), we should also transform the weights, which then differ from point to point. This weighting problem is easily avoided by applying nonlinear parameter estimation directly to the original observed values.
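As a minimal illustration of the weighting argument, the sketch computes the weighted residual sum of squares and the weights that a reciprocal transformation $y = 1/A$ would require when the absorbance error $s(A)$ is constant. First-order error propagation gives $s(1/A) = s(A)/A^2$, hence $w \propto A^4/s^2(A)$. The function names and numbers are hypothetical.

```python
def weighted_rss(obs, calc, weights):
    """Weighted residual sum of squares: sum of w_i * (obs_i - calc_i)**2."""
    return sum(w * (o - c) ** 2 for o, c, w in zip(obs, calc, weights))


def reciprocal_weights(absorbances, s_a):
    """Weights for y = 1/A when every A has the same standard deviation s_a.

    First-order propagation: s(1/A) = s_a / A**2, so w = 1/s(1/A)**2 = A**4 / s_a**2.
    """
    return [a ** 4 / s_a ** 2 for a in absorbances]


# Hypothetical absorbances with a constant error of 0.005
absorbances = [0.1, 0.2, 0.4, 0.8]
w = reciprocal_weights(absorbances, 0.005)
print([wi / w[-1] for wi in w])  # relative weights after the 1/A transformation
```

The weakest absorbance ends up with 1/4096 of the weight of the strongest, so fitting $1/A$ with equal weights badly misrepresents the error structure of the original measurements.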

Most estimation methods also readily provide an estimate of the error of each parameter. However, this error alone does not fully describe the reliability of the parameter, because the reliability also depends on the number of experimental points used in the estimation procedure. A better measure is the confidence interval of the parameter, which takes into account both the calculated error and the number of degrees of freedom. If the standard deviation of a parameter $p$ is $s(p)$ (which most software packages report simply as the "error"), the corresponding interval at the $1 - \alpha$ level of confidence is

$$p \pm s(p)\, t_\nu(1 - \alpha/2)$$

where $t_\nu(1 - \alpha/2)$ is the value of the Student's $t$ variable at which its cumulative distribution function equals exactly $1 - \alpha/2$. The subscript $\nu$ is the number of degrees of freedom.
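The confidence interval formula can be sketched in a few lines. In practice one would take the $t$ quantile from a statistics library (for example SciPy's `scipy.stats.t.ppf`); to keep this sketch dependency-free it evaluates the $t$ distribution with the closed-form series for integer degrees of freedom given in Abramowitz and Stegun (sec. 26.7) and inverts it by bisection. The function names and the example values are hypothetical.

```python
import math


def t_two_sided(t, nu):
    """P(|T| <= t) for Student's t with integer nu >= 1 degrees of freedom.

    Closed-form series for integer nu (Abramowitz & Stegun, sec. 26.7).
    """
    theta = math.atan(t / math.sqrt(nu))
    c = math.cos(theta)
    if nu % 2 == 0:
        term = total = 1.0
        for k in range(1, nu // 2):
            term *= c * c * (2 * k - 1) / (2 * k)
            total += term
        return math.sin(theta) * total
    total = 0.0
    if nu > 1:
        term = total = c
        for k in range(1, (nu - 1) // 2):
            term *= c * c * (2 * k) / (2 * k + 1)
            total += term
    return 2.0 / math.pi * (theta + math.sin(theta) * total)


def t_quantile(prob, nu):
    """t_nu(prob) for prob > 0.5, found by bisection on the CDF."""
    target = 2.0 * prob - 1.0  # CDF(t) = (1 + P(|T| <= t)) / 2
    lo, hi = 0.0, 1.0
    while t_two_sided(hi, nu) < target:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if t_two_sided(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def confidence_interval(p, s_p, nu, alpha=0.05):
    """p +/- s(p) * t_nu(1 - alpha/2)."""
    half_width = s_p * t_quantile(1.0 - alpha / 2.0, nu)
    return p - half_width, p + half_width


# Hypothetical parameter value 1.00 with reported error 0.10 and 10 degrees of freedom
print(confidence_interval(1.00, 0.10, 10))
```

With few degrees of freedom the $t$ factor grows quickly (about 12.7 for $\nu = 1$ versus about 2.2 for $\nu = 10$ at 95% confidence), which is why the same reported "error" can correspond to very different confidence intervals.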

A major obstacle in turning this method into a practical procedure was finding a data analysis software package capable of fitting curves to multiple nonlinear functions simultaneously. The author originally wrote a FORTRAN program that solved the curve-fitting problem but did not provide accompanying graphics, which made the results difficult to interpret. While numerous packages can fit a curve to a single nonlinear function, the author was able to find only one, Origin 5.0, that could handle the multiple-function problem required to calculate equilibrium constants. This program offers several other major advantages for this type of problem as well. It has a scripting language, which the author has used to automate the equilibrium calculations. For evaluation purposes, a demo version is available on Microcal's website at www.microcal.com that provides all the power of the full version of the software for a two-week trial period.

*Origin provides simultaneous multiple nonlinear curve-fitting capabilities.*

As an example of the nonlinear parameter estimation method for spectrophotometric data, the author re-evaluated Ramette's calculation of the equilibrium quotient for the simple complex-forming reaction between $\mathrm{Fe^{3+}}$ (substance D) and $\mathrm{SCN^{-}}$ (substance X). The original assumptions were kept: a 1:1 complex ion is the only product of this reaction in a perchloric acid medium, and at the wavelength measured (450 nm) the only absorbing species is $\mathrm{FeSCN^{2+}}$ (substance DX). The graph of the fitted model function is shown below; the quality of the fit is excellent.

*Results of duplicating Ramette's calculation.*

The nonlinear method also provides reasonable estimates of the errors and confidence intervals. As the results in the table below show, increasing the number of experimental data points from 7 to 21 reduces the half-width of the confidence interval of the equilibrium quotient from 15 to 6, compared with a $Q$ of 131.0, i.e., from a relative error of 12% to 4.6%. When the nonlinear fit was performed at each single wavelength separately, the respective half-widths were 13, 15 and 11. This shows that the decrease is really due to the increase in the degrees of freedom with the growing number of data points. The estimated correlation between the equilibrium quotient and the estimated molar absorptivities of the complex is -0.95 at all three wavelengths, which explains the relatively high uncertainty of the estimated value of $Q$.

Nonlinear parameter estimation using the methods described above is easy and far more accurate than existing linearized methods. The versatile nonlinear fitting capabilities of Origin software make it ideally suited for this type of data analysis. This method is expected to prove extremely useful in determining the activity of enzymes and other biologically active compounds and evaluating the structure of certain inorganic molecules such as metal-ligand complexes.

**References**

- Ramette, R. W., *Journal of Chemical Education*, 44, 647 (1967).
- Bevington, P. R., *Data Reduction and Error Analysis for the Physical Sciences*, McGraw-Hill, New York (1969).

For more information on Origin, contact: Microcal Software, Inc., One Roundhouse Plaza, Northampton, MA 01060. Telephone: 800-969-7720 X36. Fax: 413-585-0126.
