PROPOSED IMPROVEMENTS to the FINMATH.XLS MS-EXCEL SPREADSHEET PROGRAM
v.0.0, v0.1 (based on discussions T.Petrie, AV.Toy Spr05)
v0.2 (based on discussions T.Petrie, AV.Toy Spr05 Feb 16)
	change summary
	reorder the numerical position of the Proposed Improvements to
	reflect the work sequence,
	  move current item 5) "Help pop-up..."
	  to immediately after current item 2)
	  move current item 7) "Password ..."
	  to immediately after current item 1)
	partition current item 4) "Create a new set of internal ..."
	into 2 work items 
	  +one item reflecting the new set of internal variables work
        +one item creating or consolidating onto one sheet
	  the setting of user inputs that characterize the experiment,
	  such as values, parameters, check boxes to a single
	  and move this item to before current item 3a).
v0.3 (based on discussions T.Petrie, Derya, AV.Toy Spr05 Mar 02)
	Add to item 3) another (4th) "button" option form in the Reference sheet
	which would clear the file contents in the folder with the prior saved
	time series. Simplifying the user interface design for editing the
	selected series.

***
INTRODUCTION:
Based on recent experiments,
the FinMath.xls spreadsheet program appears to be algorithmically correct
and a faithfull representation of the CAPM trading strategy described
in R.Roll (Journal of Financial Economics 1977v4).

However, it is difficult to generate results that can be compared across experiments due
to several operational attributes and assumptions of the program,
encumbering its utility for the analysis of related experiments.

This proposal for improvements to FinMath.xls first characterizes the problems followed
by a proposed set of changes and their associated motivation or problems they address.

***
PROBLEM CHARACTERIZATION:
The difficulties arise from 4 primary aspects of the design.

1) The various sheets are interdependent.
   They interact to both "feed" entered values and parameter settings
   (e.g. trading period length, start and end dates, reference time series, etc.)
   and perform calculations (e.g. inverse covariance matrix of a training period,
   assest weights, etc.) for each other.
   However, some values and external parameters are changed after
   a cycle through the spreadsheets.

   Unless the desired value and parameter settings are diligently re-verified
   and re-established, the results from multiple experiments cannot be compared
   to derive useful conclusions.
   Inaddition, results of a sheet calculation depend on internal variables
   set from calculations in another sheet, so to generate consistent results
   requires an ordered flow through the sheets.

   Moreover, some parameters can be set from different sheets
   (e.g. start date, trading period length, etc.).
   However, changing the parameter in a subsequent sheet may break a
   necessary association with the "feed" data from a prior sheet,
   unless one returns to the prior sheet and reinvokes the calculations.
   This situation can occur when a Reset is invoked in a subsequent sheet,
   unintentionally, changing a date value set in a prior sheet. 

   Examples:
   For a trivial example, changing the trading period length in the Dynamic
   Optimal Portfolio sheet would affect the sampling, calculations, and
   internal variable settings in both the Asset Browser and CoVariance sheets.

   Consider the following more interesting 
   Covariance Analysis and Dynamic Optimal Portfolio sheets interaction
   due to the value of the training interval.
   When the covariance matrix is determined directly in the Covariance Analysis
   sheet by Reset+Calculate (twice if necessary to clearout the stray data
   of a prior run), the values of the matrix, the profolio member weights, etc.
   reflect using ALL the samples between the Start and End dates,
   not the Training Interval.
   The Dynamic Optimal Portfolio sheet reuses the Covariance Analysis sheet to  
   perform part of its calculations and in the process overwrites many of the
   entries in the Covariance Analysis sheet.
   When the Covariance Analysis sheet calculations are invoked 
   from the Dynamic Optimal Portfolio sheet,
   the values of the matrix, the profolio member weights, etc.
   reflect using a "rolling" window of  the samples within the Training Interval
   set in the Dynamic Optimal Portfolio sheet,
   except for the last window which may use 1 less sample.
   After performing a Reset+Calculate from the Dynamic Portfolio sheet,
   the Covariance Analysis sheet will show the results of using the last
   rolling window's samples as determined by the Dynamic Optimal Portfolio sheet.

2) The design assumed a linear usage flow,
   first (Reference sheet) to last (Dynamic Optimal Portfolio sheet),
   that the datasets are not reused and the resulting spreadsheet changes
   are not saved to disk across experiments,
   and new datasets are imported at the beginning of each scenario run.

   Examples:
   The time series imported into the Reference sheet (and reflected in the
   separate individual time series spreadsheets) are not saved upon saving
   of the main spreadsheet book.
   When the save spreadsheet is re-invoked, the last results of the prior
   calculations populate the sheets, however the Reference sheet does not
   contain any time series. 

   A more complex interaction across runs of the spreadsheets involves
   how the values of the time series are populated across the sheets in
   a particular order.
 
   In the Covariance Analysis sheet,
   the problem of retained prior (stray garbage) values in rolls B8->, , , B12->
   occurs whenever fewer time series are imported (in the Reference sheet)
   for the new experiment than the number used in the prior set of analyses.
   Those stray values are applied in the subsequent calculations
   of the variance and efficiency set constants.
   The initial Reset + Calculate from the Covariance Analysis sheet
   appears to populate the CA_vectors' components based on an index count,
   set by the number of time series imported in the Reference sheet.
   It erroneously ignores any prior values located further to the right.
   The results of any calculations are invalid.

   *However, the effects are benign, if one follows this recommended procedures.*
   The values are correctly updated when at least 2 (sometimes 3, but I could
   not consistently recreate the need for 3) Reset + Calculate are performed.
   The second Reset + Calculate (possibly third) changes (and I believe
   erroneously) the index count (however, fortuitously), resulting in the
   subsequent "clean up".
   A Reset + Calculate from the Dynamic Optimal Portfolio spreadsheet
   (after a Reset + Calculate from the Covariance sheet) will also "clean up"
   the stray values.
   So for manual operation that follows the recommended procedures,
   the effects of the problem are benign.

   Note:  Going to the Dynamic Optimal Portfolio sheet and performing a    
   Reset+Calculate without first going through a Reset+Calculate (again twice
   if necessary) directly from within the Covariance Analysis sheet
   is NOT recommended.
   The results will be inconsistent because several internal variables will be
   defaulted or left at the prior saved values of the spreadsheet and these
   may be different than what are the intended values.

   These idiosyncrasies also apply to the Statistical Analysis sheet 
   interactions with the Covariance Analysis sheet. 

3) The Statistical Analysis sheet operates as an control overlaid,
   selectively re-ordering subsets of the time series, values, and parameter
   settings and invoking the various other sheets to perform calculations
   multiple times.
   It assumes that a) certain parameter settings are persistent or
   re-established after each set of calculation, b) the order of the time
   series will not affect the calculations, and c) the number of time series
   involved from a prior set of calculations has no affect on the calculations
   of a subsequent set.  All these assumptions are invalid.

   Example:
   Sometimes a much more volatile stock (in this example, INTC)
   forces a parameter boundary constraint to be "hit",
   such as the maximum weight of an asset as calculated for particular
   training period. When such constraints become "active", the sequence 
   order in which the time series are imported will affect the results.
   Moreover, the degree of effect may depend on other parameter settings.

   The period is 02/27/01 through 01/28/05.
   The parameter settings are:
   Trading interval 5 days, Training interval 20 periods (100 days),
   Risk free return 2.5%, Fixed desired return 0.5%, and Optimize for Return.
   From the Asset Browser sheet, the period volatility are
   OIH  4.78%,  AIG 4.17%,  VAR 4.05%,  INTC 6.69%.

   Time series order OIH, AIG, VAR, INTC
   (in fact for any case where INTC is not first)
   BM weight =< 200%  portfolio volatility  3.48% with end value 1.069
                100                         1.84
                 50                         0.70

   Time series order INTC, OIH, AIG, VAR
   (in fact for any case where INTC is first)
   BM weight =< 200%  portfolio volatility  3.20% with end value 2.065
                100                         1.66
                 50                         0.78

   So the maximum weight constraint seems to have the desired effect.

   However, the parameter setting of the Risk free return has a greater
   influence on the results as the boundary constraint on the asset weights
   becomes more stringent.

   With a risk free return at 5% and the BM adjust limited to 50%,
   the order (leading INTC vs not) does not impact the end value or volatility
   because the 5% dominates.
   If the risk free return is dropped to 2.5% and the BM adjust limited to 50%,
   the order (leading INTC vs not) has a slight effect,
   case: leading INTC    end value 1.267  portfolio volatility 0.78%
         non ld  INTC    end value 1.237  portfolio volatility 0.70%

4) While the CAPM strategy is represented algorthmically correct,
   the program assumes that the user is familiar with the applicable theory
   and numerical analysis limitations and that the select set of time series
   imported are self-consistent.

   Examples:
   An Excel VBA Error 13 problem due to a type match problem of DOP_bal
   can occur that is indeed correct algorithmic operation.
   Poor user interface, but valid.

   The inverted covariance matrix (R2xR2) of the Covariance Analysis sheet,
   defines the "efficient" hyperbola bounding the (return, sigma) set of the
   market component.
   In one experiment, just mentally estimating the volume of the inverted
   covariance matrix by multiplying using only the 2 significant digits per
   element displayed, results in a size of ~1.5.
   After expanding the columns, reformatting displayed digits to the
   right of the decimal point to 6 digits and recalculating,
   the size collapses to 0.002 or less. The volume of the matrix collapsed.
   This error teaches me not to ignore a numerical analysis rule of thumb
   when it was obvious that the range between the smallest and largest elements
   was 5 orders of magnitude.

   The collapse is due to the use of single stock.
   Since the Covariance Analysis sheet follows R.Roll's formulation, 
   with a single stock, all linear combinations of the stock are correlated,
   so the eigenvectors collapsed.
   The covariance matrix must be positive semi-definite for the quadratic
   optimization problem to work.

   From the formulation, it is also necessary to watch out for cases where
   the expected returns of each of the stocks are very close in absolute value
   to each other within any given training period, again correlated.
   It is possible to construct a multiple stock portfolio to have separated
   linear combinations that are very highly correlated (e.g. linear combinations
   of a subset of the time series is correlated to the non member (of the subset)
   time series) resulting in a collapse.
   Such a portfolio is unlikely to occur in normal practice.
   However, since the estimate of the covariance matrix is determined by the subset
   of returns in a rolling window sized by the Training Interval length parameter.
   As the Training Interval length is reduced, a collapse could occur because the
   returns are correlated in that shorter interval which spans a Training Interval
   length.  In practice with 3 time series, even when the time series are from the
   same industry sector, such a Training Interval related collapse usually does not
   occur in Training Interval lengths of more than 30 days (6 x 5 trading day).

   Another example occurs when there are missing trading day samples in one of
   the time series due to corporate conditions, SEC stop limits, market maker
   declarations of halting of trading, etc.) relative to the other series.
   The program collapses (shifts) the samples in that particular series
   to close the gap.  However, the resulting time series is nolonger correctly
   cross-correlated against the other time series.
   *However, experiments appear to show that for gaps of 1-2 days when the
   training intervals are large (~50days) and the overall portfolio period
   extends across more than 2 training interval in length,
   there is no practical effect on the portfolio results.*


***
PROPOSED IMPROVEMENT:

1) Remove or disable the Statistical Analysis sheet.
   Motivation:
   The reason is more practical, than technical.
   For consistent results, the operation of the other sheets would have to be
   made considerably more robust against changes to in the number of time series,
   to the order of the time series, and to non-linear invocation sequences
   of the sheets.  Insufficient resources (staff X time) exist.
   The lost of this sheet would only affect a small class of experiments,
   which if necessary could be performed manually.


2) Password protect (read, edit) the module VBA macros, functions, and
   subroutines.
   Motivation:
   Produce more consistent results due to reducing the opportunity
   for accidental changes to the macros, functions, and subroutines.
   The trade-off is that when errors occur, it will be more difficult to
   replicate and resolve.
   [I am not sure about the property rights standing for this work,
   if you wish I can talk to Vince at Rutgers OTLAP.]

 
3) Save datasets imported across cycles in the use of the spreadsheets
   and across cycles in FinMath.xls exit and reinvocations.
   Motivation:
   This improvement would relieve the burden of remembering and re-importing
   the particular set of time series desired, before an related experiment
   could be performed.  Since each imported series populates a separate
   "sub" sheet external to the book, a possible implementation could consist
   of saving those sheets into a folder.  Upon re-invocation, the contents
   of the folder would be checked and if not-null, those sheets containing the
   prior imported series would be used to repopulate the Reference and Asset
   Browser sheets. A 3rd separate "button" option form in the Reference sheet
   to select the option may be desirable.
   Inaddition, a 4th separate "button" option form in the Reference sheet may
   be also desirable.  Upon selecting this "button" option, the file contents
   in the folder with the prior imported and saved series would be cleared.
   Creating this "button" simplifies the design of the user interface by
   removing the need for developing an editing procedure for deleting the
   associated file contents whenever a selected series reference is cleared
   or cut in the Reference sheet.


4) Consolidate onto a single sheet the user input settings for all the values,
   parameters, check boxes, etc. that characterize the experiment.
   Reducing the flexibility of setting the various parameters from multiple
   sheets to a single designated sheet, especially the start date, end date
   and the trading interval length.
   Motivation:
   Presently, it is not always clear that the displayed value of a parameter
   represents the "set" value or the result of modifications during calculations.  
   This improvement would increase the consistency necessary when trying to
   associate the results across sheets when the value of the parameters
   have been changed in one sheet.


5) "Help pop-up" boxes for each of the sheets.
   Motivation:
   By identifying the purpose of the sheet and which parameters, if any,
   may be set within a given sheet and whether
   or not the new setting will require re-calculations in other other sheets,
   more consistent results would be obtained for the casual user and new student.

  
6a)Eliminate the need for the explicit double Reset + Calculate 
   used to clear the residual / spurious data from prior saved executions
   in the Covariance sheet.
   Motivation:
   This improvement would reduce the occurrence of inconsistent results
   due to an oversight in manually cleaning up the Covariance sheet via
   a double Reset + Calculate.
   With the removal or disabling of the Statistical Experiments sheet,
   it is probably possible to apply the "clear" method on the activate sheet
   when entering the Covariance sheet the first time.
   An alternative is to implicitly invoke a Reset + Calculate upon first
   activation of the sheet.
   However, it is important not to invoke multiple Reset + Calculate's
   after the first instance of the first training period calculation from
   the Dynamic Optimal Portfolio sheet.  Otherwise the values for the rolling
   window will not be calculated correctly due to the resetting of the start
   and end dates.

6b)Create a separate Covariance2 sheet as a server for the Dynamic Optimal
   Portfolio sheet calculations.
   Motivation:
   This improvement breaks the need for a linear flow whenever new values for
   Dynamic Optimal Portfolio are set.  This sheet would be cleanly repopulated
   when values from the time series and the Dynamic Optimal Portfolio sheets
   upon invocation from the Dynamic Optimal Portfolio sheet.  


7) Create a new set of internal variables (initialized to the associated
   external parameters settings) to represent the parameters when they
   are modified during calculations, rather than using the external parameter
   variable itself. 
   Motivation:
   Presently, it is not always clear that the displayed value of a parameter
   represents the "set" value or the result of modifications during calculations.  
   This improvement would increase the consistency necessary when trying to
   associate the results across sheets when the value of the parameters
   have been changed in one sheet.
   For example, the Covariance sheet and the Covariance2 sheet, if implemented
   would use a different set of internal variables to designate the span of the
   rolling window of the training period, rather than changing the start date
   and end date set by the user.)


8) Fill the values for a missing trading date gap in a time series
   by replicating the trading data of the prior date.
   (already implemented in version mod_3)
   Motivation:
   Compared to the current shift up collapsing of the time series samples,
   the results should be more accurate.
   However, the change knownly biases the mean of the affect series slightly
   toward the replicated data value and also changes the cross correlation,
   though less so than the current implementation.
   Experiments appear to show that for gaps of 1-2 days when the
   training intervals are large (~50days) and the overall portfolio period
   extends across more than 2 training intervals in length,
   the effect on the asset weights is <0.1% when a single trading gap is
   adjusted.  There are no practical effect on the portfolio results.

   If a trading gap of say 5 days occurs relative to the other time series
   in the set, then the stationary assumption is most likely violated
   and the CAPM trading strategy is nolonger applicable as defined.

9) Plot the determinant of the CoVariance matrix on the Optimal Portfolio Chart.
   Motivation:
   This information would help identify potential intervals where the CAPM trading
   strategy may have failed to generate the optimal results.