Asserting coefficients and parameter scaling
Matt Bhagat-Conway
In travel demand modeling, it is somewhat common to “assert” coefficients rather than estimate them from data. I think most modelers would agree that this practice should be avoided when possible, but it is sometimes necessary, for instance to model mode choice for a mode that does not yet exist in a region. The following guidance from the Federal Transit Administration, for example, appears in NCHRP 716:
- A typical range for the value of the in-vehicle time coefficient for home-based work trips is -0.03 to -0.02. If the coefficient falls outside this range, FTA says that “some further analysis (is) appropriate.”
- The in-vehicle time coefficient for nonhome-based trips should be approximately the same as the in-vehicle time coefficient for home-based work trips.
- A typical range for the in-vehicle time coefficient for home-based other nonwork trips is 0.1 to 0.5 times the in-vehicle time coefficient for home-based work trips.
- A typical range for the coefficient of out-of-vehicle time is 2 to 3 times the corresponding coefficient for in-vehicle time. FTA believes that “compelling evidence” is needed to justify ratios outside this range.
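The first one in particular presents an issue with error scaling. In a multinomial logistic regression, the probability of decisionmaker $n$ choosing alternative $i$ is

$$P_{ni} = \frac{e^{\beta' x_{ni} / \sigma}}{\sum_j e^{\beta' x_{nj} / \sigma}}$$

where $x_{ni}$ holds the observed attributes of alternative $i$ for decisionmaker $n$, $\beta$ is the vector of coefficients, and $\sigma$ is the scale of the unobserved portion of utility. Only the ratio $\beta / \sigma$ can be estimated, so the estimated coefficients shrink toward zero as the unobserved portion of utility grows.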
Simulated situation
Here, I build three simulated variables. All are normally distributed, and are independent. Scatter plots of the variables are shown below:
I construct a binary outcome such that the probability of a positive outcome is determined by a binary logistic regression (which is equivalent to a multinomial logistic regression with two outcomes and no generic coefficients). The true coefficient for each variable is 2, with a constant of 0.3.
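Written out, and using notation that is mine rather than the author's ($y_n$ for the outcome and $x_{1n}$, $x_{2n}$, $x_{3n}$ for the three variables of observation $n$), the data-generating process described above is:

$$\Pr(y_n = 1) = \frac{1}{1 + \exp\left[-\left(0.3 + 2 x_{1n} + 2 x_{2n} + 2 x_{3n}\right)\right]}$$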
I estimated three logistic regression models based on this data: one with only x1 (Model 1), one with x1 and x2 (Model 2), and one with all three variables (Model 3).
| Variable | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | 0.166*** | 0.209*** | 0.334*** |
| x1 | 0.984*** | 1.273*** | 1.995*** |
| x2 | - | 1.288*** | 2.002*** |
| x3 | - | - | 1.95*** |
The coefficient values increase as more variables are added to the model, and only Model 3 reproduces the true coefficients. The variables are completely uncorrelated with one another by construction, so this change in coefficient size is not due to omitted variable bias; it is purely an artifact of different error term scaling in the three models. We can confirm this by running linear probability models instead of logistic regression models:
| Variable | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | 0.534*** | 0.533*** | 0.533*** |
| x1 | 0.202*** | 0.203*** | 0.205*** |
| x2 | - | 0.205*** | 0.205*** |
| x3 | - | - | 0.201*** |
Since linear regression is not affected by the error scaling issue, we now see consistent coefficients in all three models, demonstrating that omitted variable bias is not driving the results. (The coefficients are not 2 because they are expressed in terms of probability, not utility as in the logistic regression models.)
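The comparison in the two tables can be reproduced approximately with the sketch below. It is not the author's code: it assumes standard normal variables (the text only says they are normal and independent), picks an arbitrary sample size and seed, and uses hand-rolled estimators with names of my own so that it runs on the Python standard library alone. Exact numbers will differ from the tables, but the pattern should match.

```python
import math
import random


def simulate(n=4000, beta=2.0, const=0.3, seed=1):
    """Three independent N(0, 1) predictors and a binary logit outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        p = 1.0 / (1.0 + math.exp(-(const + beta * sum(x))))
        X.append(x)
        y.append(1.0 if rng.random() < p else 0.0)
    return X, y


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [A[i][:] + [b[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    z = [0.0] * k
    for r in reversed(range(k)):
        z[r] = (M[r][k] - sum(M[r][j] * z[j] for j in range(r + 1, k))) / M[r][r]
    return z


def design(X, n_vars):
    """Design matrix: a constant plus the first n_vars predictors."""
    return [[1.0] + x[:n_vars] for x in X]


def fit_logit(X, y, n_vars, max_iter=25, tol=1e-8):
    """Binary logit by Newton-Raphson maximum likelihood."""
    rows, k = design(X, n_vars), n_vars + 1
    b = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # negative Hessian of the log-likelihood
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(bj * rj for bj, rj in zip(b, r))))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * r[i] * r[j]
        step = solve(info, grad)
        b = [bi + si for bi, si in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


def fit_lpm(X, y, n_vars):
    """Linear probability model: ordinary least squares via the normal equations."""
    rows, k = design(X, n_vars), n_vars + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


X, y = simulate()
logit = {m: fit_logit(X, y, m) for m in (1, 2, 3)}
lpm = {m: fit_lpm(X, y, m) for m in (1, 2, 3)}
for m in (1, 2, 3):
    print(f"Model {m} logit:", [round(c, 3) for c in logit[m]])
    print(f"Model {m} LPM:  ", [round(c, 3) for c in lpm[m]])
```

Each list holds the intercept followed by the coefficients on the included variables. In practice a library routine (for example `glm` in R or statsmodels in Python) would replace the hand-written solver; it is spelled out here only to keep the example self-contained.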
Given these differences in coefficient scaling, asserting absolute values for coefficients taken from other models is unwise—if utilities are not scaled the same, asserted coefficients will not have the expected effects.
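For a rough sense of how large this rescaling is, a standard approximation for logit models with an omitted, normally distributed component of utility puts the estimated coefficient at about β / sqrt(1 + c²σ²), where σ² is the variance of the omitted part of utility and c = 16√3 / (15π) ≈ 0.588. The sketch below applies that approximation to the simulation. It assumes each variable has unit variance, which the text does not state, and the function name is illustrative only.

```python
import math

# Constant from the usual logistic-to-normal approximation (an approximation, not an exact result).
C = 16 * math.sqrt(3) / (15 * math.pi)


def attenuated(beta, omitted_betas, sd=1.0):
    """Approximate logit coefficient when independent normal variables are omitted.

    omitted_betas are the true coefficients of the omitted variables and sd is
    their (assumed common) standard deviation.
    """
    omitted_var = sum((b * sd) ** 2 for b in omitted_betas)
    return beta / math.sqrt(1 + C ** 2 * omitted_var)


for label, omitted in [("Model 1", [2.0, 2.0]), ("Model 2", [2.0]), ("Model 3", [])]:
    print(label, round(attenuated(2.0, omitted), 3))
```

Under these assumptions the approximation gives roughly 1.03 for Model 1 and 1.30 for Model 2, close to the estimates of 0.984 and 1.273 in the first table, and it returns the true value of 2 when nothing is omitted.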
The second and third recommendations are likewise suspect. Home-based work, home-based other, and nonhome-based trips are often estimated in separate models that may have different scales, so expecting their coefficients to bear some defined relationship to the coefficients in the home-based work model is problematic.
The fourth suggestion is reasonable as long as it refers to coefficients within the same model, since the scale is consistent within a model. This gives some hope for asserting coefficients: if the asserted coefficients can be expressed in terms of other coefficients in the model (e.g., something like “one minute on a light rail vehicle has the same disutility as 0.9 minutes on a local bus”), the scaling issue can be avoided.
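A small numeric sketch of why the relative form is safer. All numbers here are hypothetical except the 0.9 ratio from the example above, and the function names are mine. Two target models differ only in the scale of utility; asserting the rail coefficient as a multiple of the local bus coefficient preserves the intended trade-off, while copying an absolute value from another model does not.

```python
# Hypothetical numbers throughout; only the 0.9 ratio comes from the example in the text.
RAIL_TO_BUS_RATIO = 0.9
SOURCE_BUS_IVT = -0.025  # bus in-vehicle time coefficient in the model being borrowed from
ABSOLUTE_RAIL_IVT = RAIL_TO_BUS_RATIO * SOURCE_BUS_IVT  # value an absolute assertion would copy


def relative_assertion(bus_ivt, ratio=RAIL_TO_BUS_RATIO):
    """Asserted rail coefficient expressed as a multiple of the local bus coefficient."""
    return ratio * bus_ivt


def implied_ratio(rail_ivt, bus_ivt):
    """Minutes of bus time with the same disutility as one minute of rail time."""
    return rail_ivt / bus_ivt


# Two target models whose utilities differ only in scale.
for scale in (1.0, 0.5):
    local_bus_ivt = SOURCE_BUS_IVT * scale
    rel = implied_ratio(relative_assertion(local_bus_ivt), local_bus_ivt)
    absolute = implied_ratio(ABSOLUTE_RAIL_IVT, local_bus_ivt)
    print(f"scale={scale}: relative -> {rel:.2f}, absolute -> {absolute:.2f}")
```

With the scale halved, the relative assertion still implies that a rail minute is worth 0.9 bus minutes, whereas the copied absolute value implies 1.8 bus minutes, the opposite of the intended effect.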
To be completely fair to the Federal Transit Administration, the guidance is not “this is the coefficient value” but rather “further analysis is appropriate.” That said, it seems many agencies interpret the FTA guidance as being something they should seek to attain in their own models, possibly through assertions, and not just something that should be investigated.