Multivariable "structural" models of development

Idea: Just as standard regression models allow prediction of a dependent variable on the basis of independent variables, structural models can allow a sequence of predictive steps from root ("exogenous") variables through to highest-level variables. Although this kind of model seems to illuminate issues about factors that build up over the life course, there are strong criticisms of using such models to make claims about causes.

Initial notes

Cases: Kendler et al. 2002 on pathways to depression in women: Notice the high R^2 and the way the authors tease out different kinds of pathways to depression from the model they fit to their data.
Freedman 2005: A statistician questions whether structural models can be thought of as causal models and tries hard to make his questioning accessible (i.e., with a minimum of technical language [not zero however]).
Ou's 2005 synthesis of pathways from pre-school programs to later outcomes: Notice the different kinds of networks Ou reviews in the literature before presenting her own analysis.
Friedman and Rossi 2011 is included as a counterpoint to the idea that structural models are models of complex causal interactions in epidemiology.

During the class, we might look first at Kendler's and Ou's diagrams,
then do Q&A on the technical aspects of path analysis and SEM primed by the notes below,
then work our way through Freedman's critique.

PT's first attempt at a non-technical introduction to path analysis and structural equation modeling (alternative expositions welcome)

Path analysis is a data analysis technique that quantifies the relative contributions of variables (“path coefficients”) to the variation in a focal variable once a certain network of interrelated variables has been specified (Lynch & Walsh 1998, 823). Some of these contributions are direct and some mediated through other variables, i.e., indirect. Although some researchers interpret “contribution” in causal terms (e.g., Pearl 2000, 135 & 344-5), others criticize such an interpretation (e.g., Freedman 2005). Here, contribution refers neutrally to the term of an additive model fitted to data.

The conceptual starting point for path analysis is an additive regression model that associates the focal (“dependent”) variable with several other measured (“independent” or “exogenous”) variables. (The vertical lines in these figures indicate that the separate horizontal lines are combined together.)

X1 ----|
X2 ----|----> Y
X3 ----|

Technically, the additive model is transformed by subtracting the mean from every term, squaring the expression (so it is an equation for the variance), and dividing by the variance of the focal (“dependent”) variable. The result is the “equation of complete determination,” with each regression coefficient multiplied by the SD of its “independent” variable and divided by the SD of the focal variable to arrive at the path coefficient.
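The transformation just described can be checked numerically. What follows is a minimal sketch in Python with NumPy, not the procedure of any particular package: the data are simulated, and the variable names and the "true" coefficients used to generate them are assumptions made only for illustration. It fits the additive model by least squares, converts each regression coefficient into a path coefficient, and confirms that the squared terms, cross-product terms, and residual share add up to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Three "independent" variables, two of them correlated (simulated, illustrative only).
X = rng.normal(size=(n, 3))
X[:, 1] += 0.5 * X[:, 0]
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

# Least-squares fit of the additive model y = a + b1*x1 + b2*x2 + b3*x3 + e.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b = coef[1:]
resid = y - design @ coef

# Path coefficient = regression coefficient * SD(x) / SD(y).
p = b * X.std(axis=0) / y.std()

# Equation of complete determination:
# 1 = sum_i p_i^2 + 2 * sum_{i<j} p_i * p_j * r_ij + var(e) / var(y)
r = np.corrcoef(X, rowvar=False)          # correlations among the x's
explained = p @ r @ p                     # squared and cross-product terms together
residual_share = resid.var() / y.var()
total = explained + residual_share

print("path coefficients:", np.round(p, 3))
print("explained + residual share:", round(total, 6))
```

The cross-product terms appear only because the independent variables are correlated with one another; with uncorrelated variables the explained share reduces to the sum of the squared path coefficients.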

The next step is to consider more than one focal, “endogenous” variable and networks of exogenous and endogenous variables that you have reason to think are associated with one another. Indeed, the focal variable of one regression may be among the variables associated with a second focal variable and so on. In the figure below X3 has a direct link with Y2 and an indirect one through Y1.

X1 ----|
X2 ----|----> Y1 -|--> Y2
X3 ----|------------|

The software (e.g., LISREL) can solve these linked regression equations, but it is up to you to compare the results using the network you specify with plausible (theoretically-justified) alternatives that may link exogenous, independent variables and endogenous variables differently. Unlike in multiple regression, we do not arrive at our idea of what should be in the model by adding or subtracting variables in some stepwise procedure.
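For the small network in the figure above, the linked regressions can be sketched without specialized software. This is only an illustration in Python with NumPy: the data are simulated, the generating coefficients are invented, and the exogenous variables are assumed to be independent of one another. It standardizes every variable, fits the two regressions, and splits the link between X3 and Y2 into a direct path and an indirect path through Y1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Simulated exogenous variables (assumed independent of one another).
x1, x2, x3 = rng.normal(size=(3, n))

# Invented generating values, used only to produce illustrative data.
y1 = 0.6 * x1 + 0.3 * x2 + 0.5 * x3 + rng.normal(scale=0.5, size=n)
y2 = 0.7 * y1 + 0.4 * x3 + rng.normal(scale=0.5, size=n)


def standardize(v):
    """Subtract the mean and divide by the SD."""
    return (v - v.mean()) / v.std()


def ols(y, *predictors):
    """Least-squares slopes of y on the predictors (all standardized, so no intercept)."""
    design = np.column_stack(predictors)
    slopes, *_ = np.linalg.lstsq(design, y, rcond=None)
    return slopes


z1, z2, z3, zy1, zy2 = map(standardize, (x1, x2, x3, y1, y2))

# First linked regression: Y1 on X1, X2, X3.
p_y1_x1, p_y1_x2, p_y1_x3 = ols(zy1, z1, z2, z3)
# Second linked regression: Y2 on Y1 and X3.
p_y2_y1, p_y2_x3 = ols(zy2, zy1, z3)

direct = p_y2_x3                # X3 -> Y2
indirect = p_y1_x3 * p_y2_y1    # X3 -> Y1 -> Y2
total = direct + indirect

print("direct:", round(direct, 3))
print("indirect:", round(indirect, 3))
print("total:", round(total, 3), "vs corr(X3, Y2):", round(np.corrcoef(x3, y2)[0, 1], 3))
```

Because the simulated exogenous variables are nearly uncorrelated, the direct and indirect contributions sum to approximately the observed correlation between X3 and Y2. A different network fitted to the same data would give a different split, which is why the comparison with alternative networks matters.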

Structural equation modeling extends path analysis to include latent (a.k.a. unmeasured) variables or “constructs.” These latent variables are sometimes the presumed real underlying variable of which the measured one is an imperfect marker. For example, birth weight at full term and the neonatal Apgar scores might be the measured variables but the model might include degree of fetal under-nutrition as a latent variable. Latent variables can also be constructed by the software in the same way that they are in factor analyses, namely, as economical (dimension-reducing) linear combinations of measured variables. Calling the networks of linked variables “structural” is meant to suggest that we can give the pathways causal interpretations, but SEM and path analysis have no trick that overcomes the problems that regression and factor analyses have in exposing causes.

This section is not needed for understanding the papers for this week. However, looking ahead to studies of heritability (part of week 12), a field in which path analysis originated, there are no measured variables except the observed focal variable (e.g., height). Path analysis can still be used if we convert the additive model on which any given Analysis of Variance (AOV or ANOVA) is based into an additive model of constructed variables that take the values of the contributions fitted to the first model. For example, in an agricultural evaluation trial of many varieties replicated one or more times in each of many locations, the AOV model is

Yijk = M + Vi + Lj + VLij + Eijk (eqn. 1)

where Yijk denotes the measured trait y for the ith variety in the jth location and kth replication;
M is a base level for the trait;
Vi is the contribution of the ith variety;
Lj is the contribution of the jth location;
VLij is an additional contribution from the i,jth variety-location combination—in statistical terms, the “variety-location-interaction” contribution; and
Eijk is a noise contribution adding to the trait measurement.

The path model equivalent to equation 1 is
Yx = M + Z1x + Z2x + Z3x + Ex (eqn. 2)

Y is the measured trait as before and x denotes the replicates
Z1x = Vi if x is a replicate of variety i, or 0 otherwise
Z2x = Lj if x is a replicate in location j, or 0 otherwise
Z3x = VLij if x is a replicate of variety i in location j, or 0 otherwise
Ex = Eijk where x is replicate k of variety i in location j

The path coefficients are then set to equal the square root of the ratio of the variance of the contribution (Vi, etc.) to the total variance for the trait (Y). The equation of complete determination becomes
1 = Sum over w of var(Zw) / var(Y) (eqn. 3)
where w denotes the different contributions in the Analysis of Variance model.

For the agricultural trial this equation might be written
1 = [var(V) + var(L) + var(VL) + var(E)] / var(Y) (eqn. 4)
where var(V) = variance of the Vi terms, etc.
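Equations 1 to 4 can be traced through on simulated data. The sketch below (Python with NumPy) assumes a balanced trial, i.e., the same number of replicates for every variety-location combination, and fits the contributions by simple averaging. The trial dimensions and all numerical values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_var, n_loc, n_rep = 4, 3, 2   # hypothetical balanced trial

# Simulated trait values Y[i, j, k]: variety i, location j, replicate k.
Y = (10.0
     + rng.normal(scale=2.0, size=(n_var, 1, 1))            # variety differences
     + rng.normal(scale=1.5, size=(1, n_loc, 1))            # location differences
     + rng.normal(scale=1.0, size=(n_var, n_loc, 1))        # variety-location interaction
     + rng.normal(scale=0.5, size=(n_var, n_loc, n_rep)))   # noise

# Fitted contributions of eqn. 1 (averaging is the least-squares fit when the design is balanced).
M = Y.mean()
V = Y.mean(axis=(1, 2)) - M                  # Vi
L = Y.mean(axis=(0, 2)) - M                  # Lj
cell = Y.mean(axis=2)                        # mean of each variety-location cell
VL = cell - M - V[:, None] - L[None, :]      # VLij
E = Y - cell[:, :, None]                     # Eijk

# Constructed variables of eqn. 2: one value per replicate x.
shape = Y.shape
Z1 = np.broadcast_to(V[:, None, None], shape).ravel()
Z2 = np.broadcast_to(L[None, :, None], shape).ravel()
Z3 = np.broadcast_to(VL[:, :, None], shape).ravel()
Ex = E.ravel()
y = Y.ravel()

# Shares of variance (eqns. 3 and 4) and path coefficients (their square roots).
shares = {name: z.var() / y.var()
          for name, z in [("V", Z1), ("L", Z2), ("VL", Z3), ("E", Ex)]}
paths = {name: np.sqrt(s) for name, s in shares.items()}
total_share = sum(shares.values())

print({name: round(s, 3) for name, s in shares.items()})
print("sum of shares:", round(total_share, 6))
```

In a balanced design the fitted contributions are uncorrelated with one another, so the shares add up to exactly 1 with no cross-product terms. With unequal replication that tidy partition no longer holds.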

In human studies var(VL) is ignored or discounted [which I think is a problem, PT] and this is expressed as
1 = heritability + shared environmental effect + non-shared environmental effect (eqn. 5)

When the same trait is observed in two relatives, their separate path analyses can be linked in one network and the correlation between the relatives calculated (Lynch & Walsh 1998, 826)—provided it is assumed that the contributions (and path coefficients) apply to both and that the noise contributions are uncorrelated. If we have data on correlations for different kinds of relatives (e.g., identical vs. fraternal twins), we can estimate the relative size of the contributions in equations such as 4 and 5. That’s the crux of heritability studies.
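As a rough illustration of how correlations for two kinds of relatives yield estimates of the terms in equation 5, the sketch below uses the classical twin-study assumptions, which are not spelled out in these notes: identical twins share all of the heritable contribution, fraternal twins share half of it on average, both kinds of twins share the shared environment fully, and there is no interaction term. The two correlations are made-up numbers, and the function name is hypothetical.

```python
def twin_decomposition(r_mz, r_dz):
    """Solve r_mz = h2 + c2 and r_dz = 0.5 * h2 + c2, with e2 = 1 - h2 - c2.

    h2: heritability; c2: shared environmental effect;
    e2: non-shared environmental effect (including noise).
    """
    h2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return h2, c2, e2


# Hypothetical correlations for identical (MZ) and fraternal (DZ) twins.
h2, c2, e2 = twin_decomposition(r_mz=0.8, r_dz=0.5)
print(round(h2, 3), round(c2, 3), round(e2, 3))
```

The arithmetic is simple; the weight of the argument rests on the assumptions, including the omission of any var(VL)-like term.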

Freedman, D. A. (2005). Linear statistical models for causation: A critical review. Encyclopedia of Statistics in Behavioral Science. B. Everitt and D. Howell. Chichester, Wiley.
Lynch, M. and B. Walsh (1998). Genetics and Analysis of Quantitative Traits. Sunderland, MA, Sinauer.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge, Cambridge University Press.

Notes and annotations from 2007 course, 2009
Common readings and cases: Kendler 2002 (pathways to depression in women), Freedman 2005 (Structural models as causal models?)
Supplementary Reading: Chandola 2006, Ou 2005, Rini 1999

Annotations on common readings

Annotated additions by students

(In alphabetical order by author's name with contributor's initials and date at the end.)

Christine Killingsworth Rini, Christine Dunkel-Schetter, Pathik D. Wadhwa and Curt A. Sandman. Psychological Adaptation and Birth Outcomes: The Role of Personal Resources, Stress, and Sociocultural Context in Pregnancy. Health Psychology 1999, Vol. 18, No. 4, 333-345

In the United States, adverse birth outcomes, namely preterm delivery (PTD) and low birth weight (LBW), occur in a substantial percentage of live births and are the major causes of perinatal, neonatal, and infant mortality and morbidity. However, despite considerable research attention, the underlying etiology of PTD and LBW is still poorly understood. Previous studies have observed that the ability to adjust to the conditions wrought by pregnancy holds consequences for the well-being of the parent and the developing baby. Rini and colleagues argue that comprehensively understanding psychological adaptation during pregnancy and its effects on birth outcomes requires evaluating numerous factors that are potentially implicated in prenatal adaptation. Significantly higher rates of adverse birth outcomes have been noted for women who reported experiencing more prenatal stress and anxiety during their gestation.

This prospective study investigated prenatal psychosocial predictors of infant birth weight and length of gestation in 120 Hispanic and 110 White pregnant women. Using structural equation modeling, the authors tested the hypotheses that personal resources (mastery, self-esteem and optimism), prenatal stress (state and pregnancy anxiety), and socio-cultural factors (income, education and ethnicity) would have different effects on birth outcomes. Rini and colleagues found that women with stronger resources had higher birth weight babies and that reporting more stress was associated with shorter gestations. Additionally, resources were associated with lower stress, being married, being White, having higher income and education, and giving birth for the first time. They found no evidence that resources buffered the effects of stress. Furthermore, their results are interpreted as suggesting that ethnicity is related to several variables that influence adaptation during pregnancy and thus influences birth outcomes indirectly. The association of ethnicity with infant birth weight was said to be mediated by Hispanics' lower levels of personal resources. Rini and colleagues assert that their study provides evidence that resources influence birth outcomes both directly and indirectly and thus warrant further research attention.

According to Freedman (2005), strong background assumptions are required to infer causation from association by modeling. One interesting assumption employed by Rini and colleagues was that a tendency toward fatalism (the professed predominance of external natural and supernatural forces), “characterizing” Latin culture, probabilistically predisposes Hispanics to having lower optimism and mastery. An alternative, positive association of fatalism with higher optimism and mastery is not a completely illogical assumption. A belief in predestination (fate) does not preclude (and may in fact necessitate) the possession of faith (which is the seed of hope or optimism).

The study’s small sample size impairs confidence that the final model adequately fits both ethnic groups. Furthermore, the utility of this model for real world intervention is limited as biological factors engendering adverse birth outcomes were omitted from evaluation. (SY)