The basic idea of latent semantic analysis (LSA) is that texts have a higher-order (latent semantic) structure which is obscured by variability in word usage (e.g. through synonymy or polysemy). This variability problem can be overcome by using conceptual indices that are derived statistically via a truncated singular value decomposition (a two-mode factor analysis) of a given document-term matrix.
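The truncation step can be illustrated with a minimal sketch in Python with NumPy. It is not the implementation described in the publication; the toy corpus, the vocabulary, the helper names (truncated_svd, cosine) and the choice of k = 2 dimensions are all assumptions made for illustration. Documents 2 and 3 use the synonyms "car" and "automobile" and share no term, so their similarity in the raw document-term matrix is zero, while both words co-occur with "engine" in other documents.

```python
import numpy as np


def truncated_svd(X, k):
    """Return the k largest singular triplets of X (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy document-term matrix (assumed data): rows are documents,
# columns are the terms car, automobile, engine, flower, petal.
X = np.array([
    [1, 0, 1, 0, 0],  # doc 0: car engine
    [0, 1, 1, 0, 0],  # doc 1: automobile engine
    [1, 0, 0, 0, 0],  # doc 2: car
    [0, 1, 0, 0, 0],  # doc 3: automobile
    [0, 0, 0, 1, 1],  # doc 4: flower petal
    [0, 0, 0, 1, 1],  # doc 5: flower petal
], dtype=float)

# Keep only the two strongest latent dimensions (k = 2 is an assumption).
U_k, s_k, Vt_k = truncated_svd(X, k=2)

# Document coordinates in the latent space: each row of U_k scaled by s_k.
doc_vectors = U_k * s_k

raw_sim = cosine(X[2], X[3])                         # term space
latent_sim = cosine(doc_vectors[2], doc_vectors[3])  # latent space
unrelated_sim = cosine(doc_vectors[2], doc_vectors[4])

print(f"raw: {raw_sim:.3f}  latent: {latent_sim:.3f}  "
      f"unrelated: {unrelated_sim:.3f}")
```

In this toy setting the two synonym documents, which are orthogonal in term space, end up pointing in nearly the same direction in the two-dimensional latent space, while a document about flowers stays roughly orthogonal to them. On real corpora the matrix is usually weighted first and the number of dimensions has to be chosen empirically.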
Publication: Fridolin Wild, Christina Stahl (2007): Investigating unstructured texts with latent semantic analysis, In: Decker, Lenz (Eds.): Studies in Classification, Data Analysis, and Knowledge Organization, Springer, pp. 383-390.