Geometric and Algebraic Models for Statistical Data Collection

Mar 19, 2021 | Methodology

Geometric and algebraic models attempt to describe the underlying structural relationships between variables. In some cases they are part of a probabilistic approach: examples include the algebraic models underlying regression and the geometric representations of correlations between items used in factor analysis.

In other cases, geometric and algebraic models are developed without explicitly modeling the element of randomness or uncertainty that is always present in the data. Although this latter approach to social and behavioral science problems has been less researched than the probabilistic one, there are some advantages in developing the structural aspects independently of the statistical ones. We begin the discussion with some inherently geometric representations and then move on to numerical representations for ordered data.

Geometry and Geometric Representations

Although geometry is a vast mathematical topic, little of it seems to be directly applicable to the types of data found in the social and behavioral sciences. According to Firestein (2016), one of the main reasons is that the primitive concepts normally used in geometry (points, lines, coincidence) do not correspond naturally to the kinds of qualitative observations usually obtained in social and behavioral science contexts.

However, since geometric representations are used to reduce bodies of data, there is a real need to develop a deeper understanding of when such representations of social or psychological data make sense. In addition, there is a practical need to understand why geometric computational algorithms, such as those of multidimensional scaling, work as well as they apparently do. A better understanding of the algorithms will increase the efficiency and appropriateness of their use. This becomes increasingly important with the widespread availability of scaling programs for microcomputers.

Scaling Models

Over the past 50 years, several types of well-understood scaling techniques have been developed and widely used to aid in the search for adequate geometric representations of empirical data. According to the National Academies of Sciences, Engineering, and Medicine (2017), the entire field of scaling is now reaching a critical juncture in terms of the unification and synthesis of what once seemed to be disparate contributions.

In recent years it has become clear that several of the main methods of analysis, including some based on probabilistic assumptions, can be unified under the rubric of a single generalized mathematical structure. For example, approaches as diverse as non-metric multidimensional scaling, principal component analysis, factor analysis, correspondence analysis, and log-linear analysis have recently been shown to have more in common in terms of underlying mathematical structure than previously thought.

Multidimensional Scaling

Non-metric multidimensional scaling is a method that begins with data on the ordering established by the subjective similarity (or closeness) between pairs of stimuli. The idea is to embed the stimuli in a metric space, that is, a geometry with a measure of distance between points, in such a way that the distances between the points corresponding to the stimuli show the same ordering as the data.
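The ordering requirement can be made concrete with a small check. The sketch below is not a scaling algorithm: it only tests whether a candidate configuration of points reproduces the rank order of a set of dissimilarity judgments. The function name `order_violations`, the stimulus labels, and all numbers are hypothetical illustrations, not taken from any particular study.

```python
from itertools import combinations
from math import dist


def order_violations(coords, dissimilarity):
    """Count pairs of stimulus pairs whose distance order contradicts the data.

    coords: dict mapping stimulus name -> tuple of coordinates.
    dissimilarity: dict mapping frozenset({a, b}) -> judged dissimilarity.
    """
    violations = 0
    for p, q in combinations(list(dissimilarity), 2):
        d_p = dist(*(coords[s] for s in p))  # distance between the two stimuli in p
        d_q = dist(*(coords[s] for s in q))
        data_diff = dissimilarity[p] - dissimilarity[q]
        # Violation: the data rank p and q one way, the distances the other way.
        if data_diff * (d_p - d_q) < 0:
            violations += 1
    return violations


# Hypothetical judgments: A-B most similar, A-C least similar.
judged = {
    frozenset({"A", "B"}): 1,
    frozenset({"B", "C"}): 2,
    frozenset({"A", "C"}): 3,
}
good = {"A": (0.0,), "B": (1.0,), "C": (3.0,)}  # one-dimensional embedding
bad = {"A": (0.0,), "B": (3.0,), "C": (1.0,)}

print(order_violations(good, judged), order_violations(bad, judged))
```

A configuration with zero violations satisfies the ordinal criterion exactly. Actual non-metric scaling programs search for coordinates that minimize a related badness-of-fit measure rather than merely counting violations.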

This method has been successfully applied to phenomena that are known, on other grounds, to be describable in terms of a specific geometric structure; such applications served to validate the procedures. This was done, for example, for the perception of colors, which are known to be describable in terms of a particular three-dimensional structure known as the Euclidean color coordinates. Similar applications have been made with Morse code symbols and spoken phonemes. The technique is now used in some biological and engineering applications, as well as in some of the social sciences, as a method of data exploration and simplification.

The Axiomatic Base

A question of interest is how to develop an axiomatic basis for various geometries using an observable as the primitive concept, for example the subject’s ordering of the relative similarity of one pair of stimuli to another, which is the typical starting point of such scaling. The general task is to discover properties of the qualitative data that are sufficient to ensure that a mapping into the geometric structure exists and, ideally, to discover an algorithm for finding it.

Some work of this general kind has been carried out. For example, there is an elegant set of axioms based on the laws of color matching that gives rise to the three-dimensional vector representation of color space. But the more general problem of understanding the conditions under which multidimensional scaling algorithms are appropriate remains unresolved. In addition, work is needed on understanding more general, non-Euclidean spatial models.

Ordered Factorial Systems

A common type of structure in the sciences arises when an ordered dependent variable is affected by two or more ordered independent variables. This is the situation to which regression and analysis of variance models typically apply. It is also the structure that underlies the familiar physical identities, in which physical units are expressed as products of powers of other units. For example, energy has the unit of mass times the square of the unit of distance divided by the square of the unit of time.
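The energy example can be written out as a dimensional identity. The symbols M, L, and T for the dimensions of mass, length, and time are the conventional notation and are assumed here rather than taken from the text:

```latex
[E] = M \, L^{2} \, T^{-2},
\qquad \text{e.g. } 1\,\mathrm{J} = 1\,\mathrm{kg}\cdot\mathrm{m}^{2}\cdot\mathrm{s}^{-2}
```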

There are many examples of such structures in the social and behavioral sciences. One is the ordering of preferences over commodity bundles (collections of various quantities of commodities), which may be revealed directly through expressions of preference or indirectly through choices among alternative sets of bundles. A related example is that of preferences among alternative courses of action that involve various outcomes with differing degrees of uncertainty. This is one of the most thoroughly researched problems because of its potential importance in decision-making.

A psychological example is the trade-off between the delay and the amount of a reward, which yields those combinations that are equally reinforcing. In a common, applied kind of problem, a subject is given descriptions of people in terms of several factors (for example, intelligence, creativity, diligence, and honesty) and is asked to rate them according to a criterion such as suitability for a particular job.

Data Regularities and Numerical Representation

In all these cases, and in a myriad of similar ones, the question is whether the regularities of the data permit a numerical representation. Initially, three kinds of representation were studied in detail: the dependent variable as a sum, a product, or a weighted average of the measures associated with the independent variables. The first two representations underlie some psychological and economic research, as well as a considerable part of physical measurement and of modeling in classical statistics. The third representation, averaging, has proved very useful for understanding preferences among uncertain outcomes and the amalgamation of verbally described traits, as well as some physical variables.
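The three representations can be sketched as simple combination rules for two independent variables. The function names, the default weights, and the numbers are illustrative assumptions; the scale values `a` and `b` stand for the measures associated with each factor.

```python
from math import isclose, log


def additive(a, b):
    """Dependent variable as a sum of the factor measures."""
    return a + b


def multiplicative(a, b):
    """Dependent variable as a product of the factor measures."""
    return a * b


def averaging(a, b, w_a=0.5, w_b=0.5):
    """Dependent variable as a weighted average of the factor measures."""
    return (w_a * a + w_b * b) / (w_a + w_b)


# For positive measures, taking logarithms turns the product rule into the sum rule.
product_as_sum = isclose(log(multiplicative(2.0, 8.0)), additive(log(2.0), log(8.0)))
print(product_as_sum, averaging(2.0, 8.0), averaging(2.0, 8.0, w_a=3.0, w_b=1.0))
```

Because the logarithm is strictly increasing, the sum and product rules order positive-valued stimuli in the same way after this rescaling, which is one reason the two are usually treated together.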

For each of these three cases (adding, multiplying, and averaging), researchers know what properties or order axioms the data must satisfy for that numerical representation to be adequate. On the assumption that one or another of these representations exists, and using the subjects' numerical ratings instead of the ordering, a scaling technique called functional measurement has been developed and applied in several areas. The name refers to the function that describes how the dependent variable relates to the independent ones. What remains problematic is how to encompass at the ordinal level the fact that some random error intrudes into almost all observations, and then to show how that randomness is represented at the numerical level. This remains an unresolved and challenging research issue.
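One of the order properties that an additive representation requires is independence: the ordering of the levels of one factor must be the same at every level of the other. The sketch below checks only this necessary condition on a two-factor table; it is not sufficient on its own for additivity, it assumes error-free responses with no ties, and the function names and tables are hypothetical.

```python
def same_order_in_every_column(table):
    """True if the rows are ranked identically within each column.

    table[i][j] is the response at level i of factor A and level j of factor B.
    Assumes no tied responses within a column.
    """
    def row_ranking(j):
        return sorted(range(len(table)), key=lambda i: table[i][j])

    reference = row_ranking(0)
    return all(row_ranking(j) == reference for j in range(1, len(table[0])))


def satisfies_independence(table):
    """Check the condition for factor A and, via the transpose, for factor B."""
    transposed = [list(column) for column in zip(*table)]
    return same_order_in_every_column(table) and same_order_in_every_column(transposed)


# Built as (row effect + column effect), so independence must hold.
additive_table = [[11, 21, 31],
                  [12, 22, 32],
                  [14, 24, 34]]
# A crossover: row 0 is below row 1 in the first column but above it in the second.
crossover_table = [[1, 5],
                   [2, 4]]

print(satisfies_independence(additive_table), satisfies_independence(crossover_table))
```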

Evolution of Representations

In recent years, considerable progress has been made in understanding certain representations that are intrinsically different from those just discussed. The work has focused on three related thrusts. The first is a scheme for classifying structures according to how uniquely their representation is constrained. The three classical numerical representations are known as the ordinal, interval, and ratio scale types.
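The scale types can be illustrated by the transformations each one admits and by what those transformations leave unchanged. In this sketch, the specific transformations (multiplying by 3, adding 5, cubing) and the sample values are arbitrary assumptions chosen for illustration.

```python
def rescale_ratio(x, a=3.0):
    """Ratio scale: only multiplication by a positive constant is admissible."""
    return a * x


def rescale_interval(x, a=3.0, b=5.0):
    """Interval scale: positive linear transformations x -> a*x + b are admissible."""
    return a * x + b


def rescale_ordinal(x):
    """Ordinal scale: any strictly increasing function is admissible (here, cubing)."""
    return x ** 3


def ratio(v):
    return v[1] / v[0]


def ratio_of_differences(v):
    return (v[2] - v[1]) / (v[1] - v[0])


def rank_order(v):
    return sorted(range(len(v)), key=v.__getitem__)


values = [2.0, 4.0, 8.0]
as_ratio = [rescale_ratio(x) for x in values]        # [6.0, 12.0, 24.0]
as_interval = [rescale_interval(x) for x in values]  # [11.0, 17.0, 29.0]
as_ordinal = [rescale_ordinal(x) for x in values]    # [8.0, 64.0, 512.0]

print(ratio(values), ratio(as_ratio), ratio(as_interval))
print(ratio_of_differences(values), ratio_of_differences(as_interval),
      ratio_of_differences(as_ordinal))
```

Ratios of values survive only the ratio-scale transformation, ratios of differences survive the interval-scale transformation as well, and only rank order survives an arbitrary increasing transformation. The fewer transformations a scale type admits, the more tightly its representation is constrained.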

For systems with continuous numerical representations and a scale type at least as rich as the ratio type, it has been shown that only one additional type can exist. A second is to accept structural assumptions, such as factorial ones, and then to derive for each scale type the possible functional relationships among the independent variables. The third is to develop axioms for the properties of an order relation that lead to the possible representations. Much is now known about the possible non-additive representations, both in the multifactor case and in the case where stimuli can be combined, such as the combination of sound intensities.

Clustering Models

Many domains do not seem to be represented correctly in terms of distances in a continuous geometric space. Rather, in some cases, such as the relationships between the meanings of words (which are of great interest in the study of memory representations), a description in terms of hierarchical, tree-like structures seems to be more illuminating. According to Bouter et al. (2016), this type of description seems appropriate both because of the categorical nature of the judgments and because of the hierarchical, rather than trade-off, nature of the structure.

Individual elements are represented as the terminal nodes of the tree, groupings by varying degrees of similarity appear as intermediate nodes, and the more general groupings appear closer to the root of the tree. Clustering techniques that require considerable computing power have been and are being developed. There are some successful applications, but much further refinement is expected.
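The tree structure described above can be illustrated with a minimal agglomerative procedure. The sketch uses single linkage (merge the two clusters whose closest members are least dissimilar), which is only one of many possible clustering rules and is an assumption made here for brevity; the words and dissimilarity values are hypothetical.

```python
def single_linkage(labels, dissimilarity):
    """Build a cluster tree bottom-up; return the merges as (left, right, height).

    dissimilarity maps frozenset({a, b}) -> judged dissimilarity between a and b.
    """
    clusters = [frozenset([x]) for x in labels]  # terminal nodes of the tree
    merges = []

    def gap(c1, c2):
        # Single linkage: smallest dissimilarity between any two members.
        return min(dissimilarity[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        height, i, j = min(
            (gap(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        )
        merges.append((set(clusters[i]), set(clusters[j]), height))
        merged = clusters[i] | clusters[j]  # new intermediate node
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


words = ["cat", "dog", "oak", "elm"]
judged = {
    frozenset(("cat", "dog")): 1, frozenset(("oak", "elm")): 2,
    frozenset(("cat", "oak")): 8, frozenset(("cat", "elm")): 9,
    frozenset(("dog", "oak")): 7, frozenset(("dog", "elm")): 9,
}

for left, right, height in single_linkage(words, judged):
    print(sorted(left), sorted(right), height)
```

Each merge corresponds to an intermediate node, and the last merge is the root. In this example the two animals join first, the two trees second, and the two groups join last at a much greater height.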

Network Models

Recently, other lines of advanced modeling have been developed that open up new possibilities for the empirical specification and testing of various theories. In social network data, the relationships between units, rather than the units themselves, are the main objects of study: friendships between people, trade ties between nations, co-citation clusters among researchers, interlocks among corporate boards of directors. Special models for social network data have been developed in the last decade. These provide, among other things, new and more precise measures of the strength of relational ties between units. One of the main current challenges with social network data is handling the statistical dependence that arises when the sampled units are related in complex ways.
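To make concrete the idea that ties, rather than units, are the data, the sketch below stores a small network as a list of ties and computes two simple descriptive quantities. Neighborhood overlap is used here as one illustrative proxy for tie strength; it is an assumption for this example and not necessarily one of the measures the models above provide. The names and ties are hypothetical.

```python
from collections import defaultdict


def build_neighbors(ties):
    """Turn a list of undirected ties (a, b) into a unit -> set of neighbors map."""
    neighbors = defaultdict(set)
    for a, b in ties:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def density(ties, n_units):
    """Share of all possible undirected ties that are actually present."""
    distinct = {frozenset(t) for t in ties}
    return len(distinct) / (n_units * (n_units - 1) / 2)


def neighborhood_overlap(neighbors, a, b):
    """Shared contacts of a and b as a fraction of all their other contacts."""
    others_a = neighbors[a] - {b}
    others_b = neighbors[b] - {a}
    union = others_a | others_b
    return len(others_a & others_b) / len(union) if union else 0.0


friendships = [("Ana", "Ben"), ("Ana", "Cy"), ("Ben", "Cy"), ("Cy", "Dee")]
neighbors = build_neighbors(friendships)

print(density(friendships, n_units=4))
print(neighborhood_overlap(neighbors, "Ana", "Ben"))
print(neighborhood_overlap(neighbors, "Cy", "Dee"))
```

Note that the overlap values for different ties share units and contacts, so they are not independent observations. This is a small-scale instance of the dependence problem mentioned above.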

Bibliographic References

National Academies of Sciences, Engineering, and Medicine (2017) Fostering Integrity in Research (The National Academies Press, Washington, DC).

Bouter LM, Tijdink J, Axelsen N, Martinson BC, ter Riet G (2016) Ranking major and minor research misbehaviors: Results from a survey among participants of four World Conferences on Research Integrity. Res Integr Peer Rev 1:17.

Firestein S (2016) Failure: Why Science Is So Successful (Oxford Univ Press, Oxford).
