sphstat: A Python package for inferential statistics on vectorial data on the unit sphere

Data that resides on the surface of a 2-sphere is common in various scientific fields, including physics, earth sciences, astronomy, and psychoacoustics. While some tools and packages exist for performing inferential statistical tests on such data and model fitting, there is currently no comprehensive open-source Python package that implements these tests. sphstat aims to fill this gap by providing an open-source Python package that implements spherical inferential tests and some model-fitting algorithms as catalogued in the authoritative reference by Fisher et al. (1993). Due to the lack of a similar open-source Python package, sphstat has the potential to be widely used in scientific and technical fields where data on the 2-sphere emerges.


Introduction
Data on the unit sphere appear in many branches of science. For example, positions of solar flares on the sun [1], eye tracking data [2], data in earth sciences [3,4], data from neuroimaging studies [5], data obtained for terrestrial laser scanning based mapping [6] or directional responses elicited from subjects in a 3D egocentric localisation experiment [7,8] can be represented as data on the unit sphere.
Despite the ubiquitous nature of spherical data, inferential analysis thereof is a topic of ongoing research [9]. Hypothesis testing on uni- or higher-dimensional data has been developing since at least the early 18th century [10] and more notably from the 20th century onwards [11]. There is well-established open-source software such as JASP [12], as well as open-source Python libraries such as SciPy [13] and Pingouin [14], that implements a variety of statistical tests on linear data.
There exist open-source software packages such as the R packages Directional [15] and rcosmo [16] for the analysis of spherical data. However, despite developments that have been going on for quite some time [17,18], inferential statistics and modelling of data on the unit sphere are not adequately catered for by any publicly available, open-source Python package.
The motivation behind the development of sphstat is based on a real need that arose while working with directional data in the spatial auditory perception domain. The present author, along with two other researchers, has collected subjective localisation data for two different loudspeaker-based 3D audio reproduction systems with the aim of understanding whether the differences in median directions pointed at by the subjects for a sound source that is rendered nominally at the same direction by the two methods were statistically significant [19]. sphstat implements the inferential tests and model-fitting algorithms catalogued by Fisher, Lewis and Embleton [20], as well as some basic plotting utilities.
E-mail address: hhuseyin@metu.edu.tr.

Implementation
sphstat is written in Python 3 [21]. Array-based calculations are written using NumPy [22], and several statistics-related functions as well as special functions are imported from SciPy [13]. sphstat uses SymPy [23] for the exact solutions of some defining equations, pandas [24] and openpyxl [25] for importing and accessing data, and matplotlib [26] for visualisation. sphstat is organised into several modules that align with their primary use cases (e.g. distributions, descriptive statistics, single sample tests, multiple sample tests, modelling, plotting and utility functions). sphstat documentation was built with Sphinx [27] and is available at https://sphstats.readthedocs.io. sphstat is released under the MIT License.
The sphstat.singlesample module implements several estimation and hypothesis test methods to be applied to a sample of observations on the unit sphere. Specifically, the module includes functions that implement:
1. Tests for assessing whether the sample comes from (i) a spherical uniform distribution, (ii) a rotationally symmetric distribution, (iii) a Fisher distribution and (iv) a Kent distribution as opposed to a Fisher distribution.
2. Tests that are akin to single sample t-tests for testing the distribution mean or median against a given mean or median.
3. Algorithms for estimating the mean, median, confidence cone, as well as parameters for the sample under the assumptions that the sample is drawn from a general axisymmetric, Fisher or Kent distribution.
4. Algorithms to estimate parameters of bimodal distributions under the assumption that the sample is drawn from a Wood distribution.
The sphstat.twosample module implements tests for two or more samples of unimodal vectorial data. Specifically, the following functionalities are implemented:
1. Tests for assessing whether two or more samples have the same mean or median with or without the assumption that the samples come from a Fisher distribution.
2. Algorithms for estimating the pooled mean and median for two or more samples with or without the assumption that the samples come from a Fisher distribution.
3. Functions for calculating the common mean, median or concentration parameter for multiple samples.
The sphstat.modelling module implements algorithms for the correlation, regression and spatiotemporal analysis of vectorial data on the sphere. Specifically, the module implements the following functionality:
1. Calculation of the correlation of two random unit vectors on the sphere given two samples.
2. Calculation of the correlation of a variable with unit vectors on the sphere.
3. Regression of a random unit vector on a circular variable.
4. Time series analysis of vectors on the unit sphere.

Omissions and limitations
Some of the tests or methods described in [20] were not included in this first major version of sphstat. More specifically, the estimation of the distribution parameters for small and great circle distributions, and methods for distributions of undirected lines are not included in the present version. It is planned that future versions of the package will be extended to include tests and methods for axial and girdle data alongside vectorial data. Some of the tests described in [20] rely extensively on data presented as nomograms or in extensive tables, which are to be used under certain conditions (e.g. when the sample size or the concentration parameter is small). While it might have been possible to integrate these data into the package, a deliberate choice was made not to include them in this version as these typically represent edge cases with limited applicability. Still, most practical cases (e.g. larger sample sizes) are already covered by approximate methods that are implemented as part of sphstat. Extension of the package for these edge cases is planned for future versions.
Plotting capabilities, while sufficient for most practical purposes, are rather limited since this is not the main purpose of the package. However, future versions are planned to incorporate more elaborate, possibly interactive plotting methods.

Illustrative examples
Several examples demonstrating the usage of sphstat are presented in this section. In the examples below, we use data reproduced in table form in [20]. We also provide relevant references to the original papers.

Example 1: Importing, generating and plotting samples
sphstat uses polar coordinates with the colatitude (i.e. θ) and longitude (φ) angles given in the fundamental ranges of 0 ≤ θ ≤ π and 0 ≤ φ < 2π, respectively. While sphstat internally uses polar coordinates, data in other commonly used representations such as Declination/Inclination and Latitude/Longitude can also be used after an appropriate conversion is carried out. Some domain-specific coordinates such as Plunge/Azimuth and other geographical or astronomical coordinates are not covered.
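The coordinate convention above can be illustrated with a minimal sketch that uses only the Python standard library. The function names below are hypothetical and are not part of the sphstat API; the sketch only assumes that colatitude is measured from the north pole and that longitude is wrapped into [0, 2π).

```python
import math

# Hypothetical helper names for illustration only; not sphstat functions.

def latlon_to_colat_lon(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to (colatitude, longitude) in radians."""
    theta = math.radians(90.0 - lat_deg)            # colatitude from the north pole
    phi = math.radians(lon_deg) % (2.0 * math.pi)   # wrap longitude into [0, 2*pi)
    return theta, phi

def colat_lon_to_cartesian(theta, phi):
    """Unit vector (x, y, z) for colatitude theta and longitude phi in radians."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def cartesian_to_colat_lon(x, y, z):
    """Inverse mapping; the input does not need to be normalised."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / r)))   # clamp against rounding error
    phi = math.atan2(y, x) % (2.0 * math.pi)
    return theta, phi
```

Representations such as Declination/Inclination need an additional sign convention (for instance, whether inclination is positive downwards), which is why they are left out of this sketch.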
In its present version, the readsample function in sphstat.utils can only import data from Excel (i.e. .xlsx) files. This provides, as opposed to simpler formats such as comma separated values (i.e. .csv) files, the ability to store multiple samples in different worksheets of the same file. Each row contains a single observation, with the first column containing colatitude, inclination, or latitude angles, and the second column containing longitude or declination angles, respectively. The data can be either in degrees or in radians, which must be specified while the sample is being imported. Individual samples can be plotted using plotdata either in Mollweide or in Lambert projections. Multiple samples can be overlaid using plotdatalist in the same figure. Descriptive statistics of the sample can be calculated using resultants in sphstat.descriptives. These include the direction cosines, resultant vector, resultant length, mean direction and mean resultant length, which are returned in a dictionary that can be displayed using prettyprintdict in sphstat.utils.
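As a rough illustration of the descriptive quantities listed above, the following standard-library sketch computes them for a list of unit vectors. The function name and the dictionary keys are assumptions made for this example and do not reproduce the output format of resultants in sphstat.descriptives.

```python
import math

def resultant_statistics(vectors):
    """Resultant-based descriptive statistics for unit vectors given as (x, y, z).

    Hypothetical helper for illustration; keys are not those used by sphstat.
    """
    n = len(vectors)
    # Component-wise sums give the resultant vector.
    sx = math.fsum(v[0] for v in vectors)
    sy = math.fsum(v[1] for v in vectors)
    sz = math.fsum(v[2] for v in vectors)
    length = math.sqrt(sx * sx + sy * sy + sz * sz)
    # Direction cosines of the mean direction (undefined if the resultant is zero).
    mean_direction = (sx / length, sy / length, sz / length)
    return {
        "resultant vector": (sx, sy, sz),
        "resultant length": length,
        "mean direction": mean_direction,
        "mean resultant length": length / n,
    }
```

A mean resultant length close to 1 indicates a tightly concentrated sample, whereas values close to 0 indicate either a dispersed sample or one whose observations cancel each other out.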
Listing 1 shows the basic operations of reading a sample originally stored in declination/inclination coordinates, generating two samples of 100 observations from a Fisher distribution with different mean directions and concentration parameters, calculating and displaying the resultant statistics of one of these samples, and plotting the three samples in the same figure using Mollweide projection as shown in Fig. 1. Notice that the data points are mapped to 0 ≤ θ ≤ π and 0 ≤ φ < 2π. The data imported in this example pertains to measurements of the direction of magnetisation in specimens from the Great Whin Sill as reported in [28].
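For readers who want to see what drawing from a Fisher distribution involves, the sketch below uses the standard inverse-CDF approach: the cosine of the angle from the mean direction has a density proportional to exp(κw) on [-1, 1], the azimuth about the mean direction is uniform, and the result is rotated from the pole to the requested mean direction. This is a sketch under those assumptions, with a hypothetical function name; it is not claimed to be how sphstat generates its samples.

```python
import math
import random

def sample_fisher(n, theta0, phi0, kappa, seed=None):
    """Draw n unit vectors from a Fisher distribution (hypothetical helper).

    theta0, phi0: colatitude and longitude of the mean direction in radians.
    kappa: concentration parameter, assumed to be strictly positive.
    """
    rng = random.Random(seed)
    ct, st = math.cos(theta0), math.sin(theta0)
    cp, sp = math.cos(phi0), math.sin(phi0)
    sample = []
    for _ in range(n):
        u = rng.random()
        # Inverse CDF of w = cos(angle from the mean direction).
        w = 1.0 + math.log(u + (1.0 - u) * math.exp(-2.0 * kappa)) / kappa
        w = max(-1.0, min(1.0, w))
        s = math.sqrt(1.0 - w * w)
        psi = 2.0 * math.pi * rng.random()   # uniform azimuth about the mean
        x, y, z = s * math.cos(psi), s * math.sin(psi), w
        # Rotate about the y-axis by theta0, then about the z-axis by phi0,
        # which maps the pole (0, 0, 1) onto the mean direction.
        x1 = x * ct + z * st
        z1 = -x * st + z * ct
        sample.append((x1 * cp - y * sp, x1 * sp + y * cp, z1))
    return sample
```

For very large κ the term exp(-2κ) underflows to zero, so a production implementation would need to guard the logarithm against a uniform draw of exactly zero.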

Example 2: Analysis of a single sample
sphstat provides functions for the analysis of a single sample of observations. sphstat.singlesample comprises several tests and estimators. For example, it is possible to test if the sample comes from a uniform distribution or to test if the population mean is a given prescribed value. The functions in sphstat.singlesample include two groups: hypothesis tests and parameter estimators. The hypothesis tests are tests for uniformity (i.e. isuniform), rotational symmetry (i.e. isaxisymmetric), Fisherianness (i.e. isfisher), a test for selecting a Kent distribution as opposed to a Fisher distribution (i.e. isfishervskent), testing against a specified mean direction (i.e. testagainstmean and meantest), median direction (i.e. testagainstmedian), or a concentration parameter (i.e. kappatest) and an outlier test (i.e. outliertest). Single sample parameter estimators include estimators for mean direction (i.e. meanifsymmetric), and parameter estimators for the Fisher (i.e. fisherparams), Kent (i.e. kentparams, kentmeancone) and Wood (i.e. bimodalparams) distributions. The parameter estimators should be used after a specific model is chosen.
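To make the idea of a uniformity test concrete, the sketch below implements the large-sample Rayleigh test, in which 3R²/n (with R the resultant length) is referred to a chi-squared distribution with 3 degrees of freedom. The function name is hypothetical, the closed-form tail probability is specific to 3 degrees of freedom, and no claim is made that isuniform uses exactly this procedure.

```python
import math

def rayleigh_uniformity_test(vectors):
    """Large-sample Rayleigh test of uniformity on the sphere (hypothetical helper).

    Returns the statistic 3 R^2 / n and its asymptotic p-value.
    """
    n = len(vectors)
    sx = math.fsum(v[0] for v in vectors)
    sy = math.fsum(v[1] for v in vectors)
    sz = math.fsum(v[2] for v in vectors)
    stat = 3.0 * (sx * sx + sy * sy + sz * sz) / n
    # Survival function of the chi-squared distribution with 3 degrees of freedom.
    pvalue = (math.erfc(math.sqrt(stat / 2.0))
              + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0))
    return stat, pvalue
```

A small p-value suggests rejecting uniformity in favour of a unimodal alternative. The chi-squared approximation is asymptotic, so small samples call for the tabulated or corrected critical values discussed in the literature.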
Listing 2 shows a pipeline for testing whether a sample can be assumed to be drawn from a spherical uniform distribution or a Fisher distribution. Once the Fisherian hypothesis is retained, the parameters of a Fisherian model for the sample are calculated, and the sample is then tested against a mean direction. The data imported in this example pertains to measurements of magnetic remanence in specimens of Palaeozoic red-beds from Argentina reported in [29].
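Once a Fisher model is retained, its parameters can be estimated from the resultant. The sketch below uses the common large-concentration approximation κ ≈ (n − 1)/(n − R) and the classical confidence cone semi-angle for the mean direction. Both formulas are approximations that are usually considered adequate for moderately concentrated samples; the function name is hypothetical, and fisherparams in sphstat may use different or more refined estimators.

```python
import math

def fisher_estimates(vectors, alpha=0.05):
    """Approximate Fisher-model estimates from unit vectors (hypothetical helper).

    Returns (mean direction, kappa estimate, confidence cone semi-angle in radians).
    """
    n = len(vectors)
    sx = math.fsum(v[0] for v in vectors)
    sy = math.fsum(v[1] for v in vectors)
    sz = math.fsum(v[2] for v in vectors)
    r = math.sqrt(sx * sx + sy * sy + sz * sz)
    mean_direction = (sx / r, sy / r, sz / r)
    # Large-concentration approximation of the concentration parameter.
    kappa = (n - 1.0) / (n - r)
    # Semi-angle of the (1 - alpha) confidence cone about the mean direction.
    cos_cone = 1.0 - ((n - r) / r) * ((1.0 / alpha) ** (1.0 / (n - 1.0)) - 1.0)
    cone = math.acos(max(-1.0, min(1.0, cos_cone)))
    return mean_direction, kappa, cone
```

With alpha = 0.05 the returned angle corresponds to the 95% confidence cone that is commonly reported alongside a mean direction.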

Example 3: Analysis of two or more samples
It is often of interest to compare two or more samples for their mean or median directions. The tests iscommonmedian, iscommonmean, and isfishercommonmean are similar to independent samples t-tests and multiple comparisons on more than two linear samples.
Similarly, isfishercommonkappa is akin to Levene's test [30] or the Brown-Forsythe test [31] for linear samples. The package also includes the functions to calculate the common mean (i.e. pooledmean and fishercommonmean), and the common concentration parameter (i.e. fishercommonkappa) of multiple samples.
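The analogy with a two-sample t-test can be made concrete with a classical high-concentration test for a common mean direction. Assuming both samples are drawn from Fisher distributions with the same, reasonably large concentration parameter, the statistic (N − 2)(R1 + R2 − R)/(N − R1 − R2) is approximately F-distributed with 2 and 2(N − 2) degrees of freedom. The sketch below is an illustration under those assumptions with hypothetical function names; it is not a reproduction of isfishercommonmean.

```python
import math

def _resultant_length(vectors):
    """Length of the vector sum of a list of (x, y, z) tuples."""
    sx = math.fsum(v[0] for v in vectors)
    sy = math.fsum(v[1] for v in vectors)
    sz = math.fsum(v[2] for v in vectors)
    return math.sqrt(sx * sx + sy * sy + sz * sz)

def two_sample_common_mean_test(sample_a, sample_b):
    """Approximate F-test for a common mean direction (hypothetical helper)."""
    n = len(sample_a) + len(sample_b)
    r1 = _resultant_length(sample_a)
    r2 = _resultant_length(sample_b)
    r = _resultant_length(list(sample_a) + list(sample_b))
    # Clamp at zero to absorb rounding error when the means coincide.
    f_stat = max(0.0, (n - 2.0) * (r1 + r2 - r) / (n - r1 - r2))
    dof2 = 2.0 * (n - 2.0)
    # Closed-form upper tail of the F distribution with 2 and dof2 degrees of freedom.
    pvalue = (dof2 / (dof2 + 2.0 * f_stat)) ** (dof2 / 2.0)
    return f_stat, pvalue
```

The closed-form tail probability is only available because the numerator has 2 degrees of freedom; extending the test to more than two samples would require a general F distribution, for example from SciPy.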
Listing 3 shows how different functions from this module can be used to analyse a set of four samples. The data imported in this example comprises measurements of occupational judgments according to 4 different criteria (earnings, social status, reward, social usefulness), each response cast into a unit norm vector [32].

Listing 2
Analysing a single sample using sphstat.singlesample.

Example 4: Correlation, regression, temporal association
sphstat also includes modelling tools for calculating the correlation between two samples (i.e. samplecorrelation), estimating the population correlation (i.e. xcorrrandomsamples, jackknife_corrci), calculating the correlation between a sample and a circular random variable (i.e. xcorrsamplevariable), fitting a regression model for a circular variable (i.e. regresscircular), and assessing whether a series of observations are temporally associated (i.e. isnottemporallyassociated).

Correlation
In many cases, samples of observations of the same phenomenon on the unit sphere are correlated and the level and direction (i.e. positive, negative or no correlation) of their correlation might be of interest. For example, it might be of interest to calculate the correlation between the eye fixations on the same target between two different subjects.
The example usage of sphstat in analysing correlation is shown in Listing 4. The data imported in this example comprises measurements of magnetic remanence after successive partial demagnetisation stages at different temperatures in specimens of Mesozoic dolerite and is tabulated in [20, Appendix B8, p287].
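One way to quantify the correlation between two paired samples of unit vectors is a determinant-based coefficient: the determinant of the summed cross-products of the two samples, normalised by the determinants of each sample's own summed products. The coefficient lies in [-1, 1], equals 1 when one sample is a rotation of the other, and equals -1 when it is a reflection. The sketch below illustrates this construction with hypothetical function names; whether samplecorrelation computes exactly this quantity is an assumption.

```python
import math

def _outer_sum(xs, ys):
    """3x3 matrix of summed outer products of paired vectors."""
    m = [[0.0] * 3 for _ in range(3)]
    for x, y in zip(xs, ys):
        for i in range(3):
            for j in range(3):
                m[i][j] += x[i] * y[j]
    return m

def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def vector_correlation(xs, ys):
    """Determinant-based correlation of paired unit vectors (hypothetical helper)."""
    s_xy = _det3(_outer_sum(xs, ys))
    s_xx = _det3(_outer_sum(xs, xs))
    s_yy = _det3(_outer_sum(ys, ys))
    return s_xy / math.sqrt(s_xx * s_yy)
```

The coefficient is undefined when either sample lies in a plane through the origin, since the corresponding determinant is then zero.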

Regression
Another use case for sphstat.modelling is finding a linear regression model for a variable given a set of associated vector observations. regresscircular provides a regression model in the form of a function object for predicting points on the unit sphere given a circular random variable. Listing 5 provides a simple example for the usage of sphstat for this purpose where a sample from a Kent distribution is generated. The circular variable to be used as the predictor is the angle between the vector and the -axis. Fig. 2 shows the output of the code in the Listing.

Temporal association
Finally, sphstat also provides the functionality to assess whether a time-ordered set of observations is temporally associated. Such data can occur, for example, in eye tracking experiments measuring fixation.
Listing 6 demonstrates the use of sphstat for assessing temporal association using data from [33], which comprises the GPS coordinates of an individual Montagu's harrier during its autumn migration in 2009. As would be expected, the result from isnotseriallyassociated indicates that we can reject the null hypothesis that the data is independent in favour of temporal association.
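The logic of such a test can be sketched with a simple permutation approach: the sum of dot products between consecutive observations is large when successive directions are close to each other, and its null distribution under independence can be approximated by shuffling the time order. The function names, the number of permutations and the one-sided alternative are assumptions made for this illustration, which does not reproduce the procedure implemented in sphstat.

```python
import math
import random

def serial_statistic(vectors):
    """Sum of dot products between consecutive unit vectors."""
    return math.fsum(a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
                     for a, b in zip(vectors, vectors[1:]))

def serial_association_pvalue(vectors, n_perm=999, seed=0):
    """One-sided permutation p-value for positive serial association.

    Hypothetical helper: the null hypothesis is that the time order carries
    no information, so every ordering of the sample is equally likely.
    """
    rng = random.Random(seed)
    observed = serial_statistic(vectors)
    shuffled = list(vectors)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if serial_statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction so that the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

For a smoothly moving track, such as a migration path, almost no shuffled ordering matches the observed statistic, so the p-value approaches its minimum of 1/(n_perm + 1).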

Impact
The sphstat package has already been used in assessing the results from a subjective localisation experiment that involved a pointing task [19]. While inferential statistics (i.e. hypothesis testing) is quite common in psychoacoustics in general and audio quality evaluation in particular, the author is unaware of any other studies using the statistically grounded approach that sphstat affords. This is possibly due to the lack of software tools, open source or otherwise, that implement such functionality. sphstat would facilitate how the results of such experiments are analysed and reported. The package can also be used in topics such as room acoustics (e.g. to assess the directional properties of a diffuse sound field from energetic measurements), array signal processing (e.g. to analyse direction of arrival estimates), earth sciences (e.g. for the analysis of magnetisation of different deposits), ecology (e.g. for the analysis of biodiversity), astronomy (e.g. for the analysis of data from radiotelescopes), cognitive science and psychology (e.g. for the analysis of eye tracking data).

Conclusions
This article presented sphstat, which is a Python package for applying inferential statistics on vector data on the unit sphere. In its first major version, sphstat includes functions for random sample generation from a variety of spherical distributions, functions for descriptive statistics, functions for hypothesis testing on samples of vectorial data on the unit sphere, functions for calculating correlations, functions for fitting a regression model to a circular variable, and hypothesis testing for temporal association of a sample.
While the present version of sphstat implements the methods and algorithms given in [20], it is planned that it will be extended

Listing 1
Both plotdata and plotdatalist can also calculate and display the median and its confidence cone. sphstat can generate samples from uniform, Fisher, Fisher-Bingham, Kent, and Watson distributions with given parameters. Note that, while the Watson distribution is a line distribution, sphstat in its present version does not implement tests or estimators for axial distributions.

Fig. 1. Plots generated using the code from Listing 1 showing the two available projections.
The development of sphstat was started to facilitate the required directional analyses.