Two-sample test for diagram representation of persistence homology data
Source: R/two-sample-diagram-test.R
two_sample_diagram_test.Rd
This function performs a two-sample test for persistence homology data using the theory of permutation hypothesis testing to test the null hypothesis that the two samples come from the same distribution. The inference is performed using test statistics that only involve distances between persistence diagrams. Hence, the input data can be either a persistence set or a precomputed distance matrix.
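The idea of a distance-based permutation test can be illustrated with a minimal base-R sketch. It is not the package's implementation: `mean_between_stat()` and `perm_test_from_dist()` are hypothetical helpers introduced here for illustration, and plain real numbers stand in for persistence diagrams. The statistic follows the signature described for stat_functions (a dist object, then the indices of the first sample).

```r
# Hypothetical statistic: mean distance between the two samples.
# D is a 'dist' object; idx1 holds the indices of the first sample.
mean_between_stat <- function(D, idx1) {
  M <- as.matrix(D)
  idx2 <- setdiff(seq_len(nrow(M)), idx1)
  mean(M[idx1, idx2])
}

# Hypothetical helper: permutation p-value computed from a distance matrix.
# n1 is the size of the first sample, assumed to occupy the first n1 rows.
perm_test_from_dist <- function(D, n1, B = 1000L,
                                stat = mean_between_stat, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- attr(D, "Size")
  observed <- stat(D, seq_len(n1))
  # Recompute the statistic after randomly reassigning n1 observations
  # to the first sample, B times.
  null_values <- replicate(B, stat(D, sample.int(n, n1)))
  # Larger values count as evidence against the null hypothesis.
  # The +1 terms account for the observed labelling itself.
  (sum(null_values >= observed) + 1) / (B + 1)
}

# Toy data: two groups of real numbers with shifted means.
set.seed(1)
values <- c(rnorm(10, mean = 0), rnorm(10, mean = 3))
D <- dist(values)
p_value <- perm_test_from_dist(D, n1 = 10L, B = 500L, seed = 42L)
```

Because the statistic only needs the distance matrix and the sample labels, the same logic applies whether the distances come from real numbers, as in this toy case, or from persistence diagrams.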
Arguments
- x
An object of class persistence_set, typically produced by phutil::as_persistence_set(), or of class dist, typically produced by phutil::bottleneck_pairwise_distances() or phutil::wasserstein_pairwise_distances(). If x is a persistence set, then y must be either a vector of two integers (sample sizes) or another persistence set. If x is a distance matrix, then y must be a vector of two integers (sample sizes).
- y
An object of class persistence_set, typically produced by phutil::as_persistence_set(), or a vector of two integers. If x is a persistence set, then y must be either a vector of two integers (sample sizes) or another persistence set. If x is a distance matrix, then y must be a vector of two integers (sample sizes).
- dimension
An integer value specifying the homology dimension to use. Defaults to 0L, which corresponds to 0-dimensional homology.
- p
An integer value specifying the p-norm to use for the Wasserstein distance. Defaults to 2L, which corresponds to the Euclidean distance. If p is set to Inf, then the bottleneck distance is used.
- ncores
An integer value specifying the number of cores to use when computing the pairwise distance matrix between all combined persistence diagrams. Defaults to 1L, which means that the computation is done sequentially.
- B
An integer value specifying the number of permutations to use for the permutation hypothesis test. Defaults to 1000L.
- stat_functions
A list of functions that compute the test statistics used to solve the inference problem. Each function must take two arguments: first, an object of class dist representing a distance matrix and, second, an integer vector specifying the indices of the data points belonging to the first sample. Defaults to list(flipr::stat_t_ip, flipr::stat_f_ip), which are distance-based statistics equivalent to Student's and Fisher's statistics, respectively.
- npc
A string specifying the non-parametric combination method to use. Choices are either "tippett" (default) or "fisher". The former corresponds to Tippett's method, while the latter corresponds to Fisher's method.
- seed
An integer value specifying the seed for random number generation. Defaults to NULL, which uses the current time.
- verbose
A boolean value indicating whether to print information about the progress of the computation. Defaults to FALSE.
- keep_null_distribution
A boolean specifying whether the empirical permutation null distribution should be returned as well. Defaults to FALSE.
- keep_permutations
A boolean specifying whether the list of sampled permutations used to compute the empirical permutation null distribution should be returned as well. Defaults to FALSE.
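The role of the npc argument can be sketched in base R. The block below shows one common formulation of non-parametric combination, in which partial permutation p-values for several statistics are merged with either Tippett's or Fisher's combining function. It is an illustration under stated assumptions, not the package's internals: `npc_p_value()` is a hypothetical helper, and the toy statistics are simulated rather than computed from persistence diagrams.

```r
# Hypothetical helper: combine K test statistics into one global p-value.
# stats is an n x K matrix: row 1 holds the observed values of the K
# statistics, the remaining rows their values under permutation.
# Larger values are assumed to be more extreme.
npc_p_value <- function(stats, method = c("tippett", "fisher")) {
  method <- match.arg(method)
  n <- nrow(stats)
  # Partial p-value of every row for each statistic: the proportion of
  # rows whose value is at least as large as that row's value.
  partial <- apply(stats, 2, function(s) {
    (n - rank(s, ties.method = "min") + 1) / n
  })
  # Combining function: larger combined values are more extreme.
  combined <- switch(
    method,
    tippett = 1 - apply(partial, 1, min),
    fisher = -2 * rowSums(log(partial))
  )
  # Global p-value: proportion of rows at least as extreme as the observed one.
  mean(combined >= combined[1])
}

# Toy example: two statistics, 200 simulated null values each, and an
# observed value placed far in the upper tail of both.
set.seed(123)
B <- 200L
null_stats <- matrix(rnorm(2 * B), ncol = 2)
stats <- rbind(c(10, 10), null_stats)
p_tippett <- npc_p_value(stats, "tippett")
p_fisher <- npc_p_value(stats, "fisher")
```

Tippett's rule is driven by the single smallest partial p-value, whereas Fisher's rule accumulates evidence across all statistics, so the two can disagree when only one statistic detects a difference.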
Value
A numeric value storing the p-value from the two-sample test, where the null hypothesis is that the two samples come from the same distribution. If either keep_null_distribution or keep_permutations is set to TRUE, then the output is a list containing the p-value together with the null distribution (if keep_null_distribution is set to TRUE) and the list of sampled permutations (if keep_permutations is set to TRUE).