Galclaim: A tool to identify host galaxies of astrophysical transient sources

The Galclaim software is designed to identify associations between astrophysical transient sources and their host galaxies by computing the probability of chance alignment. It is distributed as open-source Python software. It has already been used to identify, confirm or reject host galaxy candidates of GRBs and to validate or invalidate transient candidates in astrophysical observations. Such tools are also very useful for characterising archived transient candidates from large sky survey telescopes.

Identifying the galaxies that host transient phenomena is useful to investigate the transients' environments and to provide clues about their formation and the astrophysical conditions it requires (e.g. [1,2,3,4]). Inferring host candidates is also a fruitful strategy to identify, reject and classify transients in observations. Such approaches (along with light-curve classification) are very useful for large sky survey telescopes that produce a huge number of transient candidates every night (e.g. [5,6]). However, without spectroscopic observations to provide a clear indication, associating extragalactic transient sources with their host galaxies is often a complex task, especially when their localisation uncertainty exceeds ∼1 arcsec (e.g. [2]). To address this issue, the community relies on computing the probability of chance alignment, introduced by [1], to statistically identify host candidates (e.g. [2,3]).
Galclaim addresses these concerns: it is open-source Python software that identifies associations between astrophysical transient sources and their host galaxies by estimating the probability of chance alignment between a given transient sky localisation and galaxies identified in astronomical surveys. The code is hosted in a dedicated git repository².

Software description
The Galclaim software is distributed as open-source Python software relying on standard open-source tools for data science, numerics, astrophysics and plotting, such as numpy, healpy, astropy, astroquery and matplotlib [7,8,9,10,11].
The Galclaim software is dedicated to identifying associations between astrophysical transient sources and their host galaxies. An association is assessed by estimating the probability of chance alignment between a given transient sky localisation and nearby galaxies, using the formalism first introduced by [1] and widely used since in the transient community (e.g. [2,3,4]). In this formalism, the probability of chance alignment for a given transient and a given galaxy $i$ is expressed as:

$$P_i = 1 - \exp\left(-\pi \, r_i^2 \, \sigma(\leq m_i)\right) \qquad (1)$$

where $r_i$ is the angular distance between the transient localisation and the galaxy center, and $\sigma(\leq m_i)$ is the number of galaxies per square arcsecond with magnitude $\leq m_i$, $m_i$ being the magnitude of galaxy $i$. This approach is typically used for transients localised with an uncertainty of up to a few arcseconds (see Section 5.6.2 of [12]).

The Galclaim software relies on sky survey catalogs to crossmatch the transient localisation with known astrophysical sources. The current version of the code uses the Pan-STARRS catalog [13], the Hubble Source Catalog (HSC) [14], the AllWISE catalog [15] and the GLADE catalog [16]. The Pan-STARRS and HSC catalogs provide good angular resolution and relatively deep photometry, two essential properties for the chance alignment estimation, but neither is all-sky: Pan-STARRS is limited to declinations above −30°, and the HSC is a visit-based, discontinuous catalog. To maximise the usable sky coverage, we also implemented the AllWISE infrared catalog [15], which has the advantage of being all-sky but offers worse resolution and depth. Finally, we implemented the GLADE catalog, which is designed for multimessenger searches, as it provides redshifts for nearby galaxies (with reasonable completeness up to ∼91 Mpc [16]).
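As a minimal sketch (not Galclaim's own code), Equation 1 can be evaluated directly: $\pi r_i^2 \sigma(\leq m_i)$ is the expected number of unrelated galaxies at least as bright as galaxy $i$ inside a disc of radius $r_i$, and $P_i$ is the Poisson probability of finding at least one of them. The function name and the numerical values below are illustrative assumptions.

```python
import math


def chance_alignment_probability(r_arcsec: float, sigma_per_arcsec2: float) -> float:
    """Equation 1: P_i = 1 - exp(-pi * r_i**2 * sigma(<= m_i)).

    r_arcsec: angular distance between the transient and the galaxy centre (arcsec).
    sigma_per_arcsec2: number of galaxies per square arcsecond with magnitude <= m_i.
    """
    if r_arcsec < 0 or sigma_per_arcsec2 < 0:
        raise ValueError("r_arcsec and sigma_per_arcsec2 must be non-negative")
    # Expected number of galaxies at least as bright as galaxy i that would fall
    # inside a disc of radius r_i around the transient purely by chance.
    expected_count = math.pi * r_arcsec**2 * sigma_per_arcsec2
    # Poisson probability of finding at least one such galaxy.
    return 1.0 - math.exp(-expected_count)


# Illustrative values only: a galaxy 2 arcsec away, sigma = 1e-3 galaxies per arcsec^2.
p = chance_alignment_probability(2.0, 1e-3)
print(f"P_i = {p:.4f}")
```

With these assumed inputs, $P_i$ is about 0.0125; it grows with both the offset and the density of comparably bright galaxies.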
In the absence of a redshift, the first step is often to identify large galaxies near the transient localisation. For this reason, before any computation, Galclaim pre-checks for nearby bright galaxies within a 30 arcsecond radius using the RC3 catalog [18]. When a nearby galaxy is found, a warning is raised to the user and the properties of the galaxy are saved in a dedicated output file.
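The pre-check amounts to a cone search around the transient position. The sketch below assumes the bright-galaxy list has already been retrieved (the entries shown are hypothetical, not real RC3 rows) and computes separations with the haversine formula to stay dependency-free; in practice astropy's `SkyCoord.separation` would serve the same purpose. The function names are illustrative, not Galclaim's API.

```python
import math
import warnings


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula) between two positions in degrees, in arcsec."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def bright_galaxy_precheck(ra, dec, bright_galaxies, radius_arcsec=30.0):
    """Return the bright galaxies within radius_arcsec of the transient, warning about each one."""
    nearby = [g for g in bright_galaxies
              if angular_separation_arcsec(ra, dec, g["ra"], g["dec"]) <= radius_arcsec]
    for g in nearby:
        warnings.warn(f"Bright galaxy {g['name']} lies within {radius_arcsec} arcsec of the transient")
    return nearby  # the caller can then save these properties to an output file


# Hypothetical bright-galaxy entries (not real RC3 rows).
catalog = [
    {"name": "GAL-A", "ra": 10.0, "dec": 20.0},
    {"name": "GAL-B", "ra": 10.1, "dec": 20.0},
]
nearby = bright_galaxy_precheck(10.002, 20.0, catalog)
print([g["name"] for g in nearby])
```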
As we are interested in galaxies only, we need to identify galaxies in the catalogs used. In the HSC, we simply use the provided extended-source flag. In the Pan-STARRS catalog, we separate galaxies from stars by applying the color criterion $i_{\mathrm{PSF}} - i_{\mathrm{Kron}} \geq 0.05$ up to magnitude 21, as proposed by [13]. Objects fainter than magnitude 21, or without an $i_{\mathrm{Kron}}$ measurement, are flagged as unknown. For the AllWISE catalog, we select galaxies with the color criterion $W1_{\mathrm{mpro}} - J_{\mathrm{2MASS}} < -1.7$ (in the range $12 < W1_{\mathrm{mpro}} < 15$) proposed by [17]; objects for which this criterion cannot be computed are flagged as unknown. For both Pan-STARRS and AllWISE, when the photometry of an object is insufficient or unavailable to apply these color criteria, we keep such unknown objects (i.e. treat them as galaxies) in the subsequent computation. This may slightly overestimate the galaxy density, which increases the probability of chance alignment and therefore makes any claimed association more conservative. If a transient is associated with such an unknown object, it is left to the user to investigate further whether the object is indeed a galaxy.
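The selection rules above can be sketched as two small classifiers plus a keep/drop decision. Two points are assumptions not stated in the text and are flagged in the comments: the magnitude-21 limit is taken to apply to $i_{\mathrm{Kron}}$, and AllWISE objects outside $12 < W1_{\mathrm{mpro}} < 15$ are treated as "not computable". Names are hypothetical.

```python
from typing import Optional

GALAXY, STAR, UNKNOWN = "galaxy", "star", "unknown"


def classify_panstarrs(i_psf: Optional[float], i_kron: Optional[float],
                       mag_limit: float = 21.0) -> str:
    """Pan-STARRS star/galaxy separation: galaxy if i_PSF - i_Kron >= 0.05 (up to mag 21)."""
    # Assumption: the magnitude-21 limit is applied to i_Kron.
    if i_psf is None or i_kron is None or i_kron > mag_limit:
        return UNKNOWN
    return GALAXY if (i_psf - i_kron) >= 0.05 else STAR


def classify_allwise(w1_mpro: Optional[float], j_2mass: Optional[float]) -> str:
    """AllWISE galaxy selection: galaxy if W1 - J < -1.7, applied for 12 < W1 < 15."""
    # Assumption: outside 12 < W1 < 15, or without a 2MASS J magnitude,
    # the criterion is treated as not computable.
    if w1_mpro is None or j_2mass is None or not (12.0 < w1_mpro < 15.0):
        return UNKNOWN
    return GALAXY if (w1_mpro - j_2mass) < -1.7 else STAR


def keep_in_computation(label: str) -> bool:
    """Unknown objects are kept (treated as galaxies); only objects classified as stars are dropped."""
    return label != STAR


print(classify_panstarrs(20.3, 20.0), classify_allwise(14.0, 15.0), classify_allwise(16.0, 18.0))
```

Keeping unknowns rather than discarding them is the conservative choice described above: a misclassified star can only inflate $\sigma(\leq m_i)$ and hence $P_i$.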
To compute the σ parameter, Galclaim follows the principle proposed by [3], which is based on a local estimate of the galaxy density in a given catalog. This accounts for galaxy clustering, as opposed to estimating σ from a whole catalog or from deep optical galaxy surveys as in [1,2]. In practice, as illustrated by Figure 1, we first fetch all galaxies within a 30 arcsecond radius centered on the transient localisation and consider all of them as host galaxy candidates. We then estimate the galaxy density by fetching all galaxies within a 3 arcminute radius of the transient position, excluding the host galaxy candidates so that the host itself does not bias the estimate of the local galaxy density. For each host galaxy candidate $i$ and each photometric band of the catalog, we then compute $\sigma(\leq m_i)$ by counting the galaxies in the shell between 30 arcseconds and 3 arcminutes with magnitude $\leq m_i$, normalised by the shell area. For the HSC, we follow the same procedure, but because its sky coverage is sparse and discontinuous, we modify how the galaxies used to compute the density are retrieved: in each band, we identify the catalog image in which the transient localisation lies and use all galaxies within that image. Given the field of view of the catalog images (202 × 202 arcsec²), the resulting area is comparable to the 3 arcminute radius area used for Pan-STARRS, AllWISE and GLADE.
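A minimal sketch of the local density estimate, assuming the galaxies around the transient have already been fetched and reduced to (separation, magnitude) pairs for one band. Galaxies inside 30 arcsec are the host candidates and are excluded; the count is divided by the 30 arcsec to 3 arcmin shell area, or by the 202 × 202 arcsec² image area for the HSC variant. Whether the HSC count also excludes the candidates is an assumption here, and all names are hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple


def local_sigma(m_i: float,
                field_galaxies: Iterable[Tuple[float, Optional[float]]],
                r_in_arcsec: float = 30.0,
                r_out_arcsec: float = 180.0) -> float:
    """sigma(<= m_i) in galaxies per arcsec^2, estimated in the 30 arcsec to 3 arcmin shell.

    field_galaxies: (separation from transient in arcsec, magnitude in one band)
    for every galaxy returned within r_out_arcsec.
    """
    # Galaxies within r_in are host candidates and are excluded from the density.
    shell_mags = [mag for sep, mag in field_galaxies
                  if r_in_arcsec < sep <= r_out_arcsec and mag is not None]
    count = sum(1 for mag in shell_mags if mag <= m_i)
    shell_area = math.pi * (r_out_arcsec**2 - r_in_arcsec**2)
    return count / shell_area


def hsc_sigma(m_i: float, image_mags: Iterable[Optional[float]],
              image_side_arcsec: float = 202.0) -> float:
    """HSC variant: galaxies with mag <= m_i in the image containing the transient, per arcsec^2."""
    # Assumption: image_mags already excludes the host galaxy candidates.
    count = sum(1 for mag in image_mags if mag is not None and mag <= m_i)
    return count / image_side_arcsec**2


# Illustrative field: (separation in arcsec, magnitude).
field = [(10.0, 18.0), (60.0, 19.0), (100.0, 21.0), (150.0, 22.5), (200.0, 17.0)]
print(local_sigma(21.0, field))
```

In this toy field, the galaxy at 10 arcsec is a host candidate and the one at 200 arcsec lies outside the shell, so only two shell galaxies (magnitudes 19.0 and 21.0) are counted.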
With $\sigma(\leq m_i)$, we can compute $P_i$ as described in Equation 1 for each host galaxy candidate, each catalog and each photometric band. The association with a given host galaxy candidate can then be assessed by taking the minimum of the $P_i$ values computed for that galaxy as its probability of chance alignment. When using the GLADE catalog, the user can enable a redshift compatibility check by providing a redshift range. Galclaim then determines, for each host galaxy candidate, whether the redshift listed in the GLADE catalog is compatible with this range, and reports the result in the output host galaxy candidate properties.
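The two final steps can be sketched as follows: for one galaxy, keep the smallest $P_i$ over all catalog/band combinations, and flag whether a catalog redshift falls in a user-supplied range. The dictionary layout, catalog and band labels, and numerical values are hypothetical; returning `None` when no redshift is available is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple


def min_chance_probability(p_values: Dict[Tuple[str, str], float]) -> Tuple[str, str, float]:
    """Return (catalog, band, P_i) for the smallest P_i computed for one galaxy."""
    (catalog, band), p_min = min(p_values.items(), key=lambda item: item[1])
    return catalog, band, p_min


def redshift_compatible(z_galaxy: Optional[float], z_min: float, z_max: float) -> Optional[bool]:
    """True/False if the galaxy redshift is inside/outside [z_min, z_max]; None if no redshift."""
    if z_galaxy is None:
        return None
    return z_min <= z_galaxy <= z_max


# Illustrative P_i values for a single host galaxy candidate.
p_values = {("PanSTARRS", "g"): 0.05, ("PanSTARRS", "i"): 0.012, ("AllWISE", "W1"): 0.3}
print(min_chance_probability(p_values))
```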

Chance alignment threshold
To claim an association, one needs to define a threshold on $P_i$ below which the association can be considered reliable. Although a typical threshold value is $P_i = 0.01$, the threshold is not fixed in Galclaim, as different values may be appropriate depending on the transients studied. Since $P_i$ depends on the angular distance $r_i$ between the transient and the galaxy, different classes of astrophysical sources will lead to different typical values of $P_i$.
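To illustrate why the threshold depends on the transient class, the sketch below evaluates Equation 1 for the same (assumed) galaxy density at two offsets and applies a configurable threshold defaulting to 0.01. The density value and function names are illustrative assumptions, not values from Galclaim.

```python
import math


def chance_alignment_probability(r_arcsec: float, sigma_per_arcsec2: float) -> float:
    """Equation 1: P_i = 1 - exp(-pi * r_i**2 * sigma(<= m_i))."""
    return 1.0 - math.exp(-math.pi * r_arcsec**2 * sigma_per_arcsec2)


def is_significant(p_chance: float, threshold: float = 0.01) -> bool:
    """An association is retained when P_i falls below a user-chosen threshold."""
    return p_chance < threshold


# Same illustrative galaxy density, two different offsets.
sigma = 1e-3  # galaxies per arcsec^2 (assumed value)
for r in (0.5, 5.0):
    p = chance_alignment_probability(r, sigma)
    print(f"r = {r} arcsec: P = {p:.4f}, significant: {is_significant(p)}")
```

Because $P_i$ scales roughly as $r_i^2$ for small values, a tenfold larger offset raises it by about a factor of 100, so sources that typically lie far from their hosts (such as kicked short-GRB progenitors) may call for a looser threshold.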

Software Architecture
The Galclaim software has a simple structure split into several Python scripts, with command-line argument parsing so that the user can launch the code from a console. The main dependencies, installation instructions and usage instructions are described in a standard README file. An example of the transient source file format expected by the code is provided. After running, the code saves its outputs in a dedicated directory: one output table (in ECSV format) per transient and per catalog, listing the identified host galaxy candidates and their computed probabilities of chance alignment. If enabled by the user, another directory is created to save plots (one per transient and per catalog) that facilitate the investigation of the associations.

Illustrative Examples
Studying the population of galaxies that produce GRBs, and the locations of GRBs within their hosts, helps to identify and characterise GRB progenitors and their environments (e.g. [1,2,3,4]). However, associating a given GRB with its host galaxy is a very complex issue, especially when no optical or radio afterglow is detected to provide a subarcsecond localisation. For instance, a large offset between a transient and its host galaxy can originate from the 'kick' velocity imparted at birth to the compact objects that produce short GRBs (e.g. [19,20]). In this context, Galclaim is used to identify, confirm or reject host galaxy candidates, making it possible to investigate and constrain their properties, such as redshift, stellar population and star formation rate [12].

Impact
The Galclaim software has already been used to study host galaxy associations of short GRBs and has proved effective in identifying host galaxies for such events [12]. It is also used by the GRANDMA collaboration to validate or invalidate transient candidates in several follow-up campaigns [21,22]. Such tools are very useful to characterise archived transient candidates from large sky survey telescopes such as ZTF [5] or LSST [6]. Given its relatively high processing speed, Galclaim is a suitable tool for the real-time classification of LSST transients, which require drastic filtering given the large numbers involved (see for instance [23]). With all catalogs enabled, the current version of Galclaim typically takes less than 30 s per transient position. This computation time is dominated by the time needed to query the catalog servers, and is therefore largely independent of the optimisation of the Galclaim code itself. It allows about 3000 transient candidates to be processed per day. While this is several orders of magnitude below the total number of transient candidates that LSST will provide per day, it is compatible with the rate of candidates remaining after applying filters dedicated to a given type of transient source (afterglow, supernova, kilonova...).
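The daily throughput quoted above follows from simple arithmetic, assuming transients are processed one after another at the typical upper bound of 30 s each. The helper below is purely illustrative.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86 400 s


def max_transients_per_day(seconds_per_transient: float) -> int:
    """Upper bound on transients processed one after another in a day."""
    if seconds_per_transient <= 0:
        raise ValueError("seconds_per_transient must be positive")
    return math.floor(SECONDS_PER_DAY / seconds_per_transient)


print(max_transients_per_day(30.0))  # 2880, i.e. about 3000 per day
```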

Conclusion
The Galclaim software, dedicated to identifying associations between astrophysical transient sources and their host galaxies, is a very useful tool to identify, confirm or reject transient candidates and host galaxy candidates. It is distributed as open-source Python software. It has already been used by the GRANDMA collaboration and is expected to be widely used in the future for the classification of LSST transients.

Figure 1: Illustration of the probability of chance alignment computed with Equation 1. The blue circle illustrates the 3 arcminute radius used to compute the galaxy density. The red circle illustrates the 30 arcsecond radius within which a galaxy is considered a host galaxy candidate; the probability of chance alignment is computed for all galaxies inside it. Both circles are centered on the transient localisation. The green arrow illustrates the angular distance between the transient localisation and a given galaxy center.

Table 1: Code metadata