PycWB: A User-friendly, Modular, and Python-based Framework for Gravitational Wave Unmodelled Search

Unmodelled searches and reconstruction are a critical aspect of gravitational wave data analysis, requiring sophisticated software tools for robust results. This paper introduces PycWB, a user-friendly and modular Python-based framework developed to enhance such analyses, based on the widely used unmodelled search and reconstruction algorithm Coherent Wave Burst (cWB). The main features include a transition from C++ scripts to YAML format for user-defined parameters, improved modularity, and a shift from complex class-encapsulated algorithms to compartmentalized modules. The PycWB architecture facilitates efficient dependency management, better error-checking, and the use of parallel computation for performance enhancement. Moreover, the use of Python gives access to its rich ecosystem of packages, facilitating post-production analysis and visualization. The PycWB framework is designed to improve the user experience and accelerate the development of unmodelled gravitational wave analysis.


Motivation and Significance
The choice of programming language significantly influences the design and usage of scientific software. The benefits of having Python-based software or a Python interface for critical software in gravitational wave (GW) data analysis are outlined in [1,2]. Python is on its way to becoming the default programming language in GW data analysis. This is corroborated by the emergence of Python-based gravitational waveform models like pySEOBNR [2] and gwsurrogate [3], inference software like BILBY [4] and PyCBC-inference [5], and the success and wide usage of GW data analysis algorithms like PyCBC [6].
Despite these advancements, there remain several opportunities where Python-based software can accelerate the usage and development of GW data analysis algorithms. One specific example is the so-called unmodelled search and waveform reconstruction algorithms. The lack of readily available Python-based open-source software restricts the development and usage of unmodelled algorithms, limiting it primarily to researchers proficient in lower-level languages such as C/C++. Creating Python-based solutions and interfaces will enhance participation and development in the field.
The Coherent Wave Burst (cWB) algorithm has been at the forefront of advancements in GW astrophysics [7]. Its range of applicability for GW transient data analysis is very wide because it is an all-sky, morphology-independent algorithm, i.e. it does not rely on waveform models or on the sky direction of the source. Instead, cWB relies on the coherent energy produced by the GW signal in the network of detectors. cWB played a major role in the first detection of a GW signal, GW150914 [8], and more recently it has proved to be a crucial method for detecting interesting transient GW signals that are not well modelled, such as GW190521 [9,10]. cWB is routinely used in a variety of GW transient searches for the LIGO-Virgo-KAGRA collaborations, such as IMBH searches [11], eBBH searches [12], and generic searches for transients of short [13] and long duration [14].
While cWB offers an extensive array of functionalities and scripts, it falls short in facilitating user-specific modifications not inherently supported by the framework. Although cWB does provide plugin support, these plugins must access and manipulate global variables at the specific point of invocation. This approach demands a comprehensive understanding of the underlying code and risks unintentional disruptions or alterations to the variables. Moreover, the lack of clear dependencies between the modules further complicates the task for developers aiming to make modifications, because the interaction between different components becomes hard to understand. The PycWB framework addresses these issues and offers a more straightforward and stable environment for customization and code alteration.
This paper introduces PycWB, a modularized Python package for the cWB algorithm. This package will enable easier integration of future machine learning algorithms and new Python-based waveform models. The remainder of this paper is structured as follows: Section 2 provides an introduction to the structure and features of PycWB. Then, in Section 3, we present several use cases that demonstrate the user-friendliness and efficiency of PycWB, comparing its application with the traditional cWB. Finally, we share our conclusions and insights on the impact and potential of our new framework in the concluding section.

Software Description
The software framework in focus is implemented in Python and leverages the Coherent Wave Burst (cWB) software originally developed on ROOT [15]. The description of the core cWB algorithm and the code can be found in [16,17]. The native pyROOT interface of ROOT has immensely facilitated this Python implementation, removing the need to rewrite the entire suite of algorithms used for cWB. Instead, the core cWB code is integrated, specifically the WAT module, which is included in the package and automatically compiled upon installation using pip. The installation process is streamlined to avoid the usually intricate cWB setup.
The easiest way to install PycWB is with conda, owing to the dependencies of the cWB core code on ROOT and HEALPix [18]:
conda create -n pycwb "python>=3.9,<3.11"
In its design, the software takes a modular approach. The core cWB code is divided into different modules, providing a roadmap for future transitions in which the existing C++ code can be replaced seamlessly with Python modules.

Modular and Classes
The original cWB framework presented a challenging structure in which class-based constructions excessively encapsulated numerous methods and key algorithms. This approach led to dense code that was difficult to comprehend and modify because of its high interdependencies and complexity. The PycWB framework instead adopts a fundamental shift towards modularity and clarity. Essential functions are transferred to Python classes, which serve as standard data formats or interfaces between different modules. Meanwhile, key algorithms have been detached from their original class environments and restructured into independent modules. This revamped architecture facilitates efficient dependency management. The necessary variables are initialized before each function call, and functions are called as pure procedures, thereby significantly enhancing the software's comprehensibility, usability, and adaptability. Figure 2 contrasts the two designs.
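The difference between modules that update shared variables and modules that depend only on their inputs can be sketched in a few lines of Python. The following is a minimal illustration under stated assumptions, not PycWB's actual API: the classes `StrainSegment` and `PixelCluster` and the functions `condition_data`, `find_clusters` and `run_pipeline` are hypothetical stand-ins for the standard data formats and independent modules described above, and the selection rule is a toy one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StrainSegment:
    """Hypothetical standard data format passed between modules."""
    detector: str
    sample_rate: float
    samples: List[float]


@dataclass(frozen=True)
class PixelCluster:
    """Hypothetical output format of a clustering module."""
    detector: str
    indices: List[int]


def condition_data(segment: StrainSegment) -> StrainSegment:
    """Subtract the mean and return a new object instead of mutating shared state."""
    mean = sum(segment.samples) / len(segment.samples)
    return StrainSegment(segment.detector, segment.sample_rate,
                         [x - mean for x in segment.samples])


def find_clusters(segment: StrainSegment, threshold: float) -> PixelCluster:
    """Toy selection rule: keep samples whose magnitude exceeds the threshold."""
    indices = [i for i, x in enumerate(segment.samples) if abs(x) > threshold]
    return PixelCluster(segment.detector, indices)


def run_pipeline(segment: StrainSegment, threshold: float) -> PixelCluster:
    """Each step depends only on its explicit inputs and hands its output onward."""
    conditioned = condition_data(segment)
    return find_clusters(conditioned, threshold)
```

Because every function receives everything it needs as arguments and returns a new object, any single step can be replaced or tested in isolation without knowledge of the rest of the pipeline.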

Parallelization
The modularization in Python facilitates easy parallelization of various processes in PycWB, using Python's multiprocessing library. As a result, computations across layers are expedited, bringing about a speedup of 4 to 6 times in the pixel finding and clustering stage. Reading data and data conditioning also enjoy speed improvements, leading to an overall speedup of 2 to 3 times.
A further performance enhancement can be envisaged through the integration of GPU acceleration. With the increase in the number of GW events and the volume of data, GPU acceleration will become essential for the cWB algorithm. To this end, the PycWB interface provides a much more straightforward route via Numba [19].
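The pattern of distributing independent layers over worker processes can be sketched with the standard multiprocessing library. This is an illustrative sketch only: `select_pixels` and `process_layers` are hypothetical names, and the thresholding task is a toy stand-in for the actual pixel finding and clustering, which is not reproduced here.

```python
import multiprocessing
from typing import List, Sequence


def select_pixels(layer: Sequence[float], threshold: float = 2.0) -> List[int]:
    """Toy per-layer task: indices of samples whose magnitude exceeds the threshold."""
    return [i for i, x in enumerate(layer) if abs(x) > threshold]


def process_layers(layers: Sequence[Sequence[float]],
                   n_workers: int = 1) -> List[List[int]]:
    """Apply select_pixels to every layer, in parallel when n_workers > 1."""
    if n_workers <= 1:
        # Serial fallback, useful for debugging and small inputs.
        return [select_pixels(layer) for layer in layers]
    # The layers are independent, so they can be mapped over a pool of
    # worker processes; pool.map returns the results in input order.
    with multiprocessing.Pool(processes=n_workers) as pool:
        return pool.map(select_pixels, layers)
```

When this is run as a script, the call with `n_workers > 1` should sit under an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that start workers by spawning a fresh interpreter.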

Post production processing
With Python's wealth of packages for data processing, visualization, and machine learning, post-production data processing is straightforward. The modular framework lets users select and implement the post-processing modules they need without code modifications or recompilation. For instance, integrating the autoencoder neural network for glitch detection [20] into PycWB is as simple as interfacing a few lines of TensorFlow [21] code with the framework. Advanced techniques such as machine learning can therefore be employed efficiently without extensive changes to the code.

Web interface
Similar to cWB, PycWB provides a web interface, but in PycWB it is structured as a separate module. This module contains a web app built with HTML, CSS and JavaScript frameworks, along with simple Python functions that copy the static webpage files to the designated output directory. As a result, no HTML webpage generation is needed. This separation of the web interface and the Python code contributes to the modularity and usability of PycWB.
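Copying a static web app into an output directory needs only the standard library. The sketch below shows the general idea; the function name `copy_web_app` and the `web` subdirectory are assumptions made for illustration and do not describe PycWB's actual module layout.

```python
import shutil
from pathlib import Path


def copy_web_app(static_dir: str, output_dir: str) -> Path:
    """Copy a directory of static web files into <output_dir>/web and return its path."""
    source = Path(static_dir)
    target = Path(output_dir) / "web"  # hypothetical location of the web app
    # dirs_exist_ok (Python >= 3.8) lets repeated runs refresh an existing copy.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Since the pages are static, the analysis code only has to write its results as data files that the web app reads; no HTML is assembled in Python.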

Real data analysis
To validate the performance of PycWB, we conducted identical analyses on the real LIGO-Virgo events GW150914 [8], GW170809 [22], and GW190521 [9] using both cWB and PycWB. For the cWB analysis, we used the cwb_gwosc command to download data from GWOSC [23,24] and process them.

cwb_gwosc GW=EVENT_ID IFO=V1 TAG=TSTXY all
For PycWB, we utilized the same user configuration file and data, executing the analysis via a simple command.
pycwb_search user_parameters.yaml
Table 3 outlines the events analyzed and the speed factors for both cWB and PycWB. The speed factor, calculated as the ratio of computation time to the length of the data, indicates a 2-3 times overall performance boost with PycWB. This notable increase in speed is primarily attributed to parallelization, which the Python interface makes very simple to implement.
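The speed factor defined above, and the relative speedup derived from two speed factors, can be written out explicitly. The function names are hypothetical and the numbers are illustrative only; they are not measurements from this work.

```python
def speed_factor(computation_seconds: float, data_seconds: float) -> float:
    """Ratio of computation time to the length of the analysed data."""
    if data_seconds <= 0:
        raise ValueError("data_seconds must be positive")
    return computation_seconds / data_seconds


def relative_speedup(reference_factor: float, new_factor: float) -> float:
    """How many times faster the new run is than the reference run."""
    return reference_factor / new_factor


# Illustrative numbers only, not measurements from the paper.
reference = speed_factor(computation_seconds=600.0, data_seconds=1200.0)  # 0.5
improved = speed_factor(computation_seconds=300.0, data_seconds=1200.0)   # 0.25
speedup = relative_speedup(reference, improved)                           # 2.0
```

With this definition a smaller speed factor means a faster analysis, so the performance boost is the ratio of the reference speed factor to the new one.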
In addition, we highlight key parameters of the recovered events to showcase the consistency in accuracy between cWB and PycWB. Given that both platforms use the same algorithms and setups, similar accuracy levels are to be expected. The results obtained from the PycWB analysis are consistent with those obtained from cWB, accommodating only the differences inherent in the data types between Python and C++. Figure 4 presents an example of an analysis conducted on GW150914 using PycWB. The left panel displays the time-frequency map of the likelihood of selected pixels, providing a visual representation of how the event's signal changes over time and frequency. The right panel illustrates the reconstructed waveform of the event.

Batch Injection with Python script
The PycWB framework simplifies the handling of batch injections involving large parameter sets. In cWB, injecting complex GW signals (such as binary black hole populations) is achieved with cumbersome scripts that create XML files, in particular in the LIGO Light-Weight (LIGOLW) XML format [25], to hold the parameters. Moreover, the XML table must be modified manually when dealing with keys not predefined in the LIGOLW XML table. PycWB, on the other hand, provides the option to generate an array of parameters directly through a Python function. This function returns a list of parameters, offering significant flexibility and efficiency. Search algorithms such as PyCBC have recently added HDF5 injection file support to mitigate the issues with the LIGOLW XML format; these injection files can be seamlessly integrated in PycWB to keep injections consistent between pipelines.
The parameters are passed directly to the waveform generator, allowing users to employ their own waveform generators with additional parameters without any code modification within PycWB. The Python function can be constructed as demonstrated in code snippet 3. To use it, the parameters_from_python keyword replaces the parameters keyword in the YAML file containing the user parameters, as shown in code snippet 4.
Then, to incorporate this into a run, the user simply adds a generator key to the user parameter file, specifying the location and function name, as shown in code snippet 6.
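A parameter-generating function of the kind described above could look like the following sketch. It is not the paper's listing: the function name `get_injection_parameters`, the parameter keys, the value ranges and the start time are all illustrative assumptions, and the keys a real waveform generator expects may differ.

```python
import random
from typing import Dict, List


def get_injection_parameters(n_injections: int = 4,
                             seed: int = 0) -> List[Dict[str, float]]:
    """Return one dictionary of waveform parameters per injection (illustrative keys)."""
    rng = random.Random(seed)  # seeded so that the injection set is reproducible
    parameters = []
    for i in range(n_injections):
        # Draw two component masses and order them so that mass1 >= mass2.
        mass1, mass2 = sorted((rng.uniform(10.0, 50.0), rng.uniform(10.0, 50.0)),
                              reverse=True)
        parameters.append({
            "mass1": mass1,                          # solar masses
            "mass2": mass2,                          # solar masses
            "distance": rng.uniform(100.0, 1000.0),  # Mpc
            "gps_time": 1000000000.0 + 100.0 * i,    # arbitrary start, one every 100 s
            # An extra key that a user-supplied waveform generator could consume.
            "my_custom_parameter": 0.1 * i,
        })
    return parameters
```

Because the function returns plain dictionaries, adding a new parameter is a matter of adding a key, with no table schema to edit by hand.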

Conclusions
The Coherent Wave Burst (cWB) algorithm plays a key role in the discovery and analysis of gravitational waves. Despite its importance, the accessibility and user-friendliness of its interface have been a challenge, primarily due to the complex and highly technical nature of the C++ language in which it was originally written.
To address this challenge, we present PycWB, a Python-based modular adaptation of cWB. This transformation not only makes cWB more accessible to the scientific community but also unlocks the potential for numerous innovations in the field. By leveraging the power and simplicity of Python, PycWB makes complex analyses in gravitational wave research more manageable and user-friendly. The capability to use PycWB within interactive Jupyter Notebooks simplifies the learning process for new users, making it significantly more approachable. Moreover, the compatibility with PyCBC and other Python-based waveform models such as pySEOBNR and gwsurrogate greatly simplifies injection studies. Data post-processing is also much easier: the Python-based data output from each module makes data manipulation more intuitive, and users have the flexibility to choose the specific data they wish to save at each stage of the process.
Furthermore, the PycWB framework allows integration with machine learning libraries in Python, paving the way for more sophisticated and automated analyses. This adaptability extends to developers, who can design new modules or add GPU-accelerated capabilities with Python packages like Numba without an in-depth understanding of the entire PycWB structure.
In conclusion, PycWB provides a user-friendly, flexible, and powerful architecture, which opens new avenues for research and discovery in the realm of gravitational waves. The ease of use and adaptability of PycWB empower experienced researchers and newcomers alike to contribute to this exciting field of research.

Conflict of Interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Figure 1: This figure shows the structure of PycWB. The blue blocks represent the Python modules and the orange blocks represent the Python classes. The green blocks show the external C/C++ code embedded in PycWB, while the yellow blocks highlight the key variables.

Figure 2: This figure shows the modular design of PycWB. On the left, the workflow appears streamlined, but the updates on shared variables are scattered within the modules, making it resemble a black box. On the right, we demonstrate our approach to modularization in PycWB. We have isolated each module to ensure it only depends on the input and subsequently delivers the output to the next steps. This architecture enhances the transparency and flexibility of the process, allowing for easier comprehension and customization.

Figure 3: A selection of Python packages that can be seamlessly integrated for post-production in PycWB.

Figure 4: A selection of the plots from the PycWB analysis on GW150914, the first detected GW event. These plots were created using Matplotlib, a popular data visualization library in Python, which takes the output from the likelihood module. In comparison with C++, Python's Matplotlib requires significantly less code for plotting, demonstrating the efficiency and ease of use provided by PycWB in visualizing analysis results.

Listing 3: generate_parameters.py: A simple example of a batch injection script.

Table 2: Performance comparison: PycWB shows a 2-3 times performance improvement over cWB on the selected events, owing to parallelization.
Any discrepancies could potentially be attributed to differences in data types between Python and C++. These results affirm PycWB's capabilities in maintaining the robustness of the algorithm while enhancing performance.