LLT: An R package for Linear Law-based Feature Space Transformation

The goal of the linear law-based feature space transformation (LLT) algorithm is to assist with the classification of univariate and multivariate time series. The presented R package, called LLT, implements this algorithm in a flexible yet user-friendly way. This package first splits the instances into training and test sets. It then utilizes time-delay embedding and spectral decomposition techniques to identify the governing patterns (called linear laws) of each input sequence (initial feature) within the training set. Finally, it applies the linear laws of the training set to transform the initial features of the test set. These steps are performed by three separate functions called trainTest, trainLaw, and testTrans. Their application requires a predefined data structure; however, for fast calculation, they use only built-in functions. The LLT R package and a sample dataset with the appropriate data structure are publicly available on GitHub.

The recently published linear law-based feature space transformation (LLT) algorithm (Kurbucz et al., 2022a) aims to facilitate univariate and multivariate time series classification tasks. This paper presents an R package called LLT, which is the first implementation of the LLT algorithm. The package implements LLT in a flexible yet user-friendly way and uses a separate function for each computational step, which facilitates the further development of the algorithm. In addition, it does not rely on functions written by the community, which results in low computational demand. The LLT R package and a sample dataset with the appropriate data structure are publicly available on GitHub (Kurbucz, Pósfay & Jakovác, 2023). The metadata of the package is presented in Table 1.
The rest of this paper is organized as follows. Section 2 presents the concept of linear laws and briefly introduces the LLT algorithm. Sections 3 and 4 describe the structure and use of the software in detail. In Section 5, the application of the software is presented on an electric power consumption dataset. Finally, Section 6 discusses the impacts of the software and provides conclusions.

LLT algorithm
This section briefly overviews the definition of linear laws and how this concept can be applied to feature space transformation. Note that the LLT algorithm is described in detail by Kurbucz et al. (2022a), while derivations and proofs related to the linear laws can be found in Jakovác (2021).

Linear laws of time series
First, consider a generic time series $z_t$, where $t \in \{1, 2, \dots, k\}$ represents the time. The $l$-th order ($l \in \mathbb{Z}^+$ and $l < k$) time-delay embedding (Takens, 1981) of this series is defined by:

$$A = \begin{pmatrix} z_1 & z_2 & \cdots & z_l \\ z_2 & z_3 & \cdots & z_{l+1} \\ \vdots & \vdots & \ddots & \vdots \\ z_{k-l+1} & z_{k-l+2} & \cdots & z_k \end{pmatrix}. \tag{1}$$

Then, a symmetric $l \times l$ matrix $S$ is generated from $A$ as follows:

$$S = A^{\top} A. \tag{2}$$

The term law in our case implies that we are seeking those weights that transform the values of the $S$ matrix so that they are close to zero; that is, we seek the coefficients ($v$) that satisfy the following equation:

$$S v \approx \mathbf{0}, \tag{3}$$

where $\mathbf{0}$ is a column vector containing $l$ elements of null value, $v$ is a column vector with $l$ elements, and $v \neq \mathbf{0}$. To find the $v$ coefficients of Eq. (3), we first perform eigendecomposition on the $S$ matrix. Then, we select the eigenvector that is related to the smallest eigenvalue. Finally, we apply this eigenvector as the $v$ coefficients, and hereinafter, we refer to it as the linear law of $z_t$. Note that this logic is related to principal component analysis (PCA) (Pearson, 1901; Hotelling, 1933); however, in contrast to PCA, we look for components that minimize the variance of the projected data (see Jakovác, 2021; Jakovác, Kurbucz & Pósfay, 2022; Kurbucz et al., 2022a).
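The procedure described above can be illustrated with a minimal base R sketch. It assumes a row lag of 1, and the helper names delay_embed and linear_law are hypothetical; they are not the embed and linlaw functions of the package. A noiseless sine wave is used as a toy input because it obeys an exact three-term linear recurrence, so the smallest eigenvalue of S should be numerically zero.

```r
# Hypothetical helpers for illustration only (not the package's embed/linlaw).
delay_embed <- function(z, l) {
  k <- length(z)
  stopifnot(l >= 2, l < k)
  # Row i holds z[i], z[i + 1], ..., z[i + l - 1] (row lag of 1 assumed)
  t(sapply(seq_len(k - l + 1), function(i) z[i:(i + l - 1)]))
}

linear_law <- function(z, l) {
  A <- delay_embed(z, l)
  S <- crossprod(A)                  # t(A) %*% A: symmetric l x l matrix
  e <- eigen(S, symmetric = TRUE)    # eigenvalues are returned in decreasing order
  # The law is the eigenvector that belongs to the smallest (last) eigenvalue
  list(v = e$vectors[, l], lambda = e$values[l], S = S)
}

# Toy series: z[t] - 2 * cos(0.3) * z[t + 1] + z[t + 2] = 0 holds exactly
z <- sin(0.3 * (1:200))
law <- linear_law(z, l = 3)

law$v                        # proportional to (1, -2 * cos(0.3), 1), up to sign
max(abs(law$S %*% law$v))    # close to zero, i.e., S v is approximately 0
```

For real, noisy series the smallest eigenvalue is not exactly zero, so the product S v is only approximately the null vector.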

Feature space transformation
During the first step of the LLT algorithm, instances ($i$) are separated into training ($tr \in \{1, 2, \dots, \tau\}$) and test ($te \in \{\tau + 1, \tau + 2, \dots, n\}$) sets in such a way that ensures a balanced representation of the instance classes across both sets. (For transparency, we assume that the arrangement of the instances within the dataset meets this condition for the $tr$ and $te$ sets.) We then identify the linear law (see $v$ in Eq. (3)) of each input series of the training set ($x^{1,1}_t, x^{2,1}_t, \dots, x^{\tau,m}_t$), thus obtaining a total of $\tau \times m$ laws (eigenvectors). These laws are grouped by input series and classes as follows:

$$V^{j} = \left[ V^{j}_{1}, V^{j}_{2}, \dots, V^{j}_{c} \right],$$

where $V^{j}_{c}$ refers to the laws of the training set associated with input series $j$ and class $c$.
In the next step, $S^{te,j}$ matrices (see Eq. (2)) are calculated from the input series of the test instances, which results in $m$ matrices per instance (one for each initial feature). We then left-multiply the $V^{j}$ matrices obtained from the training set by the $S^{te,j}$ matrices of the test set related to the same initial feature, which yields the products $S^{te,j} V^{j}$. The laws of the $V^{j}$ matrices provide an estimate of whether the $S^{te,j}$ matrices of the test set belong to the same class as them.
That is, only those columns of the $S^{te,j} V^{j}$ matrices are in proximity to the null vector with relatively small variance for which the classes of the corresponding training and test data match.
Finally, the dimension of the resulting matrices is reduced by a function that selects, for each class, the column vector with the smallest variance and/or absolute mean from the $S^{te,j} V^{j}$ matrices.
After these calculation steps, the transformed feature space of the test set has $((n - \tau) l) \times (mc + 1)$ dimensions, including the output variable.
The calculation steps are illustrated in Fig. 1.
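The transformation and selection steps can be sketched for a single input series and a single test instance as follows. This is a toy illustration under stated assumptions rather than the package's implementation: the helper names (embed_S, law_of, select_column), the two synthetic classes (sine waves with different frequencies), and the row lag of 1 are all hypothetical choices made for the example.

```r
# Hypothetical helpers for illustration only (row lag of 1 assumed).
embed_S <- function(z, l) {
  A <- t(sapply(seq_len(length(z) - l + 1), function(i) z[i:(i + l - 1)]))
  crossprod(A)                       # S = t(A) %*% A
}
law_of <- function(z, l) {
  eigen(embed_S(z, l), symmetric = TRUE)$vectors[, l]   # smallest eigenvalue
}
# Column selection: smallest variance, smallest absolute mean, or smallest rank sum
select_column <- function(P, method = c("rank", "var", "mean")) {
  method <- match.arg(method)
  v <- apply(P, 2, var)
  m <- abs(colMeans(P))
  score <- switch(method, var = v, mean = m, rank = rank(v) + rank(m))
  P[, which.min(score)]
}

l <- 3
t_idx <- 1:200
# Laws of four training instances: class "a" (frequency 0.3), class "b" (frequency 0.7)
V <- cbind(
  a1 = law_of(sin(0.3 * t_idx), l),
  a2 = law_of(sin(0.3 * t_idx + 1), l),
  b1 = law_of(sin(0.7 * t_idx), l),
  b2 = law_of(sin(0.7 * t_idx + 1), l)
)
classes <- c("a", "a", "b", "b")

# Test instance that belongs to class "a" (same frequency, different phase)
S_te <- embed_S(sin(0.3 * t_idx + 2), l)
P <- S_te %*% V                      # l x (number of training laws)

# One new feature (column vector of length l) per class
features <- sapply(split(seq_along(classes), classes), function(idx) {
  select_column(P[, idx, drop = FALSE], method = "var")
})
features
```

In this toy setting, the column selected for class "a" is numerically zero, while the column for class "b" is not, which is the signal that a downstream classifier can exploit.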

Software description
The LLT R package is the first to implement the LLT algorithm. This package contains three main functions (trainTest, trainLaw, and testTrans) and two auxiliary functions (embed and linlaw). The auxiliary functions are called by the main functions, so the user does not need to use them to perform the LLT algorithm.
Description of the main functions:
• trainTest(path, test_ratio, seed) (trainTest.R): This function generates a two-level list that splits the instances into training and test sets. The first level separates the training and test sets, and the second level groups the instances by class (see Fig. A1). It has two mandatory arguments and one optional user-defined argument as follows:
  - path (character): The path to the directory that contains the instances grouped by class.
  - test_ratio (double ∈ [0, 1]): The ratio of instances in the training and test sets.
  - seed (integer): The initial value of the random number seed. By default, it is not fixed.
• trainLaw(path, train_test, dim, lag) (trainLaw.R): This function generates the set of laws from the training instances. Its arguments are as follows:
  - path (character): The path to the directory that contains the instances grouped by class.
  - train_test (list): A two-level list that splits the instances into training and test sets. It can be generated by the trainTest function or defined by the user manually. Fig. A1 presents an example of the appropriate structure of this object.
  - dim (integer ∈ [2, k]): It defines the row and column dimension (l) of the symmetric matrix S. (The value k is the length of the input series.)
  - lag (integer ∈ [1, l]): It defines the successive row lag of the A matrix. By default, it is 1 (see Eq. (1)). (The value l is the order of the time-delay embedding.)
• testTrans(path, train_test, train_law, lag, select) (testTrans.R): This function transforms the instances of the test set. Its arguments are as follows:
  - path (character): The path to the directory that contains the instances grouped by class.
  - train_test (list): A two-level list that splits the instances into training and test sets. It can be generated by the trainTest function or defined by the user manually. Fig. A1 presents an example of the appropriate structure of this object.
  - train_law (data.frame): The set of laws generated from the training instances. It can be generated by the trainLaw function. (For development purposes, e.g., for the creation of a learning algorithm, the user can easily modify this data.frame.)
  - lag (integer ∈ [1, l]): It defines the successive row lag of the A matrix. By default, it is 1 (see Eq. (1)). (The value l is the order of the time-delay embedding.)
  - select (character ∈ {"rank", "var", "mean"}): New features are defined based on this (f) function (see Feature space transformation section). The "var" option selects, per class and input series, the column vector with the smallest variance, while the "mean" option performs this selection based on the minimum absolute mean value. The "rank" option minimizes both at the same time by ranking the columns by variance and absolute mean and selecting the column with the smallest sum of ranks. All three selection criteria result in as many new features as the number of classes multiplied by the number of input series. The default value is "rank".
Description of the auxiliary functions:
• embed(series, dim, lag) (embed.R): This function generates the S matrix from a time series (see Eq. (2)). It has two mandatory arguments and one optional user-defined argument as follows:
  - series (numeric): A time series in a column vector without missing values.
  - dim (integer ∈ [2, k]): It defines the row and column dimension (l) of the symmetric matrix S. (The value k is the length of the input series.)
  - lag (integer ∈ [1, l]): It defines the successive row lag of the A matrix. By default, it is 1 (see Eq. (1)). (The value l is the order of the time-delay embedding.)
• linlaw(series, dim, lag) (linlaw.R): By applying the embed function, it generates the law (v) of a time series (see Eq. (3)). It has two mandatory arguments and one optional user-defined argument as follows:
  - series (numeric): A time series in a column vector without missing values.
  - dim (integer ∈ [2, k]): It defines the row and column dimension (l) of the symmetric matrix S. (The value k is the length of the input series.)
  - lag (integer ∈ [1, l]): It defines the successive row lag of the A matrix. By default, it is 1 (see Eq. (1)). (The value l is the order of the time-delay embedding.)
The LLT R package and a sample dataset with the appropriate data structure are publicly available on GitHub (Kurbucz et al., 2023).
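To make the two-level list produced by the splitting step more concrete, the following base R sketch builds a comparable object from a named list of instance identifiers. It is only an illustration under assumptions: split_instances is a hypothetical function (not trainTest), test_ratio is interpreted here as the share of each class assigned to the test set, and the element names train and test are illustrative. Fig. A1 remains the authoritative description of the structure expected by the package.

```r
# Hypothetical class-wise split; not the package's trainTest function.
split_instances <- function(instances_by_class, test_ratio, seed = NULL) {
  stopifnot(test_ratio >= 0, test_ratio <= 1)
  if (!is.null(seed)) set.seed(seed)   # the seed is fixed only when it is supplied
  # Draw the test instances separately within each class to keep classes balanced
  picked <- lapply(instances_by_class, function(ids) {
    sample(ids, round(length(ids) * test_ratio))
  })
  list(
    train = Map(setdiff, instances_by_class, picked),  # first level: train / test
    test  = picked                                     # second level: classes
  )
}

demo <- list(a = paste0("a", 1:10), b = paste0("b", 1:20))
s1 <- split_instances(demo, test_ratio = 0.3, seed = 1)
str(s1)
```

Sampling within each class, rather than over the pooled instances, is what keeps the class proportions the same in the training and test sets.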

Installation
The LLT package can be installed by using the devtools R package as follows.

    # install.packages("devtools")
    # library(devtools)
    devtools::install_github("mtkurbucz/LLT")

Data preparation
After installation, the dataset to be transformed must be converted into a data structure in which instances are grouped by classes. Furthermore, time series features must be tab-separated column vectors with the name of the feature in the header. The appropriate data structure is presented in Fig. 2.
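As a rough illustration of these requirements, the base R sketch below writes a few synthetic instances as tab-separated files with a header row and reads one of them back. The directory layout (one folder per class, one file per instance), the folder and file names, and the feature names are assumptions made for this example; Fig. 2 shows the structure actually expected by the package.

```r
# Assumed layout for illustration: <root>/<class>/<instance>.txt
root <- tempfile("llt_demo_")
set.seed(1)
for (cls in c("class_a", "class_b")) {
  dir.create(file.path(root, cls), recursive = TRUE)
  for (i in 1:3) {
    # Each instance: features as columns, feature names in the header
    inst <- data.frame(feature_1 = rnorm(20), feature_2 = rnorm(20))
    write.table(inst, file.path(root, cls, sprintf("instance_%02d.txt", i)),
                sep = "\t", row.names = FALSE, quote = FALSE)
  }
}

list.files(root, recursive = TRUE)
check <- read.delim(file.path(root, "class_a", "instance_01.txt"))
str(check)
```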

Data transformation
A dataset with the appropriate structure can be transformed in the following way using the LLT package.
    # Loading package
    library(LLT)

    # Setting parameters
    path <- "./data"
    test_ratio <- 0.30
    dim <- 10  # embedding dimension (illustrative value; must satisfy dim >= lag)
    seed <- 12345
    lag <- 9
    select <- "var"

    # Calculation (arguments follow the order of the function signatures)
    train_test <- LLT::trainTest(path, test_ratio, seed)
    train_law <- LLT::trainLaw(path, train_test, dim, lag)
    result <- LLT::testTrans(path, train_test, train_law, lag, select)

Illustrative examples
This section presents a simple example of using the LLT package. In this example, we employ the PowerCons dataset from the UCR Time Series Classification Archive.
Note that in the case of more difficult classification tasks, it may be worthwhile to compute additional statistics (such as variance) from the new features and then apply a classification algorithm on the obtained feature space. Based on our preliminary results (see Kurbucz et al., 2022a), we achieve the most accurate result with the least computational demand by combining the LLT and the k-nearest neighbor (KNN) (Fix, 1985; Cover & Hart, 1967) algorithms.
An additional application example is provided by Kurbucz et al. (2022a). In that paper, the efficiency of LLT combined with various classifiers is examined on a real-world human activity recognition (HAR) dataset called the Activity Recognition system based on Multisensor data fusion (AReM) (Palumbo, Gallicchio, Pucci & Micheli, 2016). According to the results, LLT vastly increased the accuracy of traditional classifiers, which outperformed state-of-the-art methods after the proposed feature space transformation.

Impact and conclusion
A rudimentary version of the LLT R package has been utilized in Jakovác et al. (2022), Kurbucz et al. (2022a), and Kurbucz, Pósfay & Jakovác (2022b). Both the package and a sample dataset with the appropriate data structure are publicly available on GitHub (Kurbucz et al., 2023).
In conclusion, the value of the LLT R package can be summarized as follows:
• The LLT package implements the linear law-based feature space transformation (LLT) algorithm in the R programming language.
• The calculation steps are performed by separate functions, which facilitates the further development of the algorithm.
• Despite the flexibility of the package, its functions have been designed in a user-friendly way and require only the most important parameters.
• To maintain low computational requirements, the LLT package only uses built-in functions.

Data availability
The LLT algorithm aims to facilitate univariate and multivariate time series classification tasks by transforming the structure of the feature set (or the original time series) to make the data easier to classify. As a first step, this algorithm splits the instances into training and test sets. Then, it applies time-delay embedding and spectral decomposition techniques to identify the governing patterns (called linear laws) of each input sequence (initial feature) within the training set. Finally, it utilizes the linear laws of the training set to transform the initial features of the test set. This transformation procedure has low computational complexity and provides the opportunity to develop a learning algorithm.

Figure 1: Steps of the LLT algorithm

• trainLaw(path, train_test, dim, lag) (trainLaw.R): This function creates a data.frame containing the set of laws generated from the instances of the training set. It has three mandatory arguments and one optional user-defined argument, among them:
  - dim (integer ∈ [2, k]): It defines the row and column dimension (l) of the symmetric matrix S. (The value k is the length of the input series.)
  - lag (integer ∈ [1, l]): It defines the successive row lag of the A matrix. By default, it is 1 (see Eq. (1)). (The value l is the order of the time-delay embedding.)
• testTrans(path, train_test, train_law, lag, select) (testTrans.R): This function transforms the instances of the test set by using the LLT algorithm. It generates a data.frame object in which columns are new features and rows are the dim-length time series created from the test instances and placed one below the other. It has three mandatory and two optional user-defined arguments.

Figure 2: Appropriate data structure for 2 classes and 6 features

The PowerCons dataset was collected by the Research and Development branch of Electricité de France (EDF) in Clamart (France) and is publicly available in the UCR Time Series Classification Archive (Dau, Keogh, Kamgar, Yeh, Zhu, Gharghabi, Ratanamahatana, Yanping, Hu, Begum, Bagnall, Mueen, Batista & Hexagon-ML, 2018). It contains the individual household electric power consumption over the course of one year, categorized into two seasonal classes, "Warm" and "Cold", based on whether the power consumption was recorded during the warm seasons (from April to September) or the cold seasons (from October to March). Each instance in the dataset represents a day, with electric power consumption recorded every ten minutes. Instances are associated with a class and comprise 144 consecutive values. Fig. 3 displays examples of daily power consumption from each class.

Figure 3: Examples of the time series belonging to each class

Figure 4: Histogram of accuracies

Figure A1: Example of the structure of train_test with 3 classes

Table 1: Metadata of the LLT package