F2D-SIFPNet: a frequency 2D Slow-I-Fast-P network for faster compressed video action recognition
Ming, Yue; Zhou, Jiangwan; Jia, Xia; Zheng, Qingfang; Xiong, Lu; Feng, Fan; Hu, Nannan
doi: 10.1007/s10489-024-05408-y
pmid: N/A
Recent video action recognition methods directly use RGB pixels in the compressed domain, avoiding the cumbersome decoding process of traditional methods and enabling efficient recognition. However, these methods must convert the discrete cosine transform (DCT) frequency representation into an extended RGB pixel representation, which is time-consuming. To alleviate this drawback, a novel frequency 2D Slow-I-Fast-P network (F2D-SIFPNet) is proposed that significantly enhances the speed of action recognition. First, a new Frequency-Domain Partial Decompression (FPDec) method is designed to extract the frequency-domain DCT coefficients directly from the compressed video, eliminating the final time-consuming decoding step in FFmpeg. Next, the Frequency-Domain Channel Selection (FCS) strategy is introduced to down-sample the frequency-domain data, thereby increasing the saliency of the input. Additionally, the Frequency Slow-I-Fast-P path (FSIFP) and the Adaptive Motion Excitation (AME) module are presented to emphasize the significant frequency components. FSIFP efficiently models slow spatial features and fast temporal changes simultaneously, while AME generates an adaptive convolution kernel that captures both long-term and short-term motion cues. Extensive experiments were conducted on four public datasets: Kinetics-700, Kinetics-400, UCF-101, and HMDB-51. 
The results showed superior accuracies of 55.6%, 74.0%, 96.3%, and 74.6%, respectively, with preprocessing times being 6.31 times faster.
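To make the idea of working on DCT coefficients and keeping only some frequency channels concrete, here is a minimal sketch. It is not the paper's FPDec or FCS procedure: the 8x8 block size, the orthonormal DCT-II, and the rule "keep the lowest-frequency coefficients ordered by u + v" are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

N = 8  # assumed block size, as in JPEG/MPEG-style codecs


def dct2(block):
    """Orthonormal 2D DCT-II of an N x N block given as a list of lists."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def select_low_freq(coeffs, keep):
    """Keep the `keep` lowest-frequency coefficients, ordered by u + v.

    This ordering is an illustrative stand-in for a channel selection rule.
    """
    order = sorted(((u, v) for u in range(N) for v in range(N)),
                   key=lambda uv: (uv[0] + uv[1], uv[0]))
    return [coeffs[u][v] for u, v in order[:keep]]


# A flat block has all of its energy in the DC coefficient.
flat = [[100.0] * N for _ in range(N)]
channels = select_low_freq(dct2(flat), keep=6)
```

In a real pipeline the coefficients would come from the partially decoded bitstream rather than from a forward DCT of pixels; the forward transform is used here only so the sketch is self-contained.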
CA-PDBPR: category-aware privacy preserving POI recommendation using decentralized Bayesian personalized ranking
Gao, Qinyun; Yu, Shenbao; Chen, Bilian; Cao, Langcai
doi: 10.1007/s10489-024-05426-w
pmid: N/A
Point-of-interest (POI) recommendation has gained significant traction recently due to the rising trend of location-based networks. Traditional approaches rely on a centralized collection of user data. To protect privacy, decentralized federated learning trains the model on each user’s device and collaborates only with nearby users. However, existing decentralized federated recommenders suffer from two major problems: (1) Privacy risks: existing approaches expose geographical location or co-rated item information when constructing user neighborhoods. (2) Performance limitations: existing approaches adopt a simple model without incorporating auxiliary information. To address these challenges, we propose CA-PDBPR (category-aware privacy preserving POI recommendation using decentralized Bayesian personalized ranking). Specifically, we introduce a novel privacy-enhanced neighborhood creation method that uses POI category preferences to calculate decentralized user similarity through secret sharing, ensuring a higher level of privacy. Moreover, we integrate POI category information with a refined Bayesian personalized ranking (BPR) loss function to enhance recommendation performance. Experimental evaluations conducted on real-world datasets validate the effectiveness of the CA-PDBPR model, demonstrating enhanced recommendation quality while minimizing data exposure compared with state-of-the-art alternatives.
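For readers unfamiliar with the BPR objective mentioned above, the following is a minimal sketch of the standard (unrefined) BPR loss for a single training triple. The category-aware refinement and the secret-sharing similarity step of CA-PDBPR are not reproduced; the dot-product scoring and the function names are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(user, pos_item, neg_item):
    """Standard BPR loss -ln(sigmoid(x_ui - x_uj)) for one triple.

    user, pos_item, neg_item are latent vectors; pos_item is a visited POI
    and neg_item an unvisited one. Scores are plain dot products (assumed).
    """
    diff = dot(user, pos_item) - dot(user, neg_item)
    z = -diff
    # Numerically stable softplus(z) = ln(1 + e^z), equal to -ln(sigmoid(diff)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# The loss shrinks as the visited POI is scored above the unvisited one.
good = bpr_loss([1.0, 1.0], [2.0, 2.0], [0.0, 0.0])
bad = bpr_loss([1.0, 1.0], [0.0, 0.0], [2.0, 2.0])
```

When both items receive the same score the loss is ln 2, which is a handy sanity check when wiring this into a training loop.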
Lifelong learning gets better with MixUp and unsupervised continual representation
Kumar, Prashant; Toshniwal, Durga
doi: 10.1007/s10489-024-05434-w
pmid: N/A
Continual learning enables learning systems to adapt to evolving data distributions by sequentially acquiring knowledge from a series of tasks. Unsupervised lifelong learning refers to the ability to learn over time while memorizing previous patterns without supervision. However, prior methods in this field rely heavily on supervised or reinforcement learning, which requires annotated data and thereby limits their scalability in real-world applications where data is often biased and lacks annotations. To overcome these challenges, this work introduces a novel approach called Lifelong Learning gets better with MixUp and Unsupervised Continual Representation (LL-UCR). LL-UCR aims to learn feature representations from unlabeled tasks, eliminating the need for annotated data. Within the LL-UCR framework, two techniques are introduced: LL-MixUp, which mitigates catastrophic forgetting by interpolating samples between current and previous tasks, and Dark Experience Replay (DER) (Buzzega et al., Adv Neural Inf Process Syst, 33, 15920–15930, 2020) adapted for UCR, which aligns network logits across tasks. To overcome buffer size limitations in replay-based methods, the Retrospective Adversarial Replay (RAR) framework is incorporated, facilitating diverse replay sample generation. Through systematic analysis, we demonstrate that unsupervised visual representations exhibit remarkable resilience to catastrophic forgetting, consistently outperforming supervised methods in terms of performance and generalization on out-of-distribution tasks. Furthermore, our qualitative analysis reveals that LL-UCR fosters a smoother loss landscape and acquires meaningful feature representations. Extensive experimental evaluations conducted on diverse datasets validate the superior performance of LL-UCR compared to state-of-the-art supervised continual learning methods and the unsupervised LUMP method (Madaan et al., International conference on learning representations, 2020), effectively mitigating catastrophic forgetting.
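The interpolation between current-task and previous-task samples can be sketched as plain MixUp. This is only the generic convex combination; how LL-UCR chooses the mixing coefficient or samples from its buffer is not stated in the abstract, so the Beta(alpha, alpha) draw and the function names below are assumptions.

```python
import random


def mixup(current, previous, lam):
    """Convex combination lam * current + (1 - lam) * previous, elementwise."""
    return [lam * c + (1.0 - lam) * p for c, p in zip(current, previous)]


def sample_lambda(alpha, rng):
    """Mixing coefficient drawn from Beta(alpha, alpha), as in standard MixUp."""
    return rng.betavariate(alpha, alpha)


rng = random.Random(0)
x_current = [2.0, 4.0]   # toy sample from the current task
x_replay = [0.0, 0.0]    # toy sample from a replay buffer of earlier tasks
x_mixed = mixup(x_current, x_replay, sample_lambda(0.4, rng))
```

Because the input is unlabeled, only the samples are mixed here; there is no label interpolation step as in supervised MixUp.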
Beyond traditional steganography: enhancing security and performance with spread spectrum image steganography
Kuznetsov, Oleksandr; Frontoni, Emanuele; Chernov, Kyrylo
doi: 10.1007/s10489-024-05415-z
pmid: N/A
This study investigates the innovative application of Direct Sequence Spread Spectrum (DSSS) technology in the realm of image steganography, known as Spread Spectrum Image Steganography (SSIS). By interpreting the cover image as noise in the communication channel, SSIS capitalizes on the noise-resistant properties of broadband communication systems to effectively conceal information within images. We focus on the development of new classes of spreading sequences with desirable ensemble and correlation properties, which significantly impact the performance of SSIS. We propose a data hiding method that directly addresses spreading sequences, resulting in minimized cover image distortion and heightened resistance to message detection. Furthermore, we explore adaptive spreading sequences that consider the statistical properties of the cover image, substantially reducing error intensity in recovered messages and improving the overall steganographic system performance. Our experiments confirm the advantages of the proposed system and support the theoretical arguments. In addition, we employ artificial neural networks for steganalysis, generating several datasets with varying SSIS payloads and examining the detectability of embedded data using a specially designed convolutional neural network (CNN). While this model demonstrates high effectiveness on other datasets, the detection error probability for SSIS is considerably higher, indicating greater reliability and security even when advanced steganalysis techniques are employed. The findings highlight the potential of SSIS in developing robust and secure communication systems capable of functioning effectively in high-noise environments while preserving the integrity of the cover image.
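The basic direct-sequence mechanism behind SSIS can be sketched as follows: each message bit is spread over many pixels by a key-dependent ±1 chip sequence, and the receiver recovers the bit from the sign of the correlation with the same sequence. This is a generic textbook-style sketch, not the spreading-sequence classes or the adaptive scheme proposed in the paper; the balanced chip construction, the gain, and all names are assumptions.

```python
import random


def balanced_chips(n, rng):
    """Pseudo-random +/-1 sequence with equal counts of each sign (n even)."""
    chips = [1] * (n // 2) + [-1] * (n // 2)
    rng.shuffle(chips)
    return chips


def embed(cover, bits, chips_per_bit, gain, seed):
    """Add gain * (+/-1) * chip to each pixel; the seed acts as the shared key."""
    rng = random.Random(seed)
    stego = list(cover)
    for k, bit in enumerate(bits):
        chips = balanced_chips(chips_per_bit, rng)
        sign = 1 if bit else -1
        for i, c in enumerate(chips):
            stego[k * chips_per_bit + i] += gain * sign * c
    return stego


def extract(stego, n_bits, chips_per_bit, seed):
    """Recover bits from the sign of the correlation with the chip sequence."""
    rng = random.Random(seed)
    bits = []
    for k in range(n_bits):
        chips = balanced_chips(chips_per_bit, rng)
        segment = stego[k * chips_per_bit:(k + 1) * chips_per_bit]
        corr = sum(s * c for s, c in zip(segment, chips))
        bits.append(1 if corr > 0 else 0)
    return bits


cover = [128] * 256            # toy flat "image" as a pixel list
message = [1, 0, 1, 1]
stego = embed(cover, message, chips_per_bit=64, gain=2, seed=7)
recovered = extract(stego, len(message), chips_per_bit=64, seed=7)
```

On a flat cover the balanced chips cancel the cover exactly, so recovery is error-free. On a textured cover the image itself correlates with the chips and acts as channel noise, which is the source of the recovery errors that cover-aware spreading sequences aim to reduce.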
Revisiting clustering for efficient unsupervised dialogue structure induction
Raedt, Maarten De; Godin, Fréderic; Develder, Chris; Demeester, Thomas
doi: 10.1007/s10489-024-05455-5
pmid: N/A
In the development of a task-oriented dialogue system, defining the dialogue structure is a time-consuming task. Hence, several works have looked into automatically inferring it from data, e.g., actual conversations between a customer and a support agent. To recover such dialogue structure, recent methods based on discrete variational models learn to jointly encode and cluster utterances in dialogue states, but (i) represent utterances by only considering preceding dialogue context, and (ii) are slow to train since they are optimized with a compute-expensive decoding objective. We revisit and improve upon an existing efficient pipeline approach, commonly adopted as a baseline, that first encodes utterances and then clusters them with k-means to induce the dialogue structure. However, the existing approach represents utterances as bag-of-words or skip-thought vectors, which have been shown to perform poorly in semantic similarity tasks, and without considering dialogue context. We therefore first investigate the use of more powerful transformer-based encoders for encoding utterances. Next, we propose ellodar, a method for learning representations that capture both preceding and subsequent dialogue context, inspired by word-to-vec training strategies. ellodar is efficient since representations are learned directly in the encoding space by finetuning just a single linear layer on top of a frozen sentence encoder with a vector-to-vector regression training objective. 
Extensive experiments on representative datasets for dialogue structure induction (SimDial, Schema Guided Dialogues, DSTC2, and CamRest676) demonstrate that in terms of effectiveness to induce the correct dialogue structure, (i) clustering utterances represented by transformer-based encoders improves recent joint models by 13%–32% on standard cluster metrics, and (ii) clustering ellodar’s representations yields additional improvements ranging from +20% to +26%, with speedups of ×10–10⁴ compared to the recent joint models.
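The encode-then-cluster pipeline can be illustrated with a small k-means over utterance vectors, where each resulting cluster plays the role of a dialogue state. The vectors below are toy 2D stand-ins for sentence-encoder outputs, the initialization (first k points) is deliberately naive, and ellodar's context-aware linear layer is not reproduced; all names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    centers = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each utterance goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy "utterance embeddings": two well-separated groups, interleaved.
embeddings = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
              [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]
states, centers = kmeans(embeddings, k=2)
```

Reading the sequence of cluster labels along each dialogue then gives the state transitions from which a dialogue structure graph could be estimated.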
A novel deep learning approach for intelligent bearing fault diagnosis under extremely small samples
Ding, Peixuan; Xu, Yi; Qin, Pan; Sun, Xi-Ming
doi: 10.1007/s10489-024-05429-7
pmid: N/A
Rotor bearing health is crucial for ensuring the operational stability of rotating equipment. Deep learning-based fault diagnosis methods have achieved widespread success due to their superior fault identification capability. However, conventional deep learning methods that rely on large quantities of data are not feasible for most important mechanical equipment since obtaining fault data is difficult. To address this problem, we propose channel attention siamese networks (CASN) with metric learning for intelligent bearing fault diagnosis with extremely small samples. First, in the feature learning phase, pairs of sample inputs are constructed, and feature extraction is performed by a shared encoder. Then, in the disparity learning phase, the differences between features of sample pairs are mapped as metric distances. Based on the metric distance between the unlabeled and labeled data, the fault type of the unlabeled data can be predicted in the test phase. The experimental results show that CASN achieves over 97% accuracy when the sample size is extremely small. In addition, even under the conditions of noise interference and signal transmission distortion, our model still has reliable diagnostic ability.
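The two ingredients described above, building sample pairs for training and predicting a fault type from metric distances to labeled samples, can be sketched without the network itself. The shared encoder and channel attention are omitted: the vectors are assumed to be already-encoded features, Euclidean distance stands in for the learned metric, and the function names and labels are hypothetical.

```python
import math
from itertools import combinations


def make_pairs(samples):
    """Build training pairs from (feature, label) samples.

    Returns (a, b, same) tuples where same is 1 if the labels match, else 0.
    """
    return [(a, b, int(la == lb))
            for (a, la), (b, lb) in combinations(samples, 2)]


def predict(query, support):
    """Assign the label of the closest labeled embedding (Euclidean distance)."""
    return min(support, key=lambda s: math.dist(query, s[0]))[1]


# Toy embeddings: one labeled example per class is enough to classify.
support = [([0.0, 0.0], "normal"), ([5.0, 5.0], "inner_race_fault")]
label = predict([4.0, 4.5], support)
```

Pairing is what makes very small sample sets usable: n labeled samples yield n(n-1)/2 training pairs.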
Outlier detection for incomplete real-valued data via information entropy and class-consistent technology
Cai, Xiaopeng; Li, Zhaowen
doi: 10.1007/s10489-024-05428-8
pmid: N/A
Outlier detection aims to find data points that are significantly different from other observed values. It has been widely used in fraud detection, network security, and medical fields. Most existing outlier detection methods do not fully consider the problem of missing values in data sets. This paper studies outlier detection for incomplete real-valued data via information entropy and rough set theory (RST). First, a tolerance relation based on class-consistent technology is introduced to describe the similarity between information values in an incomplete real-valued information system (IRVIS). Then, the tolerance classes are formed according to the tolerance relation, and are used to calculate information entropy and other metrics. Next, an outlier factor is defined for each object in an IRVIS to describe its uncertainty and degree of outlierness. Finally, an outlier detection method for an IRVIS is proposed, and the corresponding algorithm (CIEOD) is designed. The proposed method is compared with five other detection methods in numerical experiments on UCI data. The experimental results show that the CIEOD algorithm is more efficient. To make the comparison comprehensive, Precision, Recall, F1-measure and the ROC curve are used to describe the strengths of the proposed method.
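A minimal sketch of the tolerance-class idea for incomplete real-valued data follows. It is not the CIEOD algorithm: the tolerance relation used here (two objects are tolerant if every attribute is either missing in one of them or differs by at most eps), the entropy formula, and the isolation score are common rough-set-style choices assumed for illustration, and the names are hypothetical.

```python
import math


def tolerant(a, b, eps):
    """True if a and b agree within eps on every attribute both have (None = missing)."""
    return all(x is None or y is None or abs(x - y) <= eps
               for x, y in zip(a, b))


def tolerance_classes(data, eps):
    """For each object, the set of indices of objects tolerant with it."""
    n = len(data)
    return [{j for j in range(n) if tolerant(data[i], data[j], eps)}
            for i in range(n)]


def entropy(classes):
    """H = -(1/n) * sum_i log2(|T(x_i)| / n); larger means finer classes."""
    n = len(classes)
    return -sum(math.log2(len(c) / n) for c in classes) / n


def isolation_scores(classes):
    """Illustrative outlier score: objects with small tolerance classes score high."""
    n = len(classes)
    return [1.0 - len(c) / n for c in classes]


data = [[1.0, 2.0], [1.1, None], [9.0, 9.0]]  # second object has a missing value
classes = tolerance_classes(data, eps=0.5)
scores = isolation_scores(classes)
```

The missing value in the second object does not prevent it from being grouped with the first, while the third object ends up alone in its class and receives the highest score.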
Non-local self-attention network for image super-resolution
Zeng, Kun; Lin, Hanjiang; Yan, Zhiqiang; Fang, Jinsheng; Lai, Taotao
doi: 10.1007/s10489-024-05343-y
pmid: N/A
The utilization of self-attention mechanisms in Transformer-based methods has shown great potential in addressing the image super-resolution (SR) task by capturing long-range dependencies. However, many existing Transformer-based methods for SR extract features locally within a small window and rely on shifted window self-attention to gradually incorporate long-range dependencies. These methods may not effectively exploit non-local image information for SR. To overcome this limitation, we propose a novel non-local self-attention (NLSA) mechanism that directly models non-local dependencies. Firstly, NLSA utilizes locality-sensitive hashing to identify similar pixel-wise features with minimal computational cost. Next, a pixel-shuffling operation is applied to gather similar features within the same window. This pixel-shuffling technique effectively expands the receptive field beyond the window size. Furthermore, we introduce a simplified window self-attention (SiWSA) that operates within each window to capture intrinsic long-term dependencies among the shuffled features, regardless of the position information. Finally, after the SiWSA calculation, the features are shuffled back to their original positions to maintain data consistency. This overall NLSA mechanism enables the capture of non-local information without the need for excessively deep networks to enlarge the receptive field. Based on NLSA, we propose a non-local self-attention network (NLSAN) designed explicitly for the SR task. Through extensive experimental evaluations, we demonstrate the superior performance of NLSAN compared to several state-of-the-art SR methods in quantitative and qualitative assessments. The code of the proposed method is available at https://github.com/zengkun301/NLSAN.
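The hash, gather, process, and shuffle-back sequence described above can be sketched on a flat list of feature vectors. This is not the released NLSAN code: random-hyperplane hashing is one common form of locality-sensitive hashing assumed here, the windowed attention step is left out, and the function names are hypothetical.

```python
import random


def lsh_codes(features, n_planes, seed=0):
    """Random-hyperplane LSH: one sign bit per plane, packed into an integer."""
    rng = random.Random(seed)
    dim = len(features[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(n_planes)]
    codes = []
    for f in features:
        code = 0
        for p in planes:
            bit = 1 if sum(a * b for a, b in zip(f, p)) >= 0 else 0
            code = (code << 1) | bit
        codes.append(code)
    return codes


def shuffle_by_code(features, codes):
    """Sort features by hash code so similar ones become neighbours."""
    perm = sorted(range(len(features)), key=lambda i: codes[i])
    return [features[i] for i in perm], perm


def unshuffle(shuffled, perm):
    """Inverse permutation: put each feature back at its original position."""
    out = [None] * len(shuffled)
    for pos, i in enumerate(perm):
        out[i] = shuffled[pos]
    return out


feats = [[1.0, 2.0], [-3.0, 0.5], [1.0, 2.0], [0.2, -4.0]]
codes = lsh_codes(feats, n_planes=4)
gathered, perm = shuffle_by_code(feats, codes)
# ... a window-wise attention step would operate on `gathered` here ...
restored = unshuffle(gathered, perm)
```

After sorting, fixed-size windows over `gathered` contain features with equal or nearby codes regardless of where they sat in the image, which is how a local window operation can see non-local content.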
Graph neural networks with selective attention and path reasoning for document-level relation extraction
Hang, Tingting; Feng, Jun; Wang, Yunfeng; Yan, Le
doi: 10.1007/s10489-024-05448-4
pmid: N/A
Document-level Relation Extraction (DocRE) aims to extract relations from multiple sentences simultaneously. Existing graph-based methods adopt static graphs to represent the document structure, which are unable to capture complex interactions. Besides, they take all sentences in the document as the scope of relation extraction (RE), so irrelevant sentences introduce noise. Furthermore, they do not explicitly model the reasoning chain, leading to a lack of explainability in the reasoning results. These limitations may significantly hinder their performance in practical applications. In this paper, we propose a model based on selective attention and path reasoning for DocRE. Firstly, we adopt hierarchical heterogeneous graph neural networks and recurrent neural networks to realize document modeling and capture complex interactions in the document. Secondly, we adopt selective attention to select sentences related to the entity pair to generate document subgraphs as the scope of RE. Lastly, we adopt path reasoning to explicitly model the reasoning chain between multiple entities in the document subgraph, infer the relations between entities and provide corresponding supporting evidence. Extensive experimental results on three benchmark datasets show that the proposed framework is effective and achieves superior performance compared to most methods. Further analysis demonstrates that selective attention and path reasoning can discover more accurate inter-sentence relations and supporting evidence.
Supporting ANFIS interpolation for image super resolution with fuzzy rough feature selection
Ismail, Muhammad; Shang, Changjing; Yang, Jing; Shen, Qiang
doi: 10.1007/s10489-024-05445-7
pmid: N/A
Image Super-Resolution (ISR) is utilised to generate a high-resolution image from a low-resolution one. However, most current techniques for ISR confront three main constraints: i) the assumption that there is sufficient data available for training, ii) the presumption that areas of the images concerned do not involve missing data, and iii) the development of a computationally efficient model that does not compromise performance. In addressing these issues, this study proposes a novel lightweight approach termed Fuzzy Rough Feature Selection-based ANFIS Interpolation (FRFS-ANFISI) for ISR. Popular feature extraction algorithms are employed to extract the potentially significant features from images, and population-based search mechanisms are utilised to implement effective FRFS methods that assist in selecting the most important features among them. Subsequently, the processed data is entered into the ANFIS interpolation model to execute the ISR operation. To tackle the sparse data challenge, two adjacent ANFIS models are trained with sufficient data where appropriate, intending to position the ANFIS model of sparse data in the middle. This enables the two neighbouring ANFIS models to be interpolated to produce the otherwise missing knowledge or rules for the model in between, thereby estimating the corresponding outcomes. Conducted on standard ISR benchmark datasets while considering both sufficient and sparse data scenarios, the experimental studies demonstrate the efficacy of the proposed approach in helping deal with the aforementioned challenges facing ISR.