Read-once machines and the thermodynamic complexity of Maxwell's demons
Kıyak, Fırat; Say, A. C. Cem
doi: 10.48550/arxiv.2304.11452
pmid: N/A
Abstract: The thermodynamical costs imposed by computational resource limitations like memory and time have been investigated before. We focus on a new computational limitation, namely, the machine being allowed to scan the input only once, and prove that it is associated with an unavoidable thermodynamical cost, even in the presence of infinite time and memory resources. We identify this limitation as the one suffered by Maxwell's demons. This provides a framework for quantifying the complexity associated with an experiment that effectuates a "decrease" in the entropy of a thermodynamic system.
Fast Sampling of $b$-Matchings and $b$-Edge Covers
Chen, Zongchen; Gu, Yuzhou
doi: 10.48550/arxiv.2304.14289
pmid: N/A
Abstract: For an integer $b \ge 1$, a $b$-matching (resp. $b$-edge cover) of a graph $G=(V,E)$ is a subset $S\subseteq E$ of edges such that every vertex is incident with at most (resp. at least) $b$ edges from $S$. We prove that for any $b \ge 1$ the simple Glauber dynamics for sampling (weighted) $b$-matchings and $b$-edge covers mixes in $O(n\log n)$ time on all $n$-vertex bounded-degree graphs. This significantly improves upon previous results which have worse running time and only work for $b$-matchings with $b \le 7$ and for $b$-edge covers with $b \le 2$. More generally, we prove spectral independence for a broad class of binary symmetric Holant problems with log-concave signatures, including $b$-matchings, $b$-edge covers, and antiferromagnetic $2$-spin edge models. We hence deduce optimal mixing time of the Glauber dynamics from spectral independence. The core of our proof is a recursive coupling inspired by (Chen and Zhang '23) which upper bounds the Wasserstein $W_1$ distance between distributions under different pinnings. Using a similar method, we also obtain the optimal $O(n\log n)$ mixing time of the Glauber dynamics for the hardcore model on $n$-vertex bounded-degree claw-free graphs, for any fugacity $\lambda$. This improves over previous works which have at least cubic dependence on $n$.
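The Glauber dynamics mentioned in this abstract can be illustrated with a minimal sketch, assuming a simple graph given as an edge list over vertices 0..n-1 and a single edge weight `lam`. The function name `glauber_b_matching`, its parameters, and the default step count are hypothetical and are not taken from the paper. Each step picks a uniformly random edge and resamples its membership conditional on the rest of the configuration. The sketch does not certify mixing: the number of steps is a free parameter, not the $O(n\log n)$ bound proved in the paper.

```python
import random


def glauber_b_matching(n, edges, b, lam=1.0, steps=10_000, seed=0):
    """Run heat-bath Glauber dynamics on b-matchings of a simple graph.

    n     : number of vertices, labelled 0..n-1
    edges : list of (u, v) pairs with u != v
    b     : maximum number of chosen edges allowed at each vertex
    lam   : weight per chosen edge (illustrative assumption)
    Returns the chosen edges after `steps` single-edge updates.
    """
    rng = random.Random(seed)
    in_s = [False] * len(edges)  # membership indicator for each edge
    deg = [0] * n                # number of chosen edges at each vertex

    for _ in range(steps):
        i = rng.randrange(len(edges))
        u, v = edges[i]

        # Temporarily drop the edge so we can resample it from scratch.
        if in_s[i]:
            in_s[i] = False
            deg[u] -= 1
            deg[v] -= 1

        # The edge may be included only if both endpoints have spare capacity.
        feasible = deg[u] < b and deg[v] < b
        if feasible and rng.random() < lam / (1.0 + lam):
            in_s[i] = True
            deg[u] += 1
            deg[v] += 1

    return [e for e, keep in zip(edges, in_s) if keep]


if __name__ == "__main__":
    # A 6-cycle with b = 1: the output is an ordinary matching.
    cycle = [(i, (i + 1) % 6) for i in range(6)]
    print(glauber_b_matching(6, cycle, b=1, steps=2_000, seed=1))
```

Starting from the empty set keeps every intermediate state a valid $b$-matching, because an edge is only added when both endpoints have fewer than $b$ chosen edges.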
Propheter: Prophetic Teacher Guided Long-Tailed Distribution Learning
Xu, Wenxiang; Jing, Yongcheng; Zhou, Linyun; Huang, Wenqi; Cheng, Lechao; Feng, Zunlei; Song, Mingli
doi: 10.48550/arxiv.2304.04135
pmid: N/A
Abstract: The problem of deep long-tailed learning, a prevalent challenge in the realm of generic visual recognition, persists in a multitude of real-world applications. To tackle the heavily-skewed dataset issue in long-tailed classification, prior efforts have sought to augment existing deep models with elaborate class-balancing strategies, such as class rebalancing, data augmentation, and module improvement. Despite the encouraging performance, the limited class knowledge of the tailed classes in the training dataset still bottlenecks the performance of the existing deep models. In this paper, we propose an innovative long-tailed learning paradigm that breaks the bottleneck by guiding the learning of deep networks with external prior knowledge. This is specifically achieved by devising an elaborate "prophetic" teacher, termed "Propheter", that aims to learn the potential class distributions. The target long-tailed prediction model is then optimized under the instruction of the well-trained "Propheter", such that the distributions of different classes are as distinguishable as possible from each other. Experiments on eight long-tailed benchmarks across three architectures demonstrate that the proposed prophetic paradigm acts as a promising solution to the challenge of limited class knowledge in long-tailed datasets. The developed code is publicly available at this https URL.
Implicit Neural Head Synthesis via Controllable Local Deformation Fields
Chen, Chuhan; O'Toole, Matthew; Bharaj, Gaurav; Garrido, Pablo
doi: 10.48550/arxiv.2304.11113
pmid: N/A
Abstract: High-quality reconstruction of controllable 3D head avatars from 2D videos is highly desirable for virtual human applications in movies, games, and telepresence. Neural implicit fields provide a powerful representation to model 3D head avatars with personalized shape, expressions, and facial parts, e.g., hair and mouth interior, that go beyond the linear 3D morphable model (3DMM). However, existing methods do not model faces with fine-scale facial features, or local control of facial parts that extrapolate asymmetric expressions from monocular videos. Further, most condition only on 3DMM parameters with poor(er) locality, and resolve local features with a global neural field. We build on part-based implicit shape models that decompose a global deformation field into local ones. Our novel formulation models multiple implicit deformation fields with local semantic rig-like control via 3DMM-based parameters, and representative facial landmarks. Further, we propose a local control loss and attention mask mechanism that promote sparsity of each learned deformation field. Our formulation renders sharper locally controllable nonlinear deformations than previous implicit monocular approaches, especially mouth interior, asymmetric expressions, and facial details.
The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting
Han, Lu; Ye, Han-Jia; Zhan, De-Chuan
doi: 10.48550/arxiv.2304.05206
pmid: N/A
Abstract: Multivariate time series data comprises various channels of variables. The multivariate forecasting models need to capture the relationship between the channels to accurately predict future values. However, recently, there has been an emergence of methods that employ the Channel Independent (CI) strategy. These methods view multivariate time series data as separate univariate time series and disregard the correlation between channels. Surprisingly, our empirical results have shown that models trained with the CI strategy outperform those trained with the Channel Dependent (CD) strategy, usually by a significant margin. Nevertheless, the reasons behind this phenomenon have not yet been thoroughly explored in the literature. This paper provides comprehensive empirical and theoretical analyses of the characteristics of multivariate time series datasets and the CI/CD strategy. Our results conclude that the CD approach has higher capacity but often lacks robustness to accurately predict distributionally drifted time series. In contrast, the CI approach trades capacity for robust prediction. Practical measures inspired by these analyses are proposed to address the capacity and robustness dilemma, including a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy. We hope our findings can raise awareness among researchers about the characteristics of multivariate time series and inspire the construction of better forecasting models.
Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding
Yang, Yu-Qi; Guo, Yu-Xiao; Xiong, Jian-Yu; Liu, Yang; Pan, Hao; Wang, Peng-Shuai; Tong, Xin; Guo, Baining
doi: 10.48550/arxiv.2304.06906
pmid: N/A
Abstract: The use of pretrained backbones with fine-tuning has been successful for 2D vision and natural language processing tasks, showing advantages over task-specific networks. In this work, we introduce a pretrained 3D backbone, called Swin3D, for 3D indoor scene understanding. We design a 3D Swin transformer as our backbone network, which enables efficient self-attention on sparse voxels with linear memory complexity, making the backbone scalable to large models and datasets. We also introduce a generalized contextual relative positional embedding scheme to capture various irregularities of point signals for improved network performance. We pretrained a large Swin3D model on a synthetic Structured3D dataset, which is an order of magnitude larger than the ScanNet dataset. Our model pretrained on the synthetic dataset not only generalizes well to downstream segmentation and detection on real 3D point datasets, but also outperforms state-of-the-art methods on downstream tasks with +2.3 mIoU and +2.2 mIoU on S3DIS Area5 and 6-fold semantic segmentation, +1.8 mIoU on ScanNet segmentation (val), +1.9 [email protected] on ScanNet detection, and +8.1 [email protected] on S3DIS detection. A series of extensive ablation studies further validate the scalability, generality, and superior performance enabled by our approach. The code and models are available at this https URL.
Sequential Recommendation with Diffusion Models
Du, Hanwen; Yuan, Huanhuan; Huang, Zhen; Zhao, Pengpeng; Zhou, Xiaofang
doi: 10.48550/arxiv.2304.04541
pmid: N/A
Abstract: Generative models, such as Variational Auto-Encoder (VAE) and Generative Adversarial Network (GAN), have been successfully applied in sequential recommendation. These methods require sampling from probability distributions and adopt auxiliary loss functions to optimize the model, which can capture the uncertainty of user behaviors and alleviate exposure bias. However, existing generative models still suffer from the posterior collapse problem or the mode collapse problem, thus limiting their applications in sequential recommendation. To tackle the challenges mentioned above, we leverage a new paradigm of the generative models, i.e., diffusion models, and present sequential recommendation with diffusion models (DiffRec), which can avoid the issues of VAE- and GAN-based models and show better performance. While diffusion models were originally proposed to process continuous image data, we design an additional transition in the forward process together with a transition in the reverse process to enable the processing of the discrete recommendation data. We also design a different noising strategy that only noises the target item instead of the whole sequence, which is more suitable for sequential recommendation. Based on the modified diffusion process, we derive the objective function of our framework using a simplification technique and design a denoise sequential recommender to fulfill the objective function. As the lengthened diffusion steps substantially increase the time complexity, we propose an efficient training strategy and an efficient inference strategy to reduce training and inference cost and improve recommendation diversity. Extensive experimental results on three public benchmark datasets verify the effectiveness of our approach and show that DiffRec outperforms the state-of-the-art sequential recommendation models.
SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation
Athanasiou, Nikos; Petrovich, Mathis; Black, Michael J.; Varol, Gül
doi: 10.48550/arxiv.2304.10417
pmid: N/A
Abstract: Our goal is to synthesize 3D human motions given textual inputs describing simultaneous actions, for example 'waving hand' while 'walking' at the same time. We refer to generating such simultaneous movements as performing 'spatial compositions'. In contrast to temporal compositions that seek to transition from one action to another, spatial compositing requires understanding which body parts are involved in which action, to be able to move them simultaneously. Motivated by the observation that the correspondence between actions and body parts is encoded in powerful language models, we extract this knowledge by prompting GPT-3 with text such as "what are the body parts involved in the action <action name>?", while also providing the parts list and few-shot examples. Given this action-part mapping, we combine body parts from two motions together and establish the first automated method to spatially compose two actions. However, training data with compositional actions is always limited by the combinatorics. Hence, we further create synthetic data with this approach, and use it to train a new state-of-the-art text-to-motion generation model, called SINC ("SImultaneous actioN Compositions for 3D human motions"). Our experiments show that training with such GPT-guided synthetic data improves spatial composition generation over baselines. Our code is publicly available at this https URL.
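The step of combining body parts from two motions, given an action-part mapping, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a motion is assumed to be a list of frames, each frame a dictionary from joint name to a pose value, and the part-to-joint table, the joint names, and the function `compose_motions` are all hypothetical. The parts assigned to the second action are supplied by hand here; in the paper they come from prompting GPT-3.

```python
def compose_motions(motion_a, motion_b, parts_b, part_joints):
    """Overlay the joints of selected body parts from motion_b onto motion_a.

    motion_a, motion_b : lists of frames; each frame maps joint name -> value
    parts_b            : body parts that the second action controls
    part_joints        : mapping from body part name to its joint names
    Returns a new list of frames, truncated to the shorter motion.
    """
    # Collect every joint that should follow the second motion.
    joints_from_b = {j for part in parts_b for j in part_joints[part]}

    n_frames = min(len(motion_a), len(motion_b))
    composed = []
    for t in range(n_frames):
        frame = dict(motion_a[t])  # copy so the inputs are not modified
        for joint in joints_from_b:
            frame[joint] = motion_b[t][joint]
        composed.append(frame)
    return composed


if __name__ == "__main__":
    # Hypothetical part table and two toy motions with scalar joint values.
    part_joints = {"left arm": ["l_shoulder", "l_elbow"], "legs": ["l_hip", "r_hip"]}
    walking = [{"l_shoulder": 0.0, "l_elbow": 0.0, "l_hip": 1.0, "r_hip": -1.0}] * 3
    waving = [{"l_shoulder": 2.0, "l_elbow": 1.5, "l_hip": 0.0, "r_hip": 0.0}] * 3
    print(compose_motions(walking, waving, ["left arm"], part_joints)[0])
```

Truncating to the shorter motion is a simplification chosen for this sketch; any alignment or smoothing between the two source motions is left out.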
Matching markets with farsighted couples
Atay, Ata; Funck, Sylvain; Mauleon, Ana; Vannetelbosch, Vincent
doi: 10.48550/arxiv.2304.12276
pmid: N/A
Abstract: We adopt the notion of the farsighted stable set to determine which matchings are stable when agents are farsighted in matching markets with couples. We show that a singleton matching is a farsighted stable set if and only if the matching is stable. Thus, matchings that are stable with myopic agents remain stable when agents become farsighted. Examples of farsighted stable sets containing multiple non-stable matchings are provided for markets with and without stable matchings. For couples markets where the farsighted stable set does not exist, we propose the DEM farsighted stable set to predict the matchings that are stable when agents are farsighted.
Secret-Key-Agreement Advantage Distillation With Quantization Correction
Ardizzon, Francesco; Giurisato, Francesco; Tomasin, Stefano
doi: 10.48550/arxiv.2304.10312
pmid: N/A
Abstract: We propose a novel advantage distillation strategy for physical layer-based secret-key-agreement (SKA). We consider a scenario where Alice and Bob aim to extract a common bit sequence, which should remain secret from Eve, by quantizing a random number obtained from measurements at their communication channel. We propose an asymmetric advantage distillation protocol with two novel features: i) Alice quantizes her measurement and sends partial information on it over an authenticated public side channel, and ii) Bob quantizes his measurement by exploiting the partial information. The partial information on the position of the measurement in the quantization interval and its sharing allows Bob to obtain a quantized value closer to that of Alice. Both strategies increase the lower bound of the secret key rate.