Barcodes as Summary of Loss Function Topology
Barannikov, Serguei; Korotin, Alexander; Oganesyan, Dmitry; Emtsev, Daniil; Burnaev, Evgeny
doi: 10.1134/s1064562423701570
pmid: N/A
Abstract: We propose to study neural networks' loss surfaces using methods of topological data analysis. We suggest applying barcodes of Morse complexes to explore the topology of loss surfaces. An algorithm for calculating the loss function's barcodes of local minima is described. We have conducted experiments calculating barcodes of local minima for benchmark functions and for loss surfaces of small neural networks. Our experiments confirm our two principal observations for neural networks' loss surfaces. First, the barcodes of local minima are located in a small lower part of the range of values of the neural networks' loss function. Second, increasing the neural network's depth and width lowers the barcodes of local minima. This has some natural implications for the neural network's learning and for its generalization properties.
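To make the notion of a barcode of local minima concrete, here is a minimal sketch for a function sampled on a one-dimensional grid. It is not the authors' algorithm; it is the standard union-find computation of 0-dimensional sublevel-set persistence, in which each local minimum is paired with the value at which its basin merges into the basin of a lower minimum. The name `minima_barcode` and the sample values are illustrative assumptions.

```python
import math


def minima_barcode(values):
    """Return sorted (birth, death) bars for the local minima of a 1D sequence.

    Assumption: the function is given by its samples on a path graph, so each
    point has at most two neighbours (left and right).
    """
    n = len(values)
    parent = [-1] * n   # -1 marks a grid point not yet added to the sublevel set
    birth = {}          # component root -> value of its lowest point

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    bars = []
    # Sweep the threshold upwards: add points in order of increasing value.
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the higher minimum dies here.
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < values[i]:
                    bars.append((birth[young], values[i]))
                parent[young] = old
    # Components that never merge (the global minimum) get an infinite bar.
    roots = {find(i) for i in range(n)}
    bars.extend((birth[r], math.inf) for r in roots)
    return sorted(bars)


# Three local minima (values 1, 0, 2) separated by local maxima 4 and 5.
print(minima_barcode([3, 1, 4, 0, 5, 2, 6]))
# -> [(0, inf), (1, 4), (2, 5)]
```

Short bars correspond to shallow minima that are easy to escape; a bar's lower end is the value at the minimum and its upper end is the value at the merging point.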
Subspace power method for symmetric tensor decomposition
Kileel, Joe; Pereira, João M.
doi: 10.48550/arxiv.1912.04007
pmid: N/A
Abstract: We introduce the Subspace Power Method (SPM) for calculating the CP decomposition of low-rank real symmetric tensors. This algorithm calculates one new CP component at a time, alternating between applying the shifted symmetric higher-order power method (SS-HOPM) to a certain modified tensor, constructed from a matrix flattening of the original tensor, and using appropriate deflation steps. We obtain rigorous guarantees for SPM regarding convergence and global optima for input tensors of dimension $d$ and order $m$ of CP rank up to $O(d^{\lfloor m/2\rfloor})$, via results in classical algebraic geometry and optimization theory. As a by-product of our analysis we prove that SS-HOPM converges unconditionally, settling a conjecture in [Kolda, T.G., Mayo, J.R.: Shifted power method for computing tensor eigenpairs. SIAM Journal on Matrix Analysis and Applications 32(4), 1095-1124 (2011)]. We present numerical experiments which demonstrate that SPM is efficient and robust to noise, being up to one order of magnitude faster than state-of-the-art CP decomposition algorithms in certain experiments. Furthermore, prior knowledge of the CP rank is not required by SPM.
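The SS-HOPM building block mentioned in the abstract can be sketched for an order-3 symmetric tensor as the iteration $x \leftarrow (T x^{2} + \alpha x)/\lVert T x^{2} + \alpha x\rVert$. The sketch below applies it directly to a small dense tensor; it does not construct SPM's modified tensor or perform deflation, and the helper names (`rank_one_sum`, `contract`, `ss_hopm`), the shift value, and the test tensor are assumptions chosen for illustration.

```python
import math


def rank_one_sum(weights, vectors):
    """Build the dense symmetric tensor sum_i w_i * v_i (x) v_i (x) v_i."""
    d = len(vectors[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for w, v in zip(weights, vectors):
        for i in range(d):
            for j in range(d):
                for k in range(d):
                    T[i][j][k] += w * v[i] * v[j] * v[k]
    return T


def contract(T, x):
    """Return the vector T x^2, i.e. y_i = sum_{j,k} T[i][j][k] x_j x_k."""
    d = len(x)
    return [sum(T[i][j][k] * x[j] * x[k] for j in range(d) for k in range(d))
            for i in range(d)]


def ss_hopm(T, x0, alpha, iters=500, tol=1e-12):
    """Shifted symmetric power iteration for an order-3 tensor.

    Returns (eigenvalue estimate, unit vector). alpha is the shift.
    """
    norm = math.sqrt(sum(c * c for c in x0))
    x = [c / norm for c in x0]
    for _ in range(iters):
        y = [yi + alpha * xi for yi, xi in zip(contract(T, x), x)]
        norm = math.sqrt(sum(c * c for c in y))
        x_new = [c / norm for c in y]
        converged = max(abs(a - b) for a, b in zip(x_new, x)) < tol
        x = x_new
        if converged:
            break
    lam = sum(xi * yi for xi, yi in zip(x, contract(T, x)))  # T x^3
    return lam, x


# Illustrative tensor: 2 * e1^{(x)3} + 1 * e2^{(x)3} in dimension 3.
T = rank_one_sum([2.0, 1.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
lam, x = ss_hopm(T, [0.6, 0.5, 0.4], alpha=2.0)
```

For this starting point the iterate approaches the first component, so `lam` is close to 2 and `x` is close to the first basis vector; other starting points may converge to the second component instead.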
Generalized Reputation Computation Ontology and Temporal Graph Architecture
Kolonin, Anton
doi: 10.48550/arxiv.1912.00176
pmid: N/A
Abstract: The problem of reliable democratic governance is important for the survival of any community, and it will become more critical over time as levels of social connectivity in society rapidly increase with the speed and scale of electronic communication. To face this challenge, different sorts of rating and reputation systems are being developed; however, reputation gaming and manipulation in such systems appear to be a serious problem. We consider the use of an advanced reputation system supporting the "liquid democracy" principle, with a generalized design and underlying ontology fitting different sorts of environments such as social networks, financial ecosystems and marketplaces. The suggested system is based on the "temporal weighted liquid rank" algorithm, employing different sorts of explicit and implicit ratings exchanged by members of the society. For this purpose, we suggest an "incremental reputation" design and a graph database for implementation of the system. Finally, we present an evaluation of the system against real social network and financial blockchain data. The entire framework is expected to be the foundation of any multi-agent AI framework, so the evolution of distributed multi-agent AI architecture and dynamics will be based on the organic reputation scores earned by the agents that are part of it.
Mixability of Integral Losses: a Key to Efficient Online Aggregation of Functional and Probabilistic Forecasts
Korotin, Alexander; V'yugin, Vladimir; Burnaev, Evgeny
doi: 10.1016/j.patcog.2021.108175
pmid: N/A
Abstract: In this paper we extend the setting of online prediction with expert advice to function-valued forecasts. At each step of the online game several experts predict a function, and the learner has to efficiently aggregate these functional forecasts into a single forecast. We adapt basic mixable (and exponentially concave) loss functions to compare functional predictions and prove that these adaptations are also mixable (exp-concave). We call this phenomenon mixability (exp-concavity) of integral loss functions. As an application of our main result, we prove that various loss functions used for probabilistic forecasting are mixable (exp-concave). The considered losses include Sliced Continuous Ranked Probability Score, Energy-Based Distance, Optimal Transport Costs and Sliced Wasserstein-2 distance, Beta-2 and Kullback-Leibler divergences, Characteristic function and Maximum Mean Discrepancies.
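The aggregation setting can be illustrated with a minimal sketch of exponentially weighted averaging of function-valued forecasts under an integrated squared loss. This is a generic exponential-weights learner, not the paper's construction; it assumes forecasts and outcomes are functions with values in $[0, 1]$ sampled on a common grid, approximates the integral by the grid mean, and uses $\eta = 1/2$, for which the pointwise squared loss on $[0, 1]$ is exp-concave. The names `integral_sq_loss` and `aggregate_online` are hypothetical.

```python
import math


def integral_sq_loss(f, g):
    """Grid approximation of the integral of (f(x) - g(x))^2 over [0, 1]."""
    return sum((a - b) ** 2 for a, b in zip(f, g)) / len(f)


def aggregate_online(expert_forecasts, outcomes, eta=0.5):
    """Run exponential weights over rounds of functional forecasts.

    expert_forecasts[t][i] is expert i's forecast (a list of grid values) at
    round t; outcomes[t] is the realised function on the same grid.
    Returns (cumulative learner loss, list of cumulative expert losses).
    """
    n_experts = len(expert_forecasts[0])
    log_w = [0.0] * n_experts          # log-weights, uniform prior
    learner_loss = 0.0
    expert_loss = [0.0] * n_experts
    for forecasts, y in zip(expert_forecasts, outcomes):
        # Normalise weights in a numerically stable way.
        m = max(log_w)
        w = [math.exp(l - m) for l in log_w]
        z = sum(w)
        w = [wi / z for wi in w]
        # The learner's forecast is the pointwise weighted average.
        mix = [sum(w[i] * forecasts[i][j] for i in range(n_experts))
               for j in range(len(y))]
        learner_loss += integral_sq_loss(mix, y)
        for i in range(n_experts):
            li = integral_sq_loss(forecasts[i], y)
            expert_loss[i] += li
            log_w[i] -= eta * li       # exponential weight update
    return learner_loss, expert_loss
```

Under these assumptions the learner's cumulative loss exceeds that of the best expert by at most $\ln N / \eta$ for $N$ experts, independent of the number of rounds.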
A flexible FPGA accelerator for convolutional neural networks
Majumder, Kingshuk; Nema, Shubham; Bondhugula, Uday
doi: 10.48550/arxiv.1912.07284
pmid: N/A
Abstract: Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to exploit all forms of reuse available to minimize off-chip memory access while increasing utilization of available resources. The proposed design is composed of cores, each of which contains a one-dimensional array of processing elements (PEs). These cores can exploit different types of reuse available in CNN layers of varying shapes without requiring any reconfiguration; in particular, our design minimizes underutilization due to problem sizes that are not perfect multiples of the underlying hardware array dimensions. A major obstacle in the adoption of FPGAs as a platform for CNN inference is the difficulty of programming these devices using hardware description languages. Our end goal is to also address this, and we develop preliminary software support via a codesign in order to leverage the accelerator through TensorFlow, a dominant high-level programming model. Our framework takes care of tiling and scheduling of neural network layers and generates the necessary low-level commands to execute the CNN. Experimental evaluation on a real system with a PCI-express based Xilinx VC709 board demonstrates the effectiveness of our approach. As a result of an effective interconnection, the design maintains a high frequency when we scale the number of PEs. The sustained performance overall is a good fraction of the accelerator's theoretical peak performance.
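The underutilization issue for sizes that are not multiples of the array dimension can be illustrated with a small sketch. It is not the paper's tiling or scheduling framework; it only shows how a loop extent splits into tiles of a fixed array width and what fraction of PE slots does useful work when the last tile is partial. The function names and the example sizes are assumptions.

```python
def tile_ranges(extent, tile):
    """Split range(extent) into half-open (start, stop) tiles of width <= tile."""
    return [(start, min(start + tile, extent)) for start in range(0, extent, tile)]


def utilization(extent, tile):
    """Fraction of PE slots doing useful work when each tile occupies `tile` PEs."""
    n_tiles = -(-extent // tile)  # ceiling division
    return extent / (n_tiles * tile)


# A dimension of 10 mapped onto an array of 4 PEs needs 3 passes,
# and the last pass uses only 2 of the 4 PEs.
print(tile_ranges(10, 4))   # -> [(0, 4), (4, 8), (8, 10)]
print(utilization(12, 4))   # -> 1.0
```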
Bounded Languages Described by GF(2)-grammars
Makarov, Vladislav
doi: 10.48550/arxiv.1912.13401
pmid: N/A
Abstract: GF(2)-grammars are a recently introduced grammar family with some unusual algebraic properties. They are closely connected to unambiguous grammars. By using the method of formal power series, we establish strong conditions that are necessary for subsets of $a^* b^*$ and $a^* b^* c^*$ to be described by some GF(2)-grammar. By further applying the established results, we settle the long-standing open question of proving inherent ambiguity of the language $\{a^n b^m c^k \mid n \neq m \text{ or } m \neq k\}$, as well as give a new purely algebraic proof of the inherent ambiguity of the language $\{a^n b^m c^k \mid n = m \text{ or } m = k\}$.
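For readers unfamiliar with GF(2)-grammars, the following sketch illustrates their semantics as commonly stated: a string belongs to the language when it has an odd number of parse trees, so union behaves as symmetric difference and concatenation counts splits modulo 2. The code is a CYK-style parity computation for a grammar in Chomsky normal form; the function name `gf2_member` and the rule encoding are assumptions made for this illustration, and the method is not taken from the paper.

```python
def gf2_member(word, start, terminal_rules, binary_rules):
    """Decide membership under parity-of-parse-trees semantics.

    terminal_rules: list of (head, terminal symbol) for rules A -> a
    binary_rules:   list of (head, left, right) for rules A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # the sketch assumes no empty-string rules
    # odd[i][j]: nonterminals with an odd number of parse trees for word[i:j]
    odd = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, ch in enumerate(word):
        for head, sym in terminal_rules:
            if sym == ch:
                odd[i][i + 1] ^= {head}      # toggle parity
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):
                for head, left, right in binary_rules:
                    if left in odd[i][k] and right in odd[k][j]:
                        cell ^= {head}       # add one (mod 2) per odd split
            odd[i][j] = cell
    return start in odd[0][n]


# Grammar S -> S S | a: the word a^n has Catalan(n - 1) parse trees,
# which is odd exactly when n is a power of two.
lengths = [n for n in range(1, 17)
           if gf2_member("a" * n, "S", [("S", "a")], [("S", "S", "S")])]
print(lengths)  # -> [1, 2, 4, 8, 16]
```

The same grammar read as an ordinary context-free grammar generates every nonempty string of a's, which shows how the parity semantics changes the language described.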