XDC Network Assessment: Decentralization, Scalability and Security
Chakraborty, Mohuya; Khekade, Atul
doi: 10.48550/arxiv.2408.02318
pmid: N/A
Abstract:In 2019, XinFin unveiled the XDC Network, an enterprise-ready, open-source hybrid blockchain platform that specializes in tokenization for real-world decentralized finance. The network is currently overseen by the XDC Foundation, a non-profit organization established to encourage the growth, enhancement, and adoption of the XDC Network through community-driven projects such as GitHub. This whitepaper presents a real-time assessment of the XDC Network's decentralization, scalability, and security, followed by an estimate of its Nakamoto coefficient, a measure of decentralization that quantifies the minimum number of nodes or entities needed to compromise the system. A high coefficient denotes greater decentralization, while a low one denotes increased risk of disruption. The high Nakamoto coefficient computed in real time for the XDC Network demonstrates its highly decentralized character. The article also addresses the diversity of consensus and execution clients, the host distribution, the geo-distribution, and some of the outstanding issues and business considerations.
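The Nakamoto coefficient described in the abstract above can be illustrated with a minimal sketch. It assumes each entity has a numeric share of some resource (stake, validator count, or similar) and that the system is compromised once a coalition holds strictly more than a chosen fraction of the total. The function name, the example shares, and the default 50% threshold are illustrative assumptions and are not taken from the whitepaper.

```python
def nakamoto_coefficient(shares, threshold=0.5):
    """Smallest number of entities whose combined share exceeds
    `threshold` of the total (hypothetical helper, not from the paper)."""
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    running = 0.0
    # Take the largest holders first: that gives the smallest coalition.
    for count, share in enumerate(sorted(shares, reverse=True), start=1):
        running += share
        if running > threshold * total:
            return count
    # Reached only when threshold >= 1: no coalition can exceed the total.
    return len(shares)


# Made-up example shares, for illustration only.
concentrated = [40, 30, 20, 10]
uniform = [10] * 10
print(nakamoto_coefficient(concentrated))  # two largest holders exceed 50%
print(nakamoto_coefficient(uniform))       # six equal holders are needed
```

A more evenly spread distribution yields a higher coefficient, which matches the abstract's reading that a high value indicates greater decentralization.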
PixelFade: Privacy-preserving Person Re-identification with Noise-guided Progressive Replacement
Zhang, Delong; Peng, Yi-Xing; Wu, Xiao-Ming; Wu, Ancong; Zheng, Wei-Shi
doi: 10.48550/arxiv.2408.05543
pmid: N/A
Abstract:Online person re-identification services face privacy breaches from potential data leakage and recovery attacks, exposing cloud-stored images to malicious attackers and triggering public concern. The privacy protection of pedestrian images is crucial. Previous privacy-preserving person re-identification methods cannot resist recovery attacks and also compromise accuracy. In this paper, we propose an iterative method (PixelFade) that optimizes pedestrian images into noise-like images to resist recovery attacks. We first give an in-depth study of protected images from previous privacy methods, which reveals that the chaos of protected images can disrupt the learning of recovery models. Accordingly, we propose a Noise-guided Objective Function with the feature constraints of a specific authorization model, optimizing pedestrian images into normally distributed noise images while preserving their original identity information as per the authorization model. To solve this non-convex optimization problem, we propose a heuristic optimization algorithm that alternately performs the Constraint Operation and the Partial Replacement Operation. This strategy not only ensures that original pixels are replaced with noise to protect privacy, but also guides the images towards an improved optimization direction to effectively preserve discriminative features. Extensive experiments demonstrate that PixelFade outperforms previous methods in both resistance to recovery attacks and Re-ID performance. The code is available at this https URL.
Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers
Serai, Prashant; Wang, Peidong; Fosler-Lussier, Eric
doi: 10.48550/arxiv.2408.11258
pmid: N/A
Abstract:Modeling the errors of a speech recognizer can help simulate errorful recognized speech data from plain text, which has proven useful for tasks such as discriminative language modeling and improving the robustness of NLP systems when limited or even no audio data is available at training time. Previous work typically considered replicating the behavior of GMM-HMM based systems, but the behavior of more modern posterior-based neural network acoustic models is not the same and requires adjustments to the error prediction model. In this work, we extend a prior phonetic confusion based model for predicting speech recognition errors in two ways: first, we introduce a sampling-based paradigm that better simulates the behavior of a posterior-based acoustic model. Second, we investigate replacing the confusion matrix with a sequence-to-sequence model in order to introduce context dependency into the prediction. We evaluate the error predictors in two ways: first by predicting the errors made by a Switchboard ASR system on unseen data (Fisher), and then by using that same predictor to estimate the behavior of an unrelated cloud-based ASR system on a novel task. Sampling greatly improves predictive accuracy within a 100-guess paradigm, while the sequence model performs similarly to the confusion matrix.
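As a rough illustration of sampling from a phonetic confusion model, the sketch below draws each output phone from a per-phone confusion distribution and repeats this to produce many guesses. The data layout (a dict mapping each phone to a dict of replacement probabilities), the function names, and the toy probabilities are assumptions made for this example; the abstract does not specify the authors' actual model.

```python
import random


def sample_errorful_phones(phones, confusion, rng):
    """Sample one errorful phone sequence from a toy confusion model.

    `confusion[p]` maps candidate output phones to probabilities.
    Phones missing from `confusion` are passed through unchanged.
    """
    output = []
    for phone in phones:
        dist = confusion.get(phone, {phone: 1.0})
        candidates = list(dist)
        weights = [dist[c] for c in candidates]
        output.append(rng.choices(candidates, weights=weights, k=1)[0])
    return output


def sample_guesses(phones, confusion, n_guesses=100, seed=0):
    """Draw `n_guesses` independent samples (an assumed reading of a
    multi-guess evaluation; the seed only makes runs repeatable)."""
    rng = random.Random(seed)
    return [tuple(sample_errorful_phones(phones, confusion, rng))
            for _ in range(n_guesses)]


# Toy, made-up confusion probabilities for illustration only.
toy_confusion = {
    "t": {"t": 0.8, "d": 0.2},
    "ih": {"ih": 0.7, "eh": 0.3},
}
guesses = sample_guesses(["t", "ih", "n"], toy_confusion, n_guesses=100)
print(len(set(guesses)), "distinct guesses out of", len(guesses))
```

Sampling spreads the guesses over several plausible error patterns instead of always returning the single most likely output, which is the general motivation for a sampling-based paradigm.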
Multi-User Mobile Augmented Reality for Cardiovascular Surgical Planning
Mehta, Pratham; Narayanan, Rahul O; Karanth, Harsha; Yang, Haoyang; Slesnick, Timothy C; Shaw, Fawwaz; Chau, Duen Horng
doi: 10.48550/arxiv.2408.03249
pmid: N/A
Abstract:Collaborative planning for congenital heart diseases typically involves creating physical heart models through 3D printing, which are then examined by both surgeons and cardiologists. Recent developments in mobile augmented reality (AR) technologies have presented a viable alternative, known for their ease of use and portability. However, there is still a lack of research examining the utilization of multi-user mobile AR environments to support collaborative planning for cardiovascular surgeries. We created ARCollab, an iOS AR app designed for enabling multiple surgeons and cardiologists to interact with a patient's 3D heart model in a shared environment. ARCollab enables surgeons and cardiologists to import heart models, manipulate them through gestures and collaborate with other users, eliminating the need for fabricating physical heart models. Our evaluation of ARCollab's usability and usefulness in enhancing collaboration, conducted with three cardiothoracic surgeons and two cardiologists, marks the first human evaluation of a multi-user mobile AR tool for surgical planning. ARCollab is open-source, available at this https URL.
Automation Configuration in Smart Home Systems: Challenges and Opportunities
Anik, Sheik Murad Hassan; Gao, Xinghua; Zhong, Hao; Wang, Xiaoyin; Meng, Na
doi: 10.48550/arxiv.2408.04755
pmid: N/A
Abstract:With innovation in smart devices and the internet of things (IoT), smart homes have become prevalent. People tend to transform residences into smart homes by customizing off-the-shelf smart home platforms, instead of creating IoT systems from scratch. Among the alternatives, Home Assistant (HA) is one of the most popular platforms. It allows end-users (i.e., home residents) to smartify homes by (S1) integrating selected devices into the system, and (S2) creating YAML files to control those devices. Unfortunately, due to the diversity of devices and the complexity of automation configuration, many users have difficulty creating YAML files correctly. Consequently, their smart homes may not work as expected, causing frustration and concern among users. This paper presents a novel study of issues in YAML-based automation configuration in smart homes (issues related to S2). We mined the online forum Home Assistant Community for discussion threads related to automation configuration. By manually inspecting 190 threads, we identified 3 categories of concerns: implementation, optimization, and debugging. Under each category, we classified discussions based on the issue locations and technical concepts involved. Among debugging discussions, we further classified discussions based on users' resolution strategies; we also applied existing analysis tools to buggy YAML files to assess the tools' effectiveness. Our study reveals the common challenges faced by users and the frequently applied resolution strategies. Of the examined issues, 129 (68%) concern debugging, but existing tools can detect at most 14 issues and fix none. This implies that existing tools provide limited assistance in automation configuration. Our research sheds light on future directions in smart home development.
Approximation Algorithms for Anchored Multiwatchman Routes
Mitchell, Joseph S. B.; Nguyen, Linh
doi: 10.48550/arxiv.2408.17343
pmid: N/A
Abstract:We study some variants of the $k$-\textsc{Watchman Routes} problem, the cooperative version of the classic \textsc{Watchman Routes} problem in a simple polygon. The watchmen may be required to see the whole polygon, or some pre-determined quota of area within the polygon, and we want to minimize the maximum length traveled by any watchman. While the single-watchman version of the problem has received much attention and is rather well understood, this is not the case for the multiple-watchmen version. We provide the first tight approximability results for the anchored $k$-\textsc{Watchman Routes} problem in a simple polygon, assuming $k$ is fixed, by a fully polynomial-time approximation scheme (FPTAS). The basis for the FPTAS is provided by an exact dynamic programming algorithm. If $k$ is a variable, we give constant-factor approximations.
PRESENT: Zero-Shot Text-to-Prosody Control
Lam, Perry; Zhang, Huayun; Chen, Nancy F.; Sisman, Berrak; Herremans, Dorien
doi: 10.48550/arxiv.2408.06827
pmid: N/A
Abstract:Current strategies for achieving fine-grained prosody control in speech synthesis entail extracting additional style embeddings or adopting more complex architectures. To enable zero-shot application of pretrained text-to-speech (TTS) models, we present PRESENT (PRosody Editing without Style Embeddings or New Training), which exploits explicit prosody prediction in FastSpeech2-based models by modifying the inference process directly. We apply our text-to-prosody framework to zero-shot language transfer using a JETS model exclusively trained on English LJSpeech data. We obtain character error rates (CER) of 12.8%, 18.7% and 5.9% for German, Hungarian and Spanish respectively, beating the previous state-of-the-art CER by over 2x for all three languages. Furthermore, we allow subphoneme-level control, a first in this field. To evaluate its effectiveness, we show that PRESENT can improve the prosody of questions, and use it to generate Mandarin, a tonal language where vowel pitch varies at subphoneme level. We attain 25.3% hanzi CER and 13.0% pinyin CER with the JETS model. All our code and audio samples are available online.
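The character error rate (CER) figures quoted above are conventionally computed as the character-level edit distance between a reference transcript and a recognized hypothesis, divided by the reference length. The sketch below shows that standard definition; the function names are illustrative, and the abstract does not state which tool or text normalization the authors used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insertions,
    deletions and substitutions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance over characters / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Toy strings for illustration; these are not data from the paper.
print(character_error_rate("kitten", "sitting"))  # 3 edits / 6 chars = 0.5
```

Because insertions count as errors, CER can exceed 1.0 when the hypothesis is much longer than the reference.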
LiveFC: A System for Live Fact-Checking of Audio Streams
V, Venktesh; Setty, Vinay
doi: 10.48550/arxiv.2408.07448
pmid: N/A
Abstract:Advances in the digital era have led to rapid dissemination of information. This has also aggravated the spread of misinformation and disinformation, with potentially serious consequences such as civil unrest. While fact-checking aims to combat this, manual fact-checking is cumbersome and not scalable. Automated fact-checking approaches exist, but they do not operate in real time and do not always account for the spread of misinformation through different modalities. This matters because proactive fact-checking of live streams in real time can help inform people of false narratives and prevent catastrophic consequences that may cause civil unrest. It is particularly relevant given the rapid dissemination of information through video on social media platforms or other streams such as political rallies and debates. Hence, in this work we develop a platform named LiveFC that can aid in fact-checking live audio streams in real time. LiveFC has a user-friendly interface that displays the claims detected in live streams, along with their veracity, supporting evidence, and the speakers associated with the respective segments. The app can be accessed at this http URL and a screen recording of the demo can be found at this https URL.
Submodular Maximization Approaches for Equitable Client Selection in Federated Learning
Jiménez, Andrés Catalino Castillo; Kaya, Ege C.; Ye, Lintao; Hashemi, Abolfazl
doi: 10.48550/arxiv.2408.13683
pmid: N/A
Abstract:In a conventional Federated Learning framework, client selection for training typically involves the random sampling of a subset of clients in each iteration. However, this random selection often leads to disparate performance among clients, raising concerns regarding fairness, particularly in applications where equitable outcomes are crucial, such as in medical or financial machine learning tasks. This disparity typically becomes more pronounced with the advent of performance-centric client sampling techniques. This paper introduces two novel methods, namely SUBTRUNC and UNIONFL, designed to address the limitations of random client selection. Both approaches utilize submodular function maximization to achieve more balanced models. By modifying the facility location problem, they aim to mitigate the fairness concerns associated with random selection. SUBTRUNC leverages client loss information to diversify solutions, while UNIONFL relies on historical client selection data to ensure a more equitable performance of the final model. Moreover, these algorithms are accompanied by robust theoretical guarantees regarding convergence under reasonable assumptions. The efficacy of these methods is demonstrated through extensive evaluations across heterogeneous scenarios, revealing significant improvements in fairness as measured by a client dissimilarity metric.
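To make the facility location idea concrete, the sketch below runs the textbook greedy algorithm for maximizing the standard facility location objective, a monotone submodular function. It is not SUBTRUNC or UNIONFL, whose modifications are not described in the abstract. The similarity matrix, its values, and the function names are hypothetical, and similarities are assumed to be non-negative.

```python
def facility_location_value(sim, selected):
    """Sum over all clients of their best similarity to a selected client.

    `sim[i][j]` is an assumed non-negative similarity between clients i and j.
    """
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim, k):
    """Greedily pick up to k clients, each time adding the client with
    the largest marginal gain in the facility location objective."""
    n = len(sim)
    selected = []
    for _ in range(min(k, n)):
        current = facility_location_value(sim, selected)
        best_j, best_gain = None, float("-inf")
        for j in range(n):
            if j in selected:
                continue
            gain = facility_location_value(sim, selected + [j]) - current
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
    return selected


# Made-up similarities: clients 0 and 1 are alike, client 2 is an outlier.
toy_sim = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.1, 1.0],
]
print(greedy_select(toy_sim, 2))  # picks one of the similar pair, then the outlier
```

In this toy case the greedy choice covers the outlier client instead of picking two near-duplicates, which is the kind of diversity that a representativeness objective encourages.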
FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers
Williams, Joshua Nathaniel; Kolter, J. Zico
doi: 10.48550/arxiv.2408.04816
pmid: N/A
Abstract:The widespread use of large language models has resulted in a multitude of tokenizers and embedding spaces, making knowledge transfer in prompt discovery tasks difficult. In this work, we propose FUSE (Flexible Unification of Semantic Embeddings), an inexpensive approach to approximating an adapter layer that maps from one model's textual embedding space to another, even across different tokenizers. We introduce a third-order tensor-based representation of a model's embedding space that aligns semantic embeddings that have been split apart by different tokenizers, and use this representation to derive an approximation of the gradient of one model's outputs with respect to another model's embedding space. We show the efficacy of our approach via multi-objective optimization over vision-language and causal language models for image captioning and sentiment-based image captioning.