Design and evaluation of a parallel document clustering algorithm based on hierarchical latent semantic analysis
Seshadri, Karthick; Iyer, K. Viswanathan; S, Mercy Shalinie
doi: 10.1002/cpe.5094
pmid: N/A
We propose a parallel generalization scheme for Singular Value Decomposition–based clustering algorithms. The scheme enables the clustering algorithm to generate a hierarchy of clusters instead of a flat set of clusters. It automatically infers the number of levels in the hierarchy and the number of clusters per level, without depending on any user‐supplied parameter. The performance of the resulting hierarchical clustering algorithm was evaluated using the web directory taxonomy hosted by the Open Directory DMOZ. Empirical evaluations and statistical tests reveal that the proposed generalization scheme produces a superior cluster hierarchy when compared with two existing generalization techniques in terms of precision, recall, F‐measure, and the Rand index. The generalization scheme is well equipped to deal with large datasets, and the speed‐up achieved by the parallelized scheme over its sequential variant was measured on a multicore computer.
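For readers unfamiliar with the four evaluation measures named in this abstract, the following is a minimal sketch of their common pair-counting definitions for a flat clustering. It is an illustration only: the abstract does not say how the authors adapt these measures to a cluster hierarchy, and the function name `pairwise_scores` is hypothetical.

```python
from itertools import combinations


def pairwise_scores(predicted, truth):
    """Pair-counting precision, recall, F-measure and Rand index.

    predicted, truth: equal-length lists of cluster labels, one per document.
    Assumes at least two documents (otherwise there are no pairs to count).
    """
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1  # pair correctly placed together
        elif same_pred:
            fp += 1  # pair wrongly placed together
        elif same_true:
            fn += 1  # pair wrongly separated
        else:
            tn += 1  # pair correctly separated
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    rand_index = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, rand_index
```

For example, `pairwise_scores([0, 0, 1, 1], [0, 0, 0, 1])` counts one true-positive pair, one false-positive pair, two false-negative pairs, and two true-negative pairs.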
Preventing phishing attacks using text and image watermarking
Hajiali, Mahdi; Amirmazlaghani, Maryam; Kordestani, Hossain
doi: 10.1002/cpe.5083
pmid: N/A
Phishing is a form of identity theft used to illegally gather personal and financial information by providing fake web pages designed to mimic the website of a legitimate business. In this paper, we propose a novel anti‐phishing approach based on imperceptible watermarking and monitoring of HTTP requests. On the server side, a URL‐dependent watermark is generated and embedded in the elements of the web page, namely the logo image and the HTML/CSS files. On the client side, to authenticate the website and check its safety, the watermark is regenerated and compared with the watermarks extracted from the elements of the web page. Security analysis of the proposed method demonstrates its robustness against some common attacks. The proposed method runs fully automatically and without user interaction. It can be implemented simply and used for all kinds of websites, whether or not they are secured with the SSL protocol.
Rapid and accurate energy models through calibration with IPMI and RAPL
Kavanagh, Richard; Djemame, Karim
doi: 10.1002/cpe.5124
pmid: N/A
Energy consumption in Cloud and High Performance Computing platforms is a significant issue and affects aspects such as the cost of energy and the cooling of the data center. Host‐level monitoring and prediction provide the groundwork for improving energy efficiency through the placement of workloads. Monitoring must be fast and efficient, without unnecessary overhead, to enable scalability. This precludes the use of Watt meters attached per host, requiring alternative approaches such as integrated measurements and models. IPMI and RAPL are subject to error and partial measurement, which may be mitigated. Models allow for prediction and more responsive measures of power consumption, but require calibration. The causes of calibration error are discussed, along with mitigation strategies that do not overly complicate the underlying model. An outcome is a Watt meter emulator that provides host‐level power measurement along with estimated power consumption for a given workload, with an average error of 0.20 W.
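To make the idea of calibrating a power model concrete, here is a minimal sketch that fits a straight line from CPU utilization to measured power by ordinary least squares. The linear form, the function names, and the sample readings are all assumptions for illustration; the abstract does not state which model the authors calibrate or how.

```python
def fit_linear_power_model(utilization, watts):
    """Least-squares fit of watts ~= idle + slope * utilization.

    utilization: CPU utilization samples in [0, 1] (assumed input).
    watts: power readings taken at the same instants (assumed input).
    Requires at least two distinct utilization values.
    """
    n = len(utilization)
    mean_u = sum(utilization) / n
    mean_w = sum(watts) / n
    cov = sum((u - mean_u) * (w - mean_w) for u, w in zip(utilization, watts))
    var = sum((u - mean_u) ** 2 for u in utilization)
    slope = cov / var
    idle = mean_w - slope * mean_u
    return idle, slope


def predict_power(idle, slope, utilization):
    """Estimate host power draw (watts) for a given utilization."""
    return idle + slope * utilization


# Hypothetical calibration readings, not taken from the paper.
idle, slope = fit_linear_power_model([0.0, 0.5, 1.0], [100.0, 150.0, 200.0])
```

Once fitted, `predict_power(idle, slope, 0.25)` estimates the draw at 25% utilization without a physical meter attached to the host.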
Algorithms for in‐place matrix transposition
Gustavson, Fred G.; Walker, David W.
doi: 10.1002/cpe.5071
pmid: N/A
This paper presents implementations of in‐place algorithms for transposing rectangular matrices. One implementation is a swap‐based algorithm described by Tretyakov and Tyrtyshnikov,1 to which we have introduced a number of variations. In particular, we show how the original algorithm can be modified to require constant additional memory. A proof of correctness is also sketched. This algorithm is compared with cycle‐following approaches and with the swap‐based GCD Transpose algorithm that partitions the matrix into a hierarchy of square submatrices. The performance of parallel implementations on a multicore system is also investigated.
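As background for the cycle‐following approaches mentioned as a point of comparison, the sketch below transposes a row‐major matrix stored in a flat list using only constant extra memory. It is a generic textbook variant, not the Tretyakov and Tyrtyshnikov swap‐based algorithm or the GCD Transpose algorithm from the paper, and the function name is hypothetical. It relies on the fact that the element at flat index `i` of a `rows x cols` matrix moves to index `(i * rows) mod (rows * cols - 1)`, with the first and last elements fixed.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a rows x cols row-major matrix held in the flat list a.

    After the call, a holds the cols x rows transpose in row-major order.
    Uses O(1) extra memory; each cycle is rotated once, starting from its
    smallest index, which is found by re-walking the cycle.
    """
    last = rows * cols - 1
    for start in range(1, last):
        # Walk the cycle until we return to start or find a smaller index.
        nxt = (start * rows) % last
        while nxt > start:
            nxt = (nxt * rows) % last
        if nxt < start:
            continue  # this cycle was already rotated from a smaller index
        # Rotate the cycle: carry each value to its destination.
        carried = a[start]
        pos = (start * rows) % last
        while pos != start:
            a[pos], carried = carried, a[pos]
            pos = (pos * rows) % last
        a[start] = carried
```

The leader check avoids a visited‐flag array at the cost of extra index arithmetic, which is one reason cycle‐following variants trade memory against running time.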
A novel design of multiplexer based on nano‐scale quantum‐dot cellular automata
Mosleh, Mohammad
doi: 10.1002/cpe.5070
pmid: N/A
Quantum‐dot Cellular Automata (QCA) is regarded as one of the best alternatives to CMOS technology at nano‐scale dimensions, allowing digital circuits to be designed with high speed and density. In this paper, a novel 2:1 multiplexer in QCA technology is proposed. The proposed QCA multiplexer uses a new formulation based on a newly introduced MV32 gate. The MV32 gate has three inputs and two outputs and operates based on cell interactions. A multi‐layer design for the proposed 2:1 QCA multiplexer is provided. The simulation results obtained with QCADesigner 2.0.3 confirm that the proposed multiplexer works correctly and can be applied as a high‐performance design in QCA technology. Moreover, the 2:1 QCA multiplexer is used to create 4:1 and 8:1 QCA multiplexers. The comparison results show that the proposed structures perform better than many previous designs.
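The logic behind composing larger multiplexers from a 2:1 multiplexer can be sketched at the Boolean level. The example below uses the conventional three‐input majority gate and inverter, the standard QCA primitives, rather than the MV32 gate, whose definition is not given in the abstract. All function names are illustrative.

```python
def majority(a, b, c):
    """Three-input majority vote over bits (0 or 1)."""
    return 1 if a + b + c >= 2 else 0


def mux2(d0, d1, sel):
    """2:1 multiplexer: returns d0 when sel == 0, d1 when sel == 1.

    AND is a majority gate with one input fixed to 0,
    OR is a majority gate with one input fixed to 1.
    """
    and0 = majority(d0, 1 - sel, 0)  # d0 AND (NOT sel)
    and1 = majority(d1, sel, 0)      # d1 AND sel
    return majority(and0, and1, 1)   # and0 OR and1


def mux4(d, s1, s0):
    """4:1 multiplexer built from three 2:1 multiplexers.

    d is a sequence of four data bits; (s1, s0) selects d[2 * s1 + s0].
    """
    low = mux2(d[0], d[1], s0)
    high = mux2(d[2], d[3], s0)
    return mux2(low, high, s1)
```

An 8:1 multiplexer follows the same pattern: two `mux4` stages feeding one further `mux2` driven by the third select bit.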
High‐performance SIMD implementation of the lattice‐Boltzmann method on the Xeon Phi processor
Robertsén, Fredrik; Mattila, Keijo; Westerholm, Jan
doi: 10.1002/cpe.5072
pmid: N/A
We present a high‐performance implementation of the lattice‐Boltzmann method (LBM) on the Knights Landing generation of Xeon Phi. The Knights Landing architecture includes 16 GB of high‐speed memory (MCDRAM) with a reported bandwidth of over 400 GB/s, and a subset of the AVX‐512 single instruction multiple data (SIMD) instruction set. We explain five critical implementation aspects for high performance on this architecture: (1) the choice of an appropriate LBM algorithm, (2) a suitable data layout, (3) vectorization of the computation, (4) data prefetching, and (5) running our LBM simulations exclusively from the MCDRAM. The effects of these implementation aspects on computational performance are demonstrated with the lattice‐Boltzmann scheme involving the D3Q19 discrete velocity set and the TRT collision operator. In our benchmark simulations of fluid flow through porous media, using double‐precision floating‐point arithmetic, the observed performance exceeds 960 million fluid lattice site updates per second.
RT‐JADE: A preemptive real‐time scheduling middleware for mobile agents
Filgueiras, Tatiana Pereira; Rodrigues, Leonardo M.; Oliveira Rech, Luciana; Souza, Luciana Moreira Sá; Netto, Hylson Vescovi
doi: 10.1002/cpe.5061
pmid: N/A
Mobile agents are examples of distributed systems that may compete for the same resources on their hosts. Handling such concurrency adequately is essential, particularly in real‐time applications. Due to intrinsic time restrictions, mobile agents in real‐time environments are only considered successful if they fulfill their mission while respecting their deadlines. Scheduling algorithms with different policies can be applied in these scenarios. However, the efficiency of these algorithms may vary according to the missions and deadlines of the mobile agents. Also, these algorithms can be preemptive, or can calculate the order of executions without interrupting an ongoing task. In this paper, we propose a middleware extension to the JADE platform that brings real‐time scheduling support with preemption to mobile agents. The proposed solution uses a best‐effort scheduling policy in the context of soft real‐time applications. We evaluate the performance of the scheduling algorithms, with and without preemption, and the impact of the selected algorithms on mission fulfillment. The results show that the proposed middleware greatly improves mission accomplishment when compared to the FIFO algorithm provided by the JADE platform.
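The difference between non‐preemptive FIFO scheduling and a preemptive deadline‐aware policy can be illustrated with a small discrete‐time simulation. This sketch is not RT‐JADE and does not use the JADE API; earliest‐deadline‐first (EDF) is chosen here only as a familiar preemptive policy, since the abstract does not name the algorithms the middleware implements. The task tuples and the function name are hypothetical.

```python
def count_deadlines_met(tasks, preemptive):
    """Simulate one host in unit time steps and count met deadlines.

    tasks: list of (release, duration, deadline) integer tuples,
           with every duration >= 1.
    preemptive=True  -> earliest-deadline-first, re-evaluated every tick.
    preemptive=False -> FIFO by release time, run to completion.
    """
    remaining = [duration for _, duration, _ in tasks]
    unfinished = len(tasks)
    running = None
    met = 0
    time = 0
    while unfinished:
        ready = [i for i, (release, _, _) in enumerate(tasks)
                 if release <= time and remaining[i] > 0]
        if ready:
            if preemptive:
                # EDF: always run the ready task with the nearest deadline.
                running = min(ready, key=lambda i: tasks[i][2])
            elif running is None:
                # FIFO: pick the earliest-released task, never interrupt it.
                running = min(ready, key=lambda i: tasks[i][0])
            remaining[running] -= 1
            if remaining[running] == 0:
                unfinished -= 1
                if time + 1 <= tasks[running][2]:
                    met += 1
                running = None
        time += 1
    return met
```

With the hypothetical workload `[(0, 5, 10), (1, 1, 3)]`, FIFO lets the long task block the short urgent one, whereas the preemptive policy interrupts the long task so that both finish on time.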
A possibilistic framework for the detection of terrorism‐related Twitter communities in social media
Moussaoui, Mohamed; Zaghdoud, Montaceur; Akaichi, Jalel
doi: 10.1002/cpe.5077
pmid: N/A
Since the appearance of social networks, there has been a historic increase in data. Unfortunately, terrorists are taking advantage of the ease of access to social networks and have set up profiles to recruit, radicalize, and raise funds. Most of these profiles maintain pages through which new recruits can join the terrorist groups and view and share information. Therefore, there is a need to detect terrorist communities in social networks in order to search for key hints in posts that appear to promote the militants' cause. To address this problem, we first use a possibilistic‐clustering algorithm that allows more flexibility when assigning a social network profile to clusters (non‐terrorist, terrorist‐sympathizer, terrorist). Then, we introduce a new possibilistic flexible graph mining method that discovers similar subgraphs by applying possibilistic similarity rather than hard, exact structural similarity. We experimentally show the efficiency of our possibilistic approach through a detailed process of tweet extraction, semantic processing, and classification for community detection.
A user‐assisted thread‐level vulnerability assessment tool
Oz, Isil; Topcuoglu, Haluk Rahmi; Tosun, Oguz
doi: 10.1002/cpe.5085
pmid: N/A
System reliability has become a critical concern in modern architectures as circuits scale down. To deal with soft errors, replication of system resources has been used at both the hardware and software levels. Since redundancy causes performance degradation, it is necessary to explore partial redundancy techniques that replicate only the most vulnerable parts of the code. The redundancy level of user applications depends on user preferences and may differ among users with different requirements. In this work, we propose a user‐assisted reliability assessment tool based on critical thread analysis for redundancy in parallel architectures. Our analysis evaluates the application threads of a parallel program by considering their criticality in the execution and selects the most critical thread or threads to be replicated. Moreover, we extend our analysis by exploring critical regions of individual threads and redundantly executing only those regions to reduce redundancy overhead. Our experimental evaluation indicates that replicating the most critical thread improves system reliability more (by up to 10% for the blackscholes application) than replicating any other thread. Partial thread replication based on critical region analysis also reduces the vulnerability of the system by taking a fine‐grained approach.