Welcome to the website of the High Performance Computing and Networking Lab (HPCNL). We are located at the Al-Khawarizmi Institute of Computer Science (KICS) at the University of Engineering and Technology (UET), Lahore, Pakistan. Recognizing the importance of high performance computing and networking, it was decided to set up a research lab to address the challenges and dynamics of the field. The Lab was inaugurated by the Chairman of the Board of Governors of KICS and Vice Chancellor of UET, Lt. Gen. (R) Muhammad Akram Khan, together with Dr. Qasim Shiekh, CEO of the National ICT R&D Fund, on 16th June 2008.
To become a leading force in the field of high performance computing and networking and to become a pioneer in the research of high speed computing.
The High Performance Computing and Networking Lab performs research in a number of areas, such as performance characterization, evaluation and benchmarking of multi-core processor based systems and high speed networks, development of micro-kernel benchmarks, virtualization, optimized application parallelization, distributed content searching and web indexing. Our lab started in 2006 as a research group with minimal research facilities and has since grown into a state-of-the-art research lab with a number of research staff, postgraduate and undergraduate students.
This lab provides a platform for individuals to hone their research skills. It gives undergraduate students the opportunity to become familiar with multi-core processor based systems by supervising their final year projects. We hold workshops and short courses every year on different issues related to high performance computing and performance engineering. We also offer internships to undergraduates and fresh graduates. Seeing the potential of research in this area, we plan to offer MS/PhD theses to postgraduates in the future.
This lab aims to conduct active research in the areas of high performance computing and networking, performance evaluation and benchmarking, development of performance measuring and profiling tools, parallel and distributed systems, and grid and cluster computing.
In the future we also aim to provide complete performance engineering solutions for network infrastructure products and application servers. We plan to develop customized tools and benchmarks for this purpose, so that we can meet our clients' performance engineering needs in full.
Multiple cores per chip have emerged as the architecture of choice for sustaining the benefits of Moore's Law. However, this choice brings to the mainstream some of the memory architecture design and software development challenges that were once hallmarks of high-end, high-performance, and high-price parallel computing. The HPCNL team has focused on benchmarking, performance evaluation and characterization, and tuning on leading multi-core processor based systems from Intel, AMD, IBM, Nvidia and Cavium Networks. The team developed the Multicore Processor Architecture and Communication (MPAC) benchmarking library. MPAC is an open source (http://mpac.sourceforge.net/) benchmarking infrastructure for developing specification-driven multithreaded benchmarks for multicore systems. The following benchmarks have been developed so far using the MPAC library:
- MPAC CPU Benchmark
- MPAC Memory Benchmark
- MPAC Network Benchmark
- MPAC Cache Benchmark
- MPAC Disk Benchmark
- MPAC Cell BE benchmarks (CPU & Memory)
- MPAC CUDA benchmarks (Compute & Memory)
- MPAC Android Benchmarks (in progress)
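To give a feel for what a specification-driven multithreaded benchmark looks like, here is a minimal Python sketch. It is not MPAC code and does not use the MPAC API: `BenchmarkSpec`, `integer_kernel` and `run_benchmark` are hypothetical names made up for this illustration. Note also that CPython threads share the global interpreter lock, so the sketch shows the structure of such a harness (a specification, a barrier-synchronized start, aggregate throughput) rather than real multicore scaling.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class BenchmarkSpec:
    """Hypothetical specification: which run, how many threads, how much work."""
    name: str
    threads: int
    iterations: int


def integer_kernel(iterations: int) -> int:
    """Toy CPU workload: a chain of integer multiply-add operations."""
    acc = 0
    for i in range(iterations):
        acc = (acc * 31 + i) & 0xFFFFFFFF
    return acc


def run_benchmark(spec: BenchmarkSpec) -> dict:
    """Run the kernel on spec.threads threads and report aggregate throughput."""
    # One extra party for the main thread, so all workers start together.
    barrier = threading.Barrier(spec.threads + 1)

    def worker() -> None:
        barrier.wait()
        integer_kernel(spec.iterations)

    workers = [threading.Thread(target=worker) for _ in range(spec.threads)]
    for thread in workers:
        thread.start()

    barrier.wait()  # releases the workers; timing starts here
    start = time.perf_counter()
    for thread in workers:
        thread.join()
    elapsed = time.perf_counter() - start

    total = spec.threads * spec.iterations
    return {
        "name": spec.name,
        "threads": spec.threads,
        "total_iterations": total,
        "seconds": elapsed,
        "iterations_per_second": total / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Sweep the thread count from one specification template.
    for n in (1, 2, 4):
        print(run_benchmark(BenchmarkSpec("cpu-int", threads=n, iterations=200_000)))
```

Keeping the run parameters in a small specification object means the same harness can sweep thread counts or workload sizes without changing the kernel.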
The availability of hardware performance counters on modern architectures makes it possible to characterize workloads for potential optimizations. HPCNL staff have written code to read performance counter data for x86 and MIPS in user space. They have also written a kernel module to read performance counter data on ARM processors running Android. A study based on the data gathered by these utilities is in progress to compare different machines.
Dynamic parallelization deals with the parallel execution of sequential code with the help of a hardware/software runtime system. One of the HPCNL staff members is writing his PhD dissertation in this area. He has recently developed an automated tool named SeekBin. SeekBin is a Java agent that hooks into the class loader of the Java Virtual Machine (JVM) and analyzes classes as they are loaded for parallelization potential. Runtime modification of Java classes for parallel execution is in progress.
Graphics processing units (GPUs) are increasingly being employed as commodity data-parallel co-processors in desktop and laptop systems due to their tremendous computational power and high memory bandwidth. HPCNL team members ported the MPAC CPU and memory benchmarks to Nvidia’s Compute Unified Device Architecture (CUDA) and evaluated the potential of Nvidia’s GPUs for general purpose computing. The evaluation covered the floating point, integer and logical functional units as well as the interaction between the different on-chip and off-chip memories available on the GPUs. The experiences, results and conclusions are summarized in a detailed technical report (available on the KICS website).
This work involved porting and extending MPAC for multicore smartphones. The specifications and requirements for smartphone embedded platforms were laid out, and Android and iPhone iOS were selected as the prime candidates for the MPAC extension. First, MPAC was ported to ARM processor based Android platforms. The porting involved removing bugs and incompatibilities that prevented compilation and correct execution of MPAC benchmarks on Android. A Graphical User Interface (GUI) was built for the Android platform that displays the full smartphone specifications for CPU, memory, battery, sensors and I/O, including the number of cores, CPU frequency, available and used memory, types of sensors, battery charge state, temperature and terminal voltage. The GUI provides all the run options in a convenient form for executing MPAC benchmarks and displays the results in numeric as well as bar graph form. Second, MPAC was ported to Mac OS X with the eventual goal of porting and running it on iPhone iOS, since iPhone iOS uses the same core libraries as Mac OS X. This porting effort involved removing incompatibilities, adding different libraries and functions for Mac OS X, removing existing variables and adding new ones. A GUI was also developed for the iPhone that shows specifications for CPU, memory, battery and I/O, with the ultimate goal of extending it to include the ported MPAC benchmarks. In addition, MPAC was extended with kernel workloads and applications, including matrix multiplication, wavelet transform and Fourier transform algorithms for signal processing. These extensions are ongoing work. The goal is to choose characteristic smartphone workloads and eventually port them to smartphone platforms.
Cloud computing is a reincarnation of a number of similar efforts in distributed computing with one difference: virtualization. While virtualization is also a familiar concept in computer science, its current adoption for physical compute resources creates a potentially disruptive business model for IT applications. Virtualization thrives on multi-core processor based server architectures for data center applications. Memory throughput bottlenecks affect interactions among Virtual Machines (VMs). VM-VM interactions on the same physical host are in many ways analogous to bridging multiple networks. Thus, challenges such as isolation, security, policy enforcement, and QoS guarantees are relevant in this domain. The HPCNL team is currently analyzing multiple Xen based technologies for VM-VM interactions on multi-core systems. Any collaboration in virtualization will benefit from our work on multi-core processor architectures and network platforms.
The HPCNL team analyzed virtual machine (VM) scalability on multi-core systems for compute-, memory-, and network I/O-intensive workloads. Evaluating VM scalability under these three workloads helps cloud users understand the performance impact of the underlying system and network architectures. We demonstrate that VMs on state-of-the-art multi-core processor based systems scale as well as multiple threads on a native SMP kernel for CPU and memory intensive workloads. Intra-VM communication for a network I/O intensive TCP message workload has a lower overhead than multiple threads when VMs are pinned to specific cores. However, VM scalability is severely limited for such workloads when communication crosses VMs on a single host, because of the virtual bridges involved. For communication across local and wide area networks, the network bandwidth is the limiting factor. Unlike previous studies that use workload mixes, we apply a single workload type at a time to clearly attribute VM scalability bottlenecks to the system and network architectures or to virtualization itself.
The HPCNL team has published a paper on their experience of adding Octeon MIPS64r2 ‘User Mode Emulation’ (UME) support to the open source ‘Quick Emulator’ (QEMU). QEMU can emulate numerous target architectures. As with many other open source projects, the available documentation is either scant or stale. Modifying and extending such code is especially challenging due to the sheer size of the code base (654K lines of code spread over 1251 source files). Sporadic or absent developer support makes things even harder. A team of developers is therefore effectively left with only the source code, which it must understand and change correctly without causing regression bugs. The team overcame this challenge using methodical software engineering techniques. The paper discusses the problems that were encountered and the solutions that were employed. In addition, it presents QEMU’s software architecture, which was reconstructed bottom-up from the source code. Such experiences are relevant for understanding and extending any software of substantial size.
The HPCNL team is leveraging the open source Hadoop file system and the MapReduce-based distributed processing paradigm for this purpose. The team is open to specific product ideas and joint efforts to build on our current setup.
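As a rough illustration of the MapReduce paradigm in an indexing setting, the following single-process Python sketch builds an inverted index in three steps: map, shuffle and reduce. It is not Hadoop code and not the lab's implementation; the function names and the sample documents are hypothetical, and a real deployment would distribute the map and reduce steps across a cluster.

```python
from collections import defaultdict


def map_phase(doc_id: str, text: str):
    """Mapper: emit one (term, doc_id) pair per distinct word in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term: str, doc_ids: list):
    """Reducer: turn the grouped document ids into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(documents: dict) -> dict:
    """Chain the three steps over an in-memory {doc_id: text} collection."""
    pairs = (
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample documents, for illustration only.
    docs = {"a.html": "Urdu search engine", "b.html": "search the web"}
    print(build_inverted_index(docs)["search"])
```

Because each mapper call depends only on its own document and each reducer call only on one term, both steps can be run in parallel on separate machines.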
The website census was an effort to enumerate all the websites on the World Wide Web (WWW) without using crawling. Crawling is the traditional way of discovering websites. It is conceptually simple, but the sheer size of the WWW makes the implementation complex and resource demanding. An enormous amount of bandwidth, a huge persistent storage pool, a sufficiently large cluster of machines for data processing and a complex set of software systems are just a few examples of the resources needed. The team instead used exhaustive IP range probing to detect the presence of a web server on TCP port 80. Although this probing is exhaustive in nature, it is lightweight in terms of resource demands. This enumeration of websites has many applications. The most obvious is to use it as a seed for conventional crawling. It can also be refined into a top level domain (TLD) specific seed for targeted crawling.
The design and development of an Urdu search engine is another in-progress effort. It is being carried out in collaboration with the Center of Language Engineering (CLE), a KICS associate that works on local languages, especially Urdu.
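The core of the probing step can be sketched in a few lines of Python. This is only an illustration of the idea under simple assumptions, not the census tool itself: `has_tcp_listener` and `probe_range` are hypothetical names, the probe is sequential, and a successful TCP connection to the port is taken as the sign that a server is present.

```python
import ipaddress
import socket


def has_tcp_listener(host: str, port: int = 80, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            # connect_ex returns 0 on success and an error code otherwise.
            return sock.connect_ex((host, port)) == 0
        except OSError:
            return False


def probe_range(cidr: str, port: int = 80, timeout: float = 0.5) -> list:
    """Probe every address in a CIDR block and return those that accept connections."""
    network = ipaddress.ip_network(cidr, strict=False)
    # Iterating the network object yields every address in the block.
    return [
        str(address)
        for address in network
        if has_tcp_listener(str(address), port, timeout)
    ]


if __name__ == "__main__":
    # Loopback only: lists 127.0.0.1 if a local web server is listening on port 80.
    print(probe_range("127.0.0.1/32"))
```

Each probe needs only a connection attempt and a few bytes of state per address, which is why this approach is light on bandwidth and storage compared with crawling.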
Multi-core processors enable a greater level of "intelligence" in network devices. Network devices can use the additional computing capability to look deeper into packet content and to implement application-level visibility and policy enforcement. The HPCNL team has worked on a generic packet capture and re-insertion platform into which such networking applications can be plugged with minimal performance overhead.
A group within the HPCNL team has strong expertise in Linux kernel development and hands-on experience of bare metal application development and deployment. For example, the team has worked across the board on Octeon evaluation boards, developing boot loaders for these boards as well as system and application level software for the Linux running on them. People at HPCNL have wide experience of both native and cross compilation of the Linux kernel.
HPCNL works in collaboration with different national and international organizations. The collaborations completed so far (or in progress) include the following organizations:
- Cisco Systems
- Cavium Networks
- MontaVista
- Center of Language Engineering (CLE)
- Wi-tribe ACS optimization
HPCNL developed a course titled “Multicore Programming” for multiple universities around the world through the Cavium University Program. The Cavium University Program is defined and adopted by distinguished professors from the University of California (Irvine), University of Minnesota, Purdue University, Ozyegin University (Istanbul, Turkey), University of Michigan and San Jose State University.
The course material includes source code, lecture slides, and lab exercises (with a lab manual) on the following topics:
- Performance Measurement
- Sorting
- IP Packet Sniffer
- Network Packet Filtering
- Deep Packet Inspection
The material is available on the Cavium University Program website (http://university.caviumnetworks.com/download_center.html).
The HPCNL team also developed three courses for the MontaVista University Program, namely:
- Understanding the Linux Kernel TCP/IP Stack
- Real Time Operating System (RTOS)
- Threading Building Blocks (TBB)
HPCNL engages industry and academia by organizing workshops on state-of-the-art technologies. It also offers various short courses on emerging technologies and paradigms. Listed below are the workshops and courses conducted so far.
Programming Multi-Core Processor based Embedded, Mobile, and Distributed Systems—A Hands-On Approach
This short but extensive course covered state-of-the-art technologies and programming environments for leading multicore processors: Cavium Networks’ Octeon Processor using the Octeon SDK, ARM multicore processor based mobile devices using Android, general purpose Nvidia GPUs using CUDA, the IBM Cell Processor using the Cell SDK, cloud computing using Hadoop/MapReduce, and virtualization using the KVM hypervisor. The topics included (1) Overview and taxonomy of multi-core processor architectures (2) Octeon processor and Cavium SDK (3) Linux kernel for multicore embedded systems (4) Real time OS (RTOS) (5) Performance measurement and tuning (6) Multithreaded Android development (7) Parallel computing on Sony Playstation 3 (8) Introduction to CUDA on multi-core GPUs, and (9) Introduction to cloud computing and virtualization.
4th International Conference on Open-Source Systems and Technologies (ICOSST-2010), Lahore, Pakistan
HPCNL staff have been actively contributing to the success of the ICOSST conferences through paper reviews, invited talks, workshops and session chairing. At ICOSST 2010, HPCNL’s consultant Dr. Abdul Waheed (Senior Performance Engineer, Cisco Systems, USA, and adjunct professor, UET Lahore, Pakistan) delivered an invited talk titled “Multicore System and Application Performance Engineering”. HPCNL team members also organized two workshops, on (1) Linux & Memory Maps, and (2) Exploring Linux Kernel the Easy Way.
Parallel Programming Paradigms for Multi-Core Processors Based Systems—A Hands-On Experience on Intel and Sony PS 3 Platforms
This short course focused on state-of-the-art programming paradigms that are useful for software development on multi-core architectures: explicit message-passing, shared address space based computing, multi-threading, MapReduce, and Compute Unified Device Architecture (CUDA). It provided a brief introduction to each of these paradigms, with an emphasis on hands-on experience so that students could become familiar with them for developing parallel applications. Students had the opportunity to learn parallel application development on Intel quad-core processor based servers and the Sony Playstation 3 games console as target multi-core platforms. The sessions included (1) Overview of parallel computing (2) An overview of explicit message-passing, shared address space based computing, and multi-threading on Intel's multi-core systems and Sony Playstation 3 (3) Introduction to MapReduce, and (4) Introduction to CUDA on multi-core processors.