High Performance Computing for Imaging 2023
Monday 16 January 2023
AI Methods for Imaging (M1)
Session Chair: Muralikrishnan Gopalakrishnan Meena, Oak Ridge National Laboratory (United States)
9:05 – 10:20 AM
Mason
9:05
Conference Welcome
9:10 HPCI-227
HPC+AI-enabled real-time nanoscale x-ray imaging (Invited), Mathew Cherukara, Argonne National Laboratory (United States) [view abstract]
The capabilities provided by next-generation light sources such as the APSU, along with the development of new characterization techniques and detector advances, are expected to revolutionize materials characterization (metrology) by providing the ability to perform scale-bridging, multi-modal materials characterization under in-situ and operando conditions: for example, imaging large fields of view (~mm^3) in 3D at high resolution (<10 nm) while simultaneously acquiring information about structure, strain, elemental composition, oxidation state, photovoltaic response, etc. However, these novel capabilities dramatically increase the complexity and volume of the data generated by instruments at the new light sources. Conventional data processing and analysis methodologies become infeasible in the face of such large and varied data streams. The use of AI/ML methods is becoming indispensable for real-time analysis, data abstraction, and decision making at advanced synchrotron light sources such as the APS. I will describe the use of high-performance computing (HPC) along with AI on edge devices to enable real-time analysis of streaming data from x-ray imaging instruments at the APS.
9:40 HPCI-228
Physics guided machine learning for image-based material decomposition of tissues from simulated breast models with calcifications, Muralikrishnan Gopalakrishnan Meena1, Amir K. Ziabari1, Singanallur Venkatakrishnan1, Isaac R. Lyngaas1, Matthew R. Norman1, Balint Joo1, Thomas L. Beck1, Charles A. Bouman2, Anuj Kapadia1, and Xiao Wang1; 1Oak Ridge National Laboratory and 2Purdue University (United States) [view abstract]
Material decomposition of Computed Tomography (CT) scans using projection-based approaches, while highly accurate, poses a challenge for medical imaging researchers and clinicians due to limited or no access to projection data. We introduce a deep learning image-based material decomposition method guided by physics and requiring no access to projection data. The method is demonstrated to decompose tissues from simulated dual-energy X-ray CT scans of virtual human phantoms containing four materials: adipose, fibroglandular, calcification, and air. The method uses a hybrid unsupervised and supervised learning technique to tackle the material decomposition problem. We take advantage of the unique X-ray absorption rate of calcium compared to body tissues to perform a preliminary segmentation of calcification from the images using unsupervised learning. We then perform supervised material decomposition using a deep-learned U-Net model trained on GPUs on the high-performance systems at the Oak Ridge Leadership Computing Facility. The method is demonstrated on simulated breast models to decompose calcification, adipose, fibroglandular, and air.
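For readers who want a concrete picture of the hybrid pipeline described above, here is a minimal Python sketch (synthetic data, hypothetical names; the paper's actual implementation uses a U-Net trained on OLCF systems): an unsupervised clustering step exploits calcium's distinctive dual-energy attenuation to pre-segment calcification, with the supervised decomposition stage stubbed out.

```python
# Stage 1 of the hybrid idea: unsupervised calcification segmentation using
# dual-energy attenuation pairs. Stage 2 (the supervised U-Net) is omitted.
import numpy as np
from sklearn.cluster import KMeans

def segment_calcification(low_kev: np.ndarray, high_kev: np.ndarray) -> np.ndarray:
    """Cluster dual-energy attenuation pairs; the cluster with the highest
    mean attenuation is assumed to be calcification."""
    feats = np.stack([low_kev.ravel(), high_kev.ravel()], axis=1)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(feats)
    calc_label = np.argmax([feats[labels == k, 0].mean() for k in (0, 1)])
    return (labels == calc_label).reshape(low_kev.shape)

# Example with synthetic dual-energy slices containing one bright blob
rng = np.random.default_rng(0)
low = rng.normal(0.2, 0.05, (64, 64)); high = rng.normal(0.15, 0.05, (64, 64))
low[20:24, 20:24] += 1.0; high[20:24, 20:24] += 0.8  # calcification stand-in
mask = segment_calcification(low, high)
print("calcified pixels:", int(mask.sum()))
```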
10:00 HPCI-229
WearMask: Fast in-browser face mask detection with serverless edge computing for COVID-19, Zekun Wang1, Pengwei Wang2, Peter C. Louis3, Lee E. Wheless3, and Yuankai Huo1; 1Vanderbilt University (United States), 2Shandong University (China), and 3Vanderbilt University Medical Center (United States) [view abstract]
The COVID-19 epidemic has been a significant healthcare challenge in the United States. COVID-19 is transmitted predominantly by respiratory droplets generated when people breathe, talk, cough, or sneeze. Wearing a mask is the primary, effective, and convenient method of blocking 80% of respiratory infections. Therefore, many face mask detection systems have been developed to supervise hospitals, airports, public transportation, sports venues, and retail locations. However, current commercial solutions are typically bundled with software or hardware, impeding public accessibility. In this paper, we propose an in-browser serverless edge-computing-based face mask detection solution, called Web-based efficient AI recognition of masks (WearMask), which can be deployed on common devices (e.g., cell phones, tablets, computers) with internet connections using web browsers. The serverless edge-computing design minimizes hardware costs (e.g., specific devices or cloud computing servers). It provides a holistic edge-computing framework integrating (1) deep learning models (YOLO), (2) a high-performance neural network inference computing framework (NCNN), and (3) a stack-based virtual machine (WebAssembly). For end users, our solution has the advantages of (1) a serverless edge-computing design with minimal device limitations and privacy risk, (2) installation-free deployment, (3) low computing requirements, and (4) high detection speed. Our application has been launched with public access at facemask-detection.com.
10:20 – 10:50 AM Coffee Break
Ptychographic Imaging (M2)
Session Chair: Xiao Wang, Oak Ridge National Laboratory (United States)
10:50 AM – 12:10 PM
Mason
10:50 HPCI-230
High-performance ptychographic reconstruction using GPUs (Invited), Tekin Bicer, Argonne National Laboratory (United States) [view abstract]
Ptychography is a non-invasive coherent x-ray imaging technique used at synchrotron light sources to study a variety of materials at extremely high spatial resolutions, including functional, structural, biological, and energy materials. Ptychography experiments can generate data at high rates over extended periods, so the accumulated datasets (diffraction patterns) can be extremely large, reaching tens of TBs. Further, ptychography experiments can be extended to 3D by rotating the sample and repeating the 2D data acquisition, which can increase dataset sizes by orders of magnitude. Ptychography is anticipated to be a common imaging technique at next-generation light sources as the brightness of these scientific instruments increases, including the upgraded Advanced Photon Source. Dataset sizes are expected to follow the increases in beam brightness, reaching PBs of experimental data for extreme experiments, e.g., imaging an integrated circuit sample at <10 nm resolution over a cm^2 area. Such extreme-scale datasets will require supercomputer-scale compute resources and efficient data analysis techniques to achieve reasonable processing times (hours). Ptychographic reconstruction is the process of transforming a set of 2D diffraction patterns into a 2D real-space image. This process is typically data-intensive and time-consuming, requiring hundreds of iterations over the input dataset and the reconstructed image. High-performance parallel and distributed execution of ptychographic reconstruction operations is critical when dealing with tens of TBs (to potentially PBs) of experimental data. GPUs have traditionally been used for reconstructing images from ptychography datasets. Although state-of-the-art techniques are sufficient for medium-scale datasets, large-scale experimental datasets require advanced parallelization techniques to further benefit from the hardware/software infrastructure and accelerate analysis. In this talk, I will first introduce an optimized intranode multi-GPU implementation that can efficiently solve large-scale ptychographic reconstruction problems. We will focus on the parallelization of the maximum likelihood reconstruction problem using a conjugate gradient method and propose a novel hybrid parallelization method to address the performance bottlenecks. In the second part of the talk, I will present our topology-aware design and optimizations to address inter-process communication bottlenecks during ptychographic reconstruction. Specifically, we provide application-specific optimizations to accelerate ptychographic image reconstruction on platforms with a heterogeneous multi-GPU topology. We address the mismatch between commonly used communication topologies and heterogeneous hardware topologies by identifying an efficient data partitioning scheme, introducing a hybrid communication topology, and incorporating asynchronous execution. We consider data transformation, partitioning, distribution, and the resulting communication patterns among GPUs, e.g., synchronizing halo regions among neighboring GPUs/tasks. I will show an evaluation of our design and optimizations using real and synthetic datasets and compare their performance with that of direct P2P and NCCL-based approaches.
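As a toy illustration of the halo-region pattern mentioned in the abstract (a single-process stand-in, not the talk's GPU code): each worker owns a stripe of the object and exchanges one-row halos with its neighbors before applying a local update.

```python
# Minimal halo-exchange sketch; the copies stand in for P2P/NCCL transfers.
import numpy as np

def exchange_halos(stripes):
    """Copy boundary rows between neighboring stripes."""
    for i in range(len(stripes) - 1):
        stripes[i][-1, :] = stripes[i + 1][1, :]    # bottom halo <- neighbor's first interior row
        stripes[i + 1][0, :] = stripes[i][-2, :]    # top halo    <- neighbor's last interior row

image = np.arange(64, dtype=float).reshape(8, 8)
# Two stripes of 4 interior rows each, padded with 1-row halos
stripes = [np.pad(image[r:r + 4], ((1, 1), (0, 0))) for r in (0, 4)]
exchange_halos(stripes)
assert np.allclose(stripes[0][-1], image[4])  # halo now mirrors neighbor data
```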
11:20 HPCI-231
Image gradient decomposition for parallel and memory-efficient ptychographic reconstruction (Invited), Xiao Wang, Oak Ridge National Laboratory (United States) [view abstract]
Ptychography is a popular microscopic imaging modality for many scientific discoveries and holds the record for the highest image resolution. Unfortunately, high-resolution ptychographic reconstruction requires a significant amount of memory and computation, forcing many applications to compromise image resolution in exchange for a smaller memory footprint and shorter reconstruction time. In this paper, we propose a novel image gradient decomposition method that significantly reduces the memory footprint of ptychographic reconstruction by tessellating image gradients and diffraction measurements into tiles. In addition, we propose a parallel image gradient decomposition method that enables asynchronous point-to-point communication and parallel pipelining with minimal overhead on a large number of GPUs. Our experiments on a titanate material dataset (PbTiO3) with 16,632 probe locations show that our gradient decomposition algorithm reduces the memory footprint by 51 times. In addition, it achieves a time-to-solution of 2.2 minutes by scaling to 4,158 GPUs, with super-linear strong-scaling efficiency of 364% relative to runtimes at 6 GPUs. This performance is 2.7 times more memory efficient, 9 times more scalable, and 86 times faster than the state-of-the-art algorithm.
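A minimal sketch of the tessellation idea (illustrative only; the tile size and the per-tile update are placeholders, not the paper's algorithm): the gradient field is split into tiles that can be processed independently and stitched back, which is what bounds the per-GPU memory footprint.

```python
# Tile a gradient field, apply a per-tile update, and stitch the result back.
import numpy as np

def tiles(arr, t):
    H, W = arr.shape
    for i in range(0, H, t):
        for j in range(0, W, t):
            yield (i, j), arr[i:i + t, j:j + t]

grad = np.random.default_rng(1).normal(size=(128, 128))
out = np.empty_like(grad)
for (i, j), tile in tiles(grad, 32):
    out[i:i + tile.shape[0], j:j + tile.shape[1]] = tile * 2.0  # update stub
assert np.allclose(out, grad * 2.0)  # stitched result matches full-array update
```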
11:50 HPCI-232
AI-assisted automated workflow for real-time x-ray ptychography data analysis via federated resources, Anakha V Babu, Tekin Bicer, Saugat Kandel, Tao Zhou, Daniel J. Ching, Steven Henke, Sinisa Veseli, Ryan Chard, Antonio Miceli, and Mathew Cherukara, Argonne National Laboratory (United States) [view abstract]
Ptychography is a lensless imaging technique widely used for nanoscale imaging at synchrotron radiation sources. Ptychography relies on collecting many diffraction patterns from the sample, where adjacent diffraction patterns share overlapping regions. This type of data acquisition can lead to extremely large experimental dataset sizes. The collected diffraction patterns are then used to reconstruct (via phase retrieval) a real-space 2D view of the sample. Traditionally, the reconstruction process depends on iterative techniques; however, these techniques can have long turnaround times due to dataset sizes, which can limit experimental capabilities such as real-time experimental steering and low-latency monitoring. In this work, we present an end-to-end automated workflow that handles the different stages of AI/ML-accelerated x-ray ptychography data analysis using high-end large-scale remote compute resources and an embedded GPU platform at the edge. The automated workflow coordinates operations across overlapping stages of the data analysis pipeline. We further accelerate the pipeline by using a modified version of PtychoNN [1], an ML-based approach to the phase retrieval problem that shows a 100x speedup compared to traditional iterative methods. We evaluate our workflow system with real-world experimental workloads from the 26-ID beamline at the Advanced Photon Source (APS) and the ThetaGPU cluster at the Argonne Leadership Computing Facility (ALCF).
12:30 – 2:00 PM Lunch
Monday 16 January PLENARY: Neural Operators for Solving PDEs
Session Chair: Robin Jenkin, NVIDIA Corporation (United States)
2:00 PM – 3:00 PM
Cyril Magnin I/II/III
Deep learning surrogate models have shown promise in modeling complex physical phenomena such as fluid flows, molecular dynamics, and material properties. However, standard neural networks assume finite-dimensional inputs and outputs and hence cannot withstand a change in resolution or discretization between training and testing. We introduce Fourier neural operators that can learn operators, which are mappings between infinite-dimensional spaces. They are independent of the resolution or grid of the training data and allow for zero-shot generalization to higher-resolution evaluations. When applied to weather forecasting, neural operators capture fine-scale phenomena and have skill similar to that of gold-standard numerical weather models for predictions up to a week or longer, while being four to five orders of magnitude faster.
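A hedged toy version of the core Fourier-layer idea in 1D (NumPy with random weights; the actual FNO adds pointwise linear paths and nonlinearities): keep a fixed number of low-frequency modes, multiply them by learned complex weights, and invert the transform. Because the weights live in Fourier space, the same layer applies unchanged at a finer grid, illustrating the resolution independence claimed above.

```python
import numpy as np

def fourier_layer(u, weights, modes):
    """u: (n,) real signal; weights: (modes,) complex; keeps only low modes."""
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes] = u_hat[:modes] * weights   # learned spectral multiply
    return np.fft.irfft(out_hat, n=u.shape[0])

rng = np.random.default_rng(0)
n, modes = 256, 16
w = rng.normal(size=modes) + 1j * rng.normal(size=modes)
u = np.sin(np.linspace(0, 4 * np.pi, n))
v = fourier_layer(u, w, modes)
# Resolution independence: the same weights apply at a finer grid
u2 = np.sin(np.linspace(0, 4 * np.pi, 2 * n))
v2 = fourier_layer(u2, w, modes)
print(v.shape, v2.shape)  # (256,) (512,)
```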
Anima Anandkumar, Bren professor, California Institute of Technology, and senior director of AI Research, NVIDIA Corporation (United States)
Anima Anandkumar is a Bren Professor at Caltech and Senior Director of AI Research at NVIDIA. She is passionate about designing principled AI algorithms and applying them to interdisciplinary domains. She has received several honors, such as the IEEE Fellowship, the Alfred P. Sloan Fellowship, the NSF CAREER Award, and faculty fellowships from Microsoft, Google, Facebook, and Adobe. She is part of the World Economic Forum's Expert Network. Anandkumar received her BTech from the Indian Institute of Technology Madras and her PhD from Cornell University, and did her postdoctoral research at MIT and held an assistant professorship at the University of California, Irvine.
3:00 – 3:30 PM Coffee Break
EI 2023 Highlights Session
Session Chair: Robin Jenkin, NVIDIA Corporation (United States)
3:30 – 5:00 PM
Cyril Magnin II
Join us for a session that celebrates the breadth of what EI has to offer with short papers selected from EI conferences.
NOTE: The EI-wide "EI 2023 Highlights" session is concurrent with Monday afternoon COIMG, COLOR, IMAGE, and IQSP conference sessions.
IQSP-309
Evaluation of image quality metrics designed for DRI tasks with automotive cameras, Valentine Klein, Yiqi LI, Claudio Greco, Laurent Chanas, and Frédéric Guichard, DXOMARK (France) [view abstract]
Driving assistance is increasingly used in new car models. Most driving assistance systems are based on automotive cameras and computer vision. Computer vision, regardless of the underlying algorithms and technology, requires images to have good image quality, defined according to the task. This notion of good image quality is still to be defined for computer vision, as its criteria differ greatly from those of human vision: humans have a better contrast detection ability than imaging chains. The aim of this article is to compare three different metrics designed for object detection with computer vision: the Contrast Detection Probability (CDP) [1, 2, 3, 4], the Contrast Signal to Noise Ratio (CSNR) [5], and the Frequency of Correct Resolution (FCR) [6]. For this purpose, the computer vision task of reading the characters on a license plate is used as a benchmark. The objective is to check the correlation between each objective metric and the ability of a neural network to perform this task. Thus, a protocol to test these metrics and compare them to the output of the neural network has been designed, and the pros and cons of each of the three metrics are noted.
SD&A-224
Human performance using stereo 3D in a helmet mounted display and association with individual stereo acuity, Bonnie Posselt, RAF Centre of Aviation Medicine (United Kingdom) [view abstract]
Binocular Helmet Mounted Displays (HMDs) are a critical part of the aircraft system, allowing information to be presented to the aviator with stereoscopic 3D (S3D) depth, potentially enhancing situational awareness and improving performance. The utility of S3D in an HMD may be linked to an individual’s ability to perceive changes in binocular disparity (stereo acuity). Though minimum stereo acuity standards exist for most military aviators, current test methods may be unable to characterise this relationship. This presentation will investigate the effect of S3D on performance when used in a warning alert displayed in an HMD. Furthermore, any effect on performance, ocular symptoms, and cognitive workload shall be evaluated in regard to individual stereo acuity measured with a variety of paper-based and digital stereo tests.
IMAGE-281
Smartphone-enabled point-of-care blood hemoglobin testing with color accuracy-assisted spectral learning, Sang Mok Park1, Yuhyun Ji1, Semin Kwon1, Andrew R. O’Brien2, Ying Wang2, and Young L. Kim1; 1Purdue University and 2Indiana University School of Medicine (United States) [view abstract]
We develop an mHealth technology for noninvasively measuring blood Hgb levels in patients with sickle cell anemia, using the photos of peripheral tissue acquired by the built-in camera of a smartphone. As an easily accessible sensing site, the inner eyelid (i.e., palpebral conjunctiva) is used because of the relatively uniform microvasculature and the absence of skin pigments. Color correction (color reproduction) and spectral learning (spectral super-resolution spectroscopy) algorithms are integrated for accurate and precise mHealth blood Hgb testing. First, color correction using a color reference chart with multiple color patches extracts absolute color information of the inner eyelid, compensating for smartphone models, ambient light conditions, and data formats during photo acquisition. Second, spectral learning virtually transforms the smartphone camera into a hyperspectral imaging system, mathematically reconstructing high-resolution spectra from color-corrected eyelid images. Third, color correction and spectral learning algorithms are combined with a spectroscopic model for blood Hgb quantification among sickle cell patients. Importantly, single-shot photo acquisition of the inner eyelid using the color reference chart allows straightforward, real-time, and instantaneous reading of blood Hgb levels. Overall, our mHealth blood Hgb tests could potentially be scalable, robust, and sustainable in resource-limited and homecare settings.
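To illustrate what spectral learning means operationally, here is a hedged toy sketch (entirely synthetic data; the camera response B, band count, and ridge regularizer are assumptions, and the paper's actual model differs): a ridge regression from color-corrected RGB triplets to high-resolution spectra, the simplest instance of spectral super-resolution.

```python
# Toy spectral super-resolution: learn a linear RGB -> spectrum mapping.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_bands = 500, 61                 # e.g., 450-750 nm at 5 nm steps
spectra = rng.random((n_train, n_bands))   # synthetic training spectra
B = rng.random((n_bands, 3))               # hypothetical camera response
rgb = spectra @ B                          # color-corrected RGB observations
lam = 1e-3                                 # ridge regularization strength
W = np.linalg.solve(rgb.T @ rgb + lam * np.eye(3), rgb.T @ spectra)
recon = rgb @ W                            # reconstructed spectra
print("train RMSE:", float(np.sqrt(((recon - spectra) ** 2).mean())))
```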
AVM-118
Designing scenes to quantify the performance of automotive perception systems, Zhenyi Liu1, Devesh Shah2, Alireza Rahimpour2, Joyce Farrell1, and Brian Wandell1; 1Stanford University and 2Ford Motor Company (United States) [view abstract]
We implemented an end-to-end simulation for perception systems, based on cameras, that are used in automotive applications. The open-source software creates complex driving scenes and simulates cameras that acquire images of these scenes. The camera images are then used by a neural network in the perception system to identify the locations of scene objects, providing the results as input to the decision system. In this paper, we design collections of test scenes that can be used to quantify the perception system’s performance under a range of (a) environmental conditions (object distance, occlusion ratio, lighting levels), and (b) camera parameters (pixel size, lens type, color filter array). We are designing scene collections to analyze performance for detecting vehicles, traffic signs and vulnerable road users in a range of environmental conditions and for a range of camera parameters. With experience, such scene collections may serve a role similar to that of standardized test targets that are used to quantify camera image quality (e.g., acuity, color).
VDA-403
Visualizing and monitoring the process of injection molding, Christian A. Steinparz1, Thomas Mitterlehner2, Bernhard Praher2, Klaus Straka1,2, Holger Stitz1,3, and Marc Streit1,3; 1Johannes Kepler University, 2Moldsonics GmbH, and 3datavisyn GmbH (Austria) [view abstract]
In injection molding machines the molds are rarely equipped with sensor systems. The availability of non-invasive ultrasound-based in-mold sensors provides better means for guiding operators of injection molding machines throughout the production process. However, existing visualizations are mostly limited to plots of temperature and pressure over time. In this work, we present the result of a design study created in collaboration with domain experts. The resulting prototypical application uses real-world data taken from live ultrasound sensor measurements for injection molding cavities captured over multiple cycles during the injection process. Our contribution includes a definition of tasks for setting up and monitoring the machines during the process, and the corresponding web-based visual analysis tool addressing these tasks. The interface consists of a multi-view display with various levels of data aggregation that is updated live for newly streamed data of ongoing injection cycles.
COIMG-155
Commissioning the James Webb Space Telescope, Joseph M. Howard, NASA Goddard Space Flight Center (United States) [view abstract]
Astronomy is arguably in a golden age, where current and future NASA space telescopes are expected to contribute to this rapid growth in understanding of our universe. The most recent addition to our space-based telescopes dedicated to astronomy and astrophysics is the James Webb Space Telescope (JWST), which launched on 25 December 2021. This talk will discuss the first six months in space for JWST, which were spent commissioning the observatory with many deployments, alignments, and system and instrumentation checks. These engineering activities help verify the proper working of the telescope prior to commencing full science operations. For the session: Computational Imaging using Fourier Ptychography and Phase Retrieval.
HVEI-223
Critical flicker frequency (CFF) at high luminance levels, Alexandre Chapiro1, Nathan Matsuda1, Maliha Ashraf2, and Rafal Mantiuk3; 1Meta (United States), 2University of Liverpool (United Kingdom), and 3University of Cambridge (United Kingdom) [view abstract]
The critical flicker fusion (CFF) is the frequency of changes at which a temporally periodic light will begin to appear completely steady to an observer. This value is affected by several visual factors, such as the luminance of the stimulus or its location on the retina. With new high dynamic range (HDR) displays, operating at higher luminance levels, and virtual reality (VR) displays, presenting at wide fields-of-view, the effective CFF may change significantly from values expected for traditional presentation. In this work we use a prototype HDR VR display capable of luminances up to 20,000 cd/m^2 to gather a novel set of CFF measurements for never before examined levels of luminance, eccentricity, and size. Our data is useful to study the temporal behavior of the visual system at high luminance levels, as well as setting useful thresholds for display engineering.
HPCI-228
Physics guided machine learning for image-based material decomposition of tissues from simulated breast models with calcifications, Muralikrishnan Gopalakrishnan Meena1, Amir K. Ziabari1, Singanallur Venkatakrishnan1, Isaac R. Lyngaas1, Matthew R. Norman1, Balint Joo1, Thomas L. Beck1, Charles A. Bouman2, Anuj Kapadia1, and Xiao Wang1; 1Oak Ridge National Laboratory and 2Purdue University (United States) [view abstract]
Material decomposition of Computed Tomography (CT) scans using projection-based approaches, while highly accurate, poses a challenge for medical imaging researchers and clinicians due to limited or no access to projection data. We introduce a deep learning image-based material decomposition method guided by physics and requiring no access to projection data. The method is demonstrated to decompose tissues from simulated dual-energy X-ray CT scans of virtual human phantoms containing four materials: adipose, fibroglandular, calcification, and air. The method uses a hybrid unsupervised and supervised learning technique to tackle the material decomposition problem. We take advantage of the unique X-ray absorption rate of calcium compared to body tissues to perform a preliminary segmentation of calcification from the images using unsupervised learning. We then perform supervised material decomposition using a deep-learned U-Net model trained on GPUs on the high-performance systems at the Oak Ridge Leadership Computing Facility. The method is demonstrated on simulated breast models to decompose calcification, adipose, fibroglandular, and air.
3DIA-104
Layered view synthesis for general images, Loïc Dehan, Wiebe Van Ranst, and Patrick Vandewalle, Katholieke Universiteit Leuven (Belgium) [view abstract]
We describe a novel method for monocular view synthesis. The goal of our work is to create a visually pleasing set of horizontally spaced views based on a single image. This can be applied in view synthesis for virtual reality and glasses-free 3D displays. Previous methods produce realistic results on images that show a clear distinction between a foreground object and the background. We aim to create novel views in more general, crowded scenes in which there is no clear distinction. Our main contributions are a computationally efficient method for realistic occlusion inpainting and blending, especially in complex scenes. Our method can be effectively applied to any image, which is shown both qualitatively and quantitatively on a large dataset of stereo images. Our method performs natural disocclusion inpainting and maintains the shape and edge quality of foreground objects.
ISS-329
A self-powered asynchronous image sensor with independent in-pixel harvesting and sensing operations, Ruben Gomez-Merchan, Juan Antonio Leñero-Bardallo, and Ángel Rodríguez-Vázquez, University of Seville (Spain) [view abstract]
A new self-powered asynchronous sensor with a novel pixel architecture is presented. Pixels are autonomous and can harvest or sense energy independently. During image acquisition, pixels toggle to a harvesting operation mode once they have sensed their local illumination level. With the proposed pixel architecture, the most illuminated pixels provide an early contribution to powering the sensor, while less illuminated ones spend more time sensing their local illumination. Thus, the equivalent frame rate is higher than that offered by conventional self-powered sensors that harvest and sense illumination in independent phases. The proposed sensor uses a Time-to-First-Spike readout that allows trading off image quality against data and bandwidth consumption. The sensor offers HDR operation with a dynamic range of 80 dB. Pixel power consumption is only 70 pW. In the article, we describe the sensor and pixel architectures in detail. Experimental results are provided and discussed, and sensor specifications are benchmarked against the state of the art.
COLOR-184
Color blindness and modern board games, Alessandro Rizzi1 and Matteo Sassi2; 1Università degli Studi di Milano and 2consultant (Italy) [view abstract]
The board game industry is experiencing a strong renewed interest. In the last few years, about 4,000 new board games have been designed and distributed each year. The gender balance among board game players is approaching equality, though the male component remains a slight majority. This means that (at least) around 10% of board game players are color blind. How does the board game industry deal with this? Recently, awareness has begun to rise in board game design, but so far there is a big gap compared with, e.g., the computer game industry. This paper presents some data about the current situation, discussing exemplary cases of successful board games.
5:00 – 6:15 PM EI 2023 All-Conference Welcome Reception (in the Cyril Magnin Foyer)
Tuesday 17 January 2023
KEYNOTE: High-Performance Imaging (T1)
Session Chair: Xiao Wang, Oak Ridge National Laboratory (United States)
8:50 – 10:20 AM
Mason
8:50 HPCI-233
KEYNOTE: Reducing the barriers to high performance imaging, Charles A. Bouman, Purdue University (United States) [view abstract]
Prof. Charles A. Bouman received a BSEE from the University of Pennsylvania in 1981 and an MS from the University of California at Berkeley in 1982. From 1982 to 1985, he was a full staff member at MIT Lincoln Laboratory, and in 1989 he received a PhD in electrical engineering from Princeton University. He joined the faculty of Purdue University in 1989, where he is currently the Showalter Professor of Electrical and Computer Engineering and Biomedical Engineering. Prof. Bouman's research is in the area of computational imaging and sensing, where he is focused on the integration of signal processing, statistical modeling, physics, and computation to solve difficult sensing problems with applications in healthcare, material science, physics, chemistry, and commercial and consumer imaging. His research resulted in the first commercial model-based iterative reconstruction (MBIR) system for medical X-ray computed tomography (CT), and he is co-inventor on over 50 issued patents that have been licensed and used in millions of consumer imaging products. Professor Bouman is a member of the National Academy of Inventors, a Fellow of the IEEE, a Fellow and Honorary Member of the Society for Imaging Science and Technology (IS&T), a Fellow of the American Institute for Medical and Biological Engineering (AIMBE), and a Fellow of the SPIE professional society. He is the recipient of the IEEE Signal Processing Society's 2021 Claude Shannon-Harry Nyquist Technical Achievement Award, the 2014 Electronic Imaging Scientist of the Year award, and IS&T's Raymond C. Bowman Award; in 2020, his paper on Plug-and-Play Priors won the SIAM Imaging Science Best Paper Prize. He was previously the Editor-in-Chief for the IEEE Transactions on Image Processing, a Distinguished Lecturer for the IEEE Signal Processing Society, and a Vice President of Technical Activities for the IEEE Signal Processing Society, during which time he led the creation of the IEEE Transactions on Computational Imaging. He has been an associate editor for the IEEE Transactions on Image Processing, the IEEE Transactions on Pattern Analysis and Machine Intelligence, and the SIAM Journal on Mathematical Imaging. He has also been a Vice President of Publications and a member of the Board of Directors for the IS&T Society, and he is the founder and Co-Chair of the long-running IS&T conference on Computational Imaging.
Over the past thirty years, algorithmic advances have achieved what may have previously seemed impossible. However, there remains a vast reservoir of well-known and very effective algorithms that go unused because of enormous practical barriers. In this talk, we present a number of case studies in which we attempt to transition state-of-the-art computational imaging algorithms to practical use in scientific, industrial, and commercial applications. The three examples we will discuss are scientific CT imaging, medical CT, and digital holographic imaging. While these applications span very different requirements and computing platform solutions, the talk will discuss approaches to solving the following problems they share:
* Throughput and latency: Advanced algorithms are often slow, and even when they are not, people inevitably want them implemented on the least expensive hardware. This means that algorithms must be efficient!
* Co-design: The days when algorithms could be designed to minimize the number of multiplies are long gone. Algorithms must be designed or redesigned to jointly optimize algorithmic and hardware performance.
* Memory reuse: The modern processor is limited by the time it takes to communicate with registers, cache, memory, and other nodes. This means that memory reuse is always a critical algorithmic design objective.
* Parallelization: As the number of registers, cores, and nodes grows exponentially, keeping them all busy requires clever algorithmic design.
* Ease-of-use: The fastest and best algorithmic implementation is of no value to practitioners if they don't know how to use it or how to set its parameters!
9:50 HPCI-234
High-performance embedded imaging: An optics, sensor, and computing co-design approach (Invited), Yuhao Zhu, University of Rochester (United States) [view abstract]
Addressing the world's most pressing issues, such as environmental sustainability and cultural heritage preservation, increasingly relies on diverse visual applications running on emerging platforms such as AR/VR headsets, autonomous machines, and smart sensor nodes. In real time and at low power, visual computing systems must either generate visual data for humans to consume immersively, or interpret visual data to provide personalized services intelligently. In this talk, I will explain why today's computer systems and architectures are not ready for the visual computing on the horizon, and outline the road that might get us there. I will break away from the conventional vertical, cross-layer approach in computer systems and discuss how a horizontal approach, spanning different forms of information-processing architecture such as optics, image sensors, and the human visual system, is critical to the next decade of visual computing.
10:00 AM – 7:30 PM Industry Exhibition - Tuesday (in the Cyril Magnin Foyer)
10:20 – 10:50 AM Coffee Break
X-ray Scatter and MR Imaging (T2)
Session Chair: Venkatesh Sridhar, Lawrence Livermore National Laboratory (United States)
10:50 AM – 12:10 PM
Mason
10:50 HPCI-235
A system for large-scale inverse multiple-scattering imaging on GPU supercomputers with real data (Invited), Mert Hidayetoglu; Stanford University and SLAC National Accelerator Laboratory (United States) [view abstract]
Inverse multiple-scattering imaging captures the higher-order scattered waves and incorporates them into the iterative image reconstruction. Because of its computational burden, it had been considered impractical until fast algorithms (in the 2000s) and high-performance parallel computers (in the 2010s) became available for microwave and ultrasound imaging modalities. This talk presents our previous work on a numerical inverse solver for accurately solving large-scale nonlinear imaging problems (Hidayetoglu et al., IPDPS'18). In each iteration of the solution, the system Green's function is updated with the distorted-Born approximation, which requires an immense number of forward solutions. Moreover, each forward solution requires solving an expensive full-wave scattering problem. The forward solutions are parallelized and accelerated with the multilevel fast multipole algorithm (MLFMA) on GPUs. The main contribution of this work is a hierarchical parallelization of the distorted-Born iterative method (DBIM) that scales by improving parallelization granularity while minimizing communication between GPUs. Additionally, we present an efficient implementation of MLFMA on multi-GPU systems via a compact representation of the linear operations as sparse matrix multiplications across the levels of the multilevel tree structure. Numerical results demonstrate the weak and strong scaling of the application up to 4,096 GPUs on the Blue Waters system. The significance of these results is the unprecedented scale of DBIM for accurate image reconstruction in terms of wavelengths. The talk concludes with an extension of the previous work demonstrating wideband ultrasound data processing for calibration of the proposed imaging system.
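For context, a hedged sketch of the model underlying DBIM in standard scattering notation (not necessarily the talk's exact formulation): the total field $u$ satisfies the Lippmann-Schwinger equation

\[ u(\mathbf{r}) = u^{\mathrm{inc}}(\mathbf{r}) + \int G_b(\mathbf{r},\mathbf{r}')\,\chi(\mathbf{r}')\,u(\mathbf{r}')\,d\mathbf{r}', \]

and each DBIM iteration linearizes the data residual around the current contrast estimate, solving a regularized least-squares problem for the update $\delta\chi$ while recomputing the background Green's function $G_b$ and the associated forward solutions; that recomputation is the expensive full-wave step MLFMA accelerates.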
11:20 HPCI-236
Fast massively parallel physics-based algorithm for modeling multi-order scatter in CT (Invited), Venkatesh Sridhar1, Xin Liu1, and Kyle Champley2; 1Lawrence Livermore National Laboratory and 2Ziteo Medical (United States) [view abstract]
Modeling scatter in Computed Tomography (CT) scans can significantly improve the quantitative accuracy of 3D volumetric reconstructions, especially in the case of imaging highly metallic objects. However, there exists a trade-off between the computational cost and accuracy of various physics-based scatter correction methods. For practical CT applications with large datasets, scatter-correction methods that achieve optimal speed and accuracy are desirable.
11:50 HPCI-237
Clinical validation of rapid GPU-enabled DTI tractography of the brain, Felix Liu1, Vanitha Sankaranarayanan1, Javier Villanueva-Meyer1, Shawn Hervey-Jumper1, James Hawkins1, Pablo Damasceno2, Mauro Bisson3, Josh Romero3, Thorsten Kurth3, Massimiliano Fatica3, Eleftherios Garyfallidis4, Ariel Rokem5, Jason Crane1, and Sharmila Majumdar1; 1University of California, San Francisco, 2Janssen Pharmaceutical, 3NVIDIA Corporation, 4Indiana University, and 5University of Washington (United States) [view abstract]
Diffusion tensor imaging (DTI) is a non-invasive magnetic resonance imaging (MRI) modality used to map white matter fiber tracts for a variety of clinical applications, one of which is aiding preoperative assessments for tumor patients. DTI requires numerical computations on multiple diffusion-weighted images to calculate diffusion tensors at each voxel, and probabilistic tracking [1] to construct fiber tracts (tractography). Greater accuracy in tractography is possible with larger scans and more advanced imaging and reconstruction algorithms; however, these are often computationally intensive. The post-processing pipeline demands significant computational resources, requiring up to 40 minutes of computation time on state-of-the-art hardware. Parallel GPU computation can reduce the time for resource-intensive tractography. A collaborative team from DIPY, NVIDIA, and UCSF recently developed GPUStreamlines, a tool for GPU-enabled tractography [2], which has been expanded to support the constant solid angle (CSA) reconstruction algorithm [3]. This GPU-enabled tractography was applied to MRIs of brains with and without lesions, with substantial increases in processing speed. We demonstrate that CSA GPU-enabled tractography results in normal controls and patients are comparable to the existing gold-standard tractography currently in place at UCSF.
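For context, the per-voxel signal model fitted in standard DTI (a textbook relation, not specific to this paper) is

\[ S_i = S_0 \exp\!\left(-b\,\mathbf{g}_i^{\mathsf{T}}\mathbf{D}\,\mathbf{g}_i\right), \]

where $S_i$ is the signal measured along diffusion-encoding direction $\mathbf{g}_i$, $b$ is the diffusion weighting, and $\mathbf{D}$ is the 3x3 symmetric diffusion tensor estimated at each voxel; tractography then follows the tensor field's principal directions across voxels.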
12:30 – 2:00 PM Lunch
Tuesday 17 January PLENARY: Embedded Gain Maps for Adaptive Display of High Dynamic Range Images
Session Chair: Robin Jenkin, NVIDIA Corporation (United States)
2:00 PM – 3:00 PM
Cyril Magnin I/II/III
Images optimized for High Dynamic Range (HDR) displays have brighter highlights and more detailed shadows, resulting in an increased sense of realism and greater impact. However, a major issue with HDR content is the lack of consistency in appearance across different devices and viewing environments. There are several reasons, including varying capabilities of HDR displays and the different tone mapping methods implemented across software and platforms. Consequently, HDR content authors can neither control nor predict how their images will appear in other apps.
We present a flexible system that provides consistent and adaptive display of HDR images. Conceptually, the method combines both SDR and HDR renditions within a single image and interpolates between the two dynamically at display time. We compute a Gain Map that represents the difference between the two renditions. In the file, we store a Base rendition (either SDR or HDR), the Gain Map, and some associated metadata. At display time, we combine the Base image with a scaled version of the Gain Map, where the scale factor depends on the image metadata, the HDR capacity of the display, and the viewing environment.
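An illustrative sketch of the display-time combination (not Adobe's exact math; the headroom weighting and clamping here are assumptions): the Base image is scaled by an exponentiated fraction of a log2 Gain Map, with the fraction set by how much HDR headroom the display has available.

```python
# Blend a Base rendition toward its HDR rendition via a scaled Gain Map.
import numpy as np

def display_render(base, gain_map, display_headroom_stops, max_gain_stops):
    """base: linear SDR image; gain_map: per-pixel log2(HDR/SDR)."""
    w = np.clip(display_headroom_stops / max_gain_stops, 0.0, 1.0)
    return base * np.exp2(gain_map * w)   # w=0 -> SDR base, w=1 -> full HDR

base = np.full((2, 2), 0.5)
gain = np.array([[0.0, 1.0], [2.0, 3.0]])  # up to 3 stops brighter in HDR
print(display_render(base, gain, display_headroom_stops=1.5, max_gain_stops=3.0))
```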
Eric Chan, Fellow, Adobe Inc. (United States)
Eric Chan is a Fellow at Adobe, where he develops software for editing photographs. Current projects include Photoshop, Lightroom, Camera Raw, and Digital Negative (DNG). When not writing software, Chan enjoys spending time at his other keyboard, the piano. He is an enthusiastic nature photographer and often combines his photo activities with travel and hiking.
Paul M. Hubel, director of Image Quality in Software Engineering, Apple Inc. (United States)
Paul M. Hubel is director of Image Quality in Software Engineering at Apple. He has worked on computational photography and the image quality of photographic systems for many years on all aspects of the imaging chain, particularly for iPhone. He trained in optical engineering at the University of Rochester, Oxford University, and MIT, and has more than 50 patents on color imaging and camera technology. Hubel is active on the ISO TC42 committee on digital photography, where this work is under discussion, and is currently a VP on the IS&T Board. Outside work he enjoys photography, travel, cycling, and coffee roasting, and plays trumpet in several Bay Area ensembles.
3:00 – 3:30 PM Coffee Break
High Performance Tomographic Reconstruction (T3)
Session Chair: Peng Chen, Japan National Lab (AIST) (Japan)
3:30 – 5:30 PM
Mason
3:30 HPCI-238
Training end-to-end unrolled iterative neural networks for SPECT image reconstruction: A fast and memory efficient Julia toolbox (Invited), Zongyu Li, Yuni K. Dewaraja, and Jeff Fessler, University of Michigan (United States) [view abstract]
Single-Photon Emission Computed Tomography (SPECT) plays a pivotal role in clinical diagnosis, tumor detection and treatment, and estimating absorbed dosimetry. For example, quantitative SPECT imaging with Lutetium-177 (177Lu) in targeted radionuclide therapy (such as 177Lu-PSMA) is important in determining dose-response relationships in tumors and holds great potential for dosimetry-based individualized treatment. However, the image reconstruction problem is challenging due to the poor resolution of the SPECT camera, which results in a very ill-conditioned system matrix. Traditional model-based reconstruction algorithms such as Expectation-Maximization (EM) and its variant Ordered-Subset Expectation-Maximization (OSEM) suffer from a trade-off between recovery and overfitting to noise. Incorporating regularizers may partially address that trade-off, but traditional handcrafted regularizers such as total variation (TV), the orthogonal discrete wavelet transform (ODWT), and block-matching and 3D filtering (BM3D) assume the latent images follow certain desired properties, such as piece-wise uniformity, and hence can lack generalizability to images that do not follow these properties. With the recent success of deep learning (DL) methods in medical imaging, there have been several DL-regularized SPECT reconstruction methods. Although promising results were reported in previous works, the networks embedded in the iterative optimization algorithms in these papers were not trained in an end-to-end fashion and hence could lead to sub-optimal results. The most important reason has been the lack of a backpropagatable SPECT forward-backward projector. In this work, we provide a Julia implementation of a backpropagatable SPECT forward-backward projector that models parallel-beam collimators and accounts for attenuation and depth-dependent collimator response.
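A conceptual sketch of unrolling in Python (hypothetical names and a toy random system matrix; the paper's toolbox is in Julia and uses a real backpropagatable SPECT projector): K data-fidelity updates, here MLEM steps for Poisson data, interleaved with a learned denoiser that is stubbed as the identity.

```python
# Unrolled iterative reconstruction: EM steps + (stubbed) learned denoiser.
import numpy as np

def em_step(x, A, y, eps=1e-8):
    """One MLEM update for Poisson data y ~ Poisson(A @ x)."""
    ratio = y / (A @ x + eps)
    return x * (A.T @ ratio) / (A.T @ np.ones_like(y) + eps)

def unrolled_recon(A, y, K=5, denoiser=lambda x: x):
    x = np.ones(A.shape[1])
    for _ in range(K):
        x = denoiser(em_step(x, A, y))  # a trained network would go here
    return x

rng = np.random.default_rng(0)
A = rng.random((40, 20)); x_true = rng.random(20)
y = rng.poisson(A @ x_true).astype(float)
print(unrolled_recon(A, y)[:5])
```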
4:00 HPCI-239
Fast GPU-based tomographic reconstruction with efficient data transfers between CPU, GPU, and NVMe SSDs (Invited), Viktor Nikitin, Argonne National Laboratory (United States) [view abstract]
We present a new package, called Tomocupy, for efficient GPU-based tomographic reconstruction with conveyor data processing in 16- and 32-bit floating-point precision. The proposed 3D reconstruction conveyor includes read-write operations with NVMe SSDs, CPU-GPU data transfers, and GPU computations. All operations are overlapped in time using CUDA streams and Python multithreading, resulting in a 2-3x performance gain compared to their sequential execution. Memory usage and processing speed are also optimized by switching to half-precision arithmetic, since most cameras used for X-ray imaging work with less than 12-bit analog-to-digital converters and do not require 32-bit precision arithmetic. As a result, the new reconstruction with conveyor data processing and an efficient GPU implementation is more than 15x faster than its multithreaded CPU equivalent (the Tomopy package). Full reconstruction (including read-write operations) of a 2048^3 tomographic volume takes less than 7 s on one NVIDIA Tesla A100 and a PCIe 4.0 NVMe SSD, and scales almost linearly with increasing data size. The code also supports multi-GPU and RAID multi-SSD setups for processing large datasets, and runs efficiently on a supercomputer at the Argonne Leadership Computing Facility (ALCF).
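A toy sketch of the conveyor pattern (Python threads and bounded queues as stand-ins for CUDA streams and NVMe I/O; purely illustrative, not Tomocupy's code): the read, reconstruct, and write stages of successive data chunks overlap in time, with queue backpressure keeping memory bounded.

```python
# Three-stage pipeline: read -> reconstruct -> write, overlapped via threads.
import queue, threading

def stage(fn, q_in, q_out):
    while (item := q_in.get()) is not None:
        q_out.put(fn(item))
    q_out.put(None)  # propagate end-of-stream marker

q1, q2, q3 = queue.Queue(2), queue.Queue(2), queue.Queue(2)
threads = [
    threading.Thread(target=stage, args=(lambda c: f"read[{c}]", q1, q2)),
    threading.Thread(target=stage, args=(lambda s: s.replace("read", "recon"), q2, q3)),
]
for t in threads:
    t.start()
for chunk in range(4):
    q1.put(chunk)
q1.put(None)
while (out := q3.get()) is not None:
    print("wrote", out)
```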
4:30 HPCI-240
Hierarchical communications for 3D image reconstruction with synchrotron light source and 24,576 GPUs (Invited), Mert Hidayetoglu; Stanford University and SLAC National Accelerator Laboratory (United States) [view abstract]
X-ray computed tomography (XCT) is a commonly used technique for noninvasive imaging at synchrotron facilities. Iterative tomographic reconstruction algorithms are often preferred for recovering high-quality 3D volumetric images from 2D X-ray images; however, their use has been limited to small/medium datasets due to their computational requirements. In this work, we propose a high-performance iterative reconstruction system for terabyte(s)-scale 3D volumes (Hidayetoglu et al., SC'20). Our design involves (1) optimization of the (back)projection operators through a memory-centric approach, (2) hierarchical communications that exploit the "fat-node" architecture with many GPUs, and (3) utilization of mixed-precision types while preserving convergence rate and quality. We extensively evaluate the proposed optimizations and their scaling on the Summit supercomputer. Our largest reconstruction is a mouse brain volume with 9K x 11K x 11K voxels, where the total reconstruction time is under three minutes using 24,576 GPUs, reaching 65 PFLOPS: 34% of Summit's peak performance.
5:00 HPCI-241
High-performance image reconstruction on GPU-accelerated supercomputers (Invited), Peng Chen1, Mohamed Wahib2, Xiao Wang3, Jintao Meng4, and Yusuke Tanimura1; 1Japan National Lab (AIST) (Japan), 2RIKEN Center for Computational Science (Japan), 3Oak Ridge National Laboratory (United States), and 4Shenzhen Institute of Advanced Technology, CAS (China) [view abstract]
Tomographic imaging is a widely used technology that requires intensive computations. We will introduce our activity on high-performance image reconstruction using GPU-accelerated supercomputers. We take advantage of the heterogeneity of GPU-accelerated systems by overlapping the filtering computation and back-projection on CPUs and GPUs, respectively. We propose a novel decomposition scheme and reconstruction algorithm for distributed image reconstruction. This scheme enables arbitrarily large input/output sizes, eliminates the redundancy arising in the end-to-end pipeline and improves the scalability by replacing two communication collectives with only one segmented reduction. More specifically, we propose a distributed framework for high-resolution image reconstruction on state-of-the-art GPU-accelerated supercomputers and implement the proposed decomposition scheme in a framework that is useful for all current-generation tomographic imaging devices.
5:30 – 7:00 PM EI 2023 Symposium Demonstration Session (in the Cyril Magnin Foyer)
Wednesday 18 January 2023
PANEL: High-Performance Computing in Imaging: from Academia to Industry (W1)
Panelists: Yuankai Huo, Vanderbilt University (United States); Yucheng Tang, NVIDIA Corporation (United States); and Xiao Wang, Oak Ridge National Laboratory (United States)
9:10 – 10:10 AM
Mason
10:00 AM – 3:30 PM Industry Exhibition - Wednesday (in the Cyril Magnin Foyer)
10:20 – 10:50 AM Coffee Break
AI Acceleration & System Design (W2)
Session Chair: Mohamed Wahib, RIKEN Center for Computational Science (Japan)
10:50 AM – 12:30 PM
Mason
10:50 HPCI-242
TVM-enabled automatic kernel generation for irregular GEMM optimization on ARM architectures (Invited), Du Wu1,2, Chen Zhuang2, Haidong Lan3, Wenxi Zhu3, Minwen Deng3, Peng Chen4, Mohamed Wahib5, Jintao Meng2, Bingqiang Wang6, Yanjie Wei2, Yi Pan2, and Shengzhong Feng7; 1Southern University of Science and Technology of China (China), 2Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences (China), 3Tencent AI Lab, Shenzhen (China), 4Japan National Lab (AIST) (Japan), 5RIKEN Center for Computational Science (Japan), 6Peng Cheng Laboratory, Shenzhen (China), and 7National Supercomputer Center in Shenzhen, China (China) [view abstract]
With the popularity of deep learning and the widespread use of irregular matrix multiplication, improving the efficiency and portability of general matrix multiplication (GEMM) on irregular matrices is critical for deep learning applications. We propose a TVM-enabled automatic kernel generation method to improve irregular GEMM performance across different configurations of ARM SoC chips. Our work 1) introduces TVM to find the best cache blocking strategy for the target architecture by building an autoTVM template that automatically searches the lengths and order of the outer loops of matrix multiplication; 2) designs a register block splitting algorithm (RBSA) that calls a home-grown code generation method for small matrix multiplication; and 3) shows experimentally, on three market-dominating ARM SoC chips, that on small matrices our work achieves a 1.5-2x speedup compared to LIBXSMM and LibShalom while maintaining a relatively high efficiency of 98%, and that on irregular matrices our work is 1.2-2.0x faster than OpenBLAS and Eigen, with a slightly higher (3%) efficiency than LibShalom.
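To show what an autoTVM search template looks like, here is a minimal GEMM template following TVM's public autotvm tutorial API (this is the generic tutorial pattern, not the authors' RBSA code; the template name and split choices are placeholders):

```python
# Tunable GEMM template: autoTVM searches the cache-blocking factors.
import tvm
from tvm import te, autotvm

@autotvm.template("hpci/gemm")
def gemm(M, N, K, dtype="float32"):
    A = te.placeholder((M, K), name="A", dtype=dtype)
    B = te.placeholder((K, N), name="B", dtype=dtype)
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    s = te.create_schedule(C.op)
    y, x = s[C].op.axis
    r = s[C].op.reduce_axis[0]
    cfg = autotvm.get_config()
    # Define the search space over tiling factors for the outer loops
    cfg.define_split("tile_y", y, num_outputs=2)
    cfg.define_split("tile_x", x, num_outputs=2)
    yo, yi = cfg["tile_y"].apply(s, C, y)
    xo, xi = cfg["tile_x"].apply(s, C, x)
    s[C].reorder(yo, xo, r, yi, xi)
    return s, [A, B, C]
```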
11:20 HPCI-243
Towards real-time formula driven dataset feed for large scale deep learning training, Edgar Josafat Martinez Noriega1 and Rio Yokota2; 1National Institute of Advanced Industrial Science and Technology and 2Tokyo Institute of Technology (Japan) [view abstract]
During the past few years, a new deep learning architecture, the Vision Transformer, has become the de-facto standard for classification tasks, dethroning convolutional neural network (CNN) architectures. However, these new models require a significant amount of data (more than 100 million images) to achieve state-of-the-art performance using transfer learning. Thus, large datasets such as JFT-300M or JFT-3B are needed to train these transformer architectures. Nonetheless, these large datasets are proprietary and not open to the public. To alleviate this and other issues such as privacy, Formula-Driven Supervised Learning (FDSL) has been proposed. This approach utilizes synthetic images generated from mathematical formulas; fractals and radial contour images are examples of these synthetically generated patterns. We propose to render formula-driven images in real time, with the main objective of reducing the I/O bottleneck when training on large datasets. Our approach generates all instances on the fly during training. We implemented a custom loader that uses EGL for faster GPU rendering through shaders. Preliminary results using our custom data loader on the FractalDB-1k dataset (1 million images) show 26% faster loading times compared to the PyTorch Vision loader.
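A minimal sketch of on-the-fly formula-driven rendering (a NumPy IFS-fractal stand-in for the EGL/shader renderer described above; all parameters hypothetical): each training sample is generated from its formula at load time, so no image files are read from disk, and the formula seed plays the role of the class label.

```python
# Render an IFS fractal image from random affine maps; no disk I/O needed.
import numpy as np

def render_fractal(rng, n_points=20000, size=64, n_maps=3):
    maps = rng.uniform(-0.9, 0.9, (n_maps, 2, 2))
    offs = rng.uniform(-1.0, 1.0, (n_maps, 2))
    p = np.zeros(2)
    img = np.zeros((size, size))
    for _ in range(n_points):
        k = rng.integers(n_maps)          # pick a random affine map
        p = maps[k] @ p + offs[k]
        ix, iy = ((np.clip(p, -2, 2) + 2) / 4 * (size - 1)).astype(int)
        img[iy, ix] = 1.0
    return img

# The seed acts as the "formula"/class identity: same seed, same image.
batch = np.stack([render_fractal(np.random.default_rng(42)) for _ in range(4)])
print(batch.shape)  # (4, 64, 64)
```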
11:40 HPCI-244
Algorithmic enhancements to data colocation grid frameworks for big data medical image processing (Invited), Shunxing Bao1 and Yuankai Huo2; 1Vanderbilt University and 2presenter only (United States) [view abstract]
Large-scale medical imaging studies to date have predominantly leveraged in-house, laboratory-based, or traditional grid computing resources for their computing needs, where applications often use hierarchical data structures or databases for storage and retrieval. Results for laboratory-based approaches reveal that performance is impeded by standard network switches, since typical processing can saturate network bandwidth during the transfer from storage to processing nodes for even moderate-sized studies. On the other hand, the grid can be costly to use due to the dedicated resources required to execute tasks and the lack of elasticity. With the increasing availability of cloud-based big data frameworks, such as Apache Hadoop, cloud-based services for executing medical imaging studies have shown promise. This talk introduces a framework, Hadoop & HBase for Medical Image Processing (HadoopBase-MIP), which develops a range of performance-optimization algorithms and employs extensive system-behavior modeling for data storage, data access, and data processing, solving empirical system challenges for medical image analysis on a real cluster.
12:10 HPCI-245
Bridging the gap between high-performance computing and high-performance imaging applications, Mohamed Wahib, RIKEN Center for Computational Science (Japan) [view abstract]
Scientific and engineering imaging applications, e.g., computed tomography, have evolved dramatically in complexity over the last two decades, fueled in part by the exponential performance growth described by Moore's law. In recent years, the high-performance computing (HPC) community has been actively engaged in providing scalable solutions to keep up with the increasing computational requirements of imaging applications and algorithms. High-performance imaging is starting to encounter scalability bottlenecks that are well known in traditional HPC scientific applications (e.g., climate simulations). In this talk we give the HPC perspective on high-performance imaging through an overview of recent success stories. We also highlight the capability of HPC systems to engage with large-scale computational problems and invite the high-performance imaging community to push further into HPC territory.
12:30 – 2:00 PM Lunch
Wednesday 18 January PLENARY: Bringing Vision Science to Electronic Imaging: The Pyramid of Visibility
Session Chair: Andreas Savakis, Rochester Institute of Technology (United States)
2:00 PM – 3:00 PM
Cyril Magnin I/II/III
Electronic imaging depends fundamentally on the capabilities and limitations of human vision. The challenge for the vision scientist is to describe these limitations to the engineer in a comprehensive, computable, and elegant formulation. Primary among these limitations are visibility of variations in light intensity over space and time, of variations in color over space and time, and of all of these patterns with position in the visual field. Lastly, we must describe how all these sensitivities vary with adapting light level. We have recently developed a structural description of human visual sensitivity that we call the Pyramid of Visibility, that accomplishes this synthesis. This talk shows how this structure accommodates all the dimensions described above, and how it can be used to solve a wide variety of problems in display engineering.
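As a hedged summary of the published formulation (Watson and Ahumada's Pyramid of Visibility; the exact coefficients and range of validity are beyond this sketch), log sensitivity is modeled as approximately linear in spatial frequency, temporal frequency, and log luminance away from the sensitivity peak:

\[ \log_{10} S(f, \omega, L) \approx c_0 + c_F\, f + c_W\, \omega + c_L \log_{10} L, \qquad c_F,\, c_W < 0, \]

where $f$ is spatial frequency, $\omega$ is temporal frequency, and $L$ is the adapting luminance, so the sensitivity surface forms a pyramid over the spatiotemporal frequency plane whose height grows with luminance.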
Andrew B. Watson, chief vision scientist, Apple Inc. (United States)
Andrew Watson is Chief Vision Scientist at Apple, where he leads the application of vision science to technologies, applications, and displays. His research focuses on computational models of early vision. He is the author of more than 100 scientific papers and 8 patents. He has 21,180 citations and an h-index of 63. Watson founded the Journal of Vision, and served as editor-in-chief 2001-2013 and 2018-2022. Watson has received numerous awards including the Presidential Rank Award from the President of the United States.
3:00 – 3:30 PM Coffee Break
5:30 – 7:00 PM EI 2023 Symposium Interactive (Poster) Paper Session (in the Cyril Magnin Foyer)
5:30 – 7:00 PM EI 2023 Meet the Future: A Showcase of Student and Young Professionals Research (in the Cyril Magnin Foyer)