All published analyses and posts
This page keeps the latest public ISOM URLs linked in server-rendered HTML so new analyses can be discovered, crawled, and revisited more reliably.
Paper Analyses
Each card links the latest public language URLs for one published analysis family.
DentEval: Fine-tuning-Free Expert-Aligned Assessment in Dental Education via LLM Agents
Large language models (LLMs) have demonstrated considerable potential in automating assignment scoring within higher education, providing efficient and consistent evaluations. However, existing systems encounter substantial challenges when assessing students’ responses to open-ended short-answer questions. These challenges include the need for large, annotated datasets for fine-tuning or additional training, as well as inconsistencies between model outputs and human-level evaluations. This issue is particularly pronounced in domains requiring specialized knowledge, such as dentistry. To address these limitations, we propose DentEval, an LLM-based automated assignment assessment system supporting multimodal inputs (e.g., text and clinical images) that is tailored for dental curricula. This framework integrates role-playing prompting and Self-refining Retrieval-Augmented Generation (SR-RAG) to assess student responses and ensure that the system’s outputs closely align with human grading standards. We further utilized a dataset annotated by dental professors, dividing it into few-shot learning and testing sets to evaluate the DentEval framework. Results demonstrate that DentEval exhibits a stronger correlation with human grading compared to representative baselines. Finally, comprehensive ablation studies validate the effectiveness of the individual components incorporated in DentEval. Our code is available on GitHub at: https://github.com/DXY0711/DentEval
Oscillatory behavior for higher-order nonlinear differential equations in the canonical case
In this paper, we study the oscillation of a class of higher-order neutral nonlinear differential equations. We begin by establishing the relationship between the solution and its corresponding function. Then, by applying Riccati substitution along with some inequalities, we derive a set of sufficient conditions under which positive solutions do not exist, thus providing an oscillation criterion that guarantees the oscillation of all solutions to the equation under study. These criteria also improve upon related results in the literature. Examples are given to illustrate the significance of the established results.
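For orientation, the classical Riccati substitution in the simplest second-order canonical case, a toy analogue of the higher-order neutral setting studied here, can be sketched as:

```latex
% Consider (r(t)x'(t))' + q(t)x(t) = 0 and assume x(t) is an
% eventually positive solution. Set
w(t) = \frac{r(t)\,x'(t)}{x(t)} .
% Differentiating and substituting the equation gives the Riccati relation
w'(t) = \frac{(r(t)x'(t))'}{x(t)} - \frac{r(t)\,x'(t)^2}{x(t)^2}
      = -q(t) - \frac{w(t)^2}{r(t)} ,
% so conditions ruling out solutions of this first-order Riccati
% inequality rule out eventually positive solutions, i.e. force oscillation.
```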
Improving Quantum Machine Learning via Heat-Bath Algorithmic Cooling
This work introduces an approach rooted in quantum thermodynamics to enhance sampling efficiency in quantum machine learning (QML). We propose conceptualizing quantum supervised learning as a thermodynamic cooling process. Building on this concept, we develop a quantum refrigerator protocol that enhances sample efficiency during training and prediction without the need for Grover iterations or quantum phase estimation. Inspired by heat-bath algorithmic cooling protocols, our method alternates entropy compression and thermalization steps to decrease the entropy of qubits, increasing polarization toward the dominant bias. This technique minimizes the computational overhead associated with estimating classification scores and gradients, presenting a practical and efficient solution for QML algorithms compatible with noisy intermediate-scale quantum devices.
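As a toy numerical illustration of the cooling idea (our own sketch, not the paper's protocol): the textbook 3-qubit majority-vote compression boosts three equal polarizations ε to (3ε − ε³)/2, after which the two scratch qubits re-thermalize to the bath value and can be reused.

```python
def majority_compress(eps):
    """One entropy-compression step on three qubits of equal polarization.

    Toy model of a heat-bath algorithmic cooling round: the standard
    3-qubit majority-vote step boosts the target qubit's polarization
    to (3*eps - eps**3) / 2, which exceeds eps whenever 0 < eps < 1.
    The two scratch qubits are then re-thermalized against the bath.
    """
    return (3 * eps - eps**3) / 2

eps_bath = 0.10                       # bath (initial) polarization
eps_cold = majority_compress(eps_bath)
print(round(eps_cold, 4))             # 0.1495: colder than the bath
```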
VesselSDF: Distance Field Priors for Vascular Network Reconstruction
Accurate segmentation of vascular networks from sparse CT scan slices remains a significant challenge in medical imaging, particularly due to the thin, branching nature of vessels and the inherent sparsity between imaging planes. Existing deep learning approaches, based on binary voxel classification, often struggle with structural continuity and geometric fidelity. To address this challenge, we present VesselSDF, a novel framework that leverages signed distance fields (SDFs) for robust vessel reconstruction. Our method reformulates vessel segmentation as a continuous SDF regression problem, where each point in the volume is represented by its signed distance to the nearest vessel surface. This continuous representation inherently captures the smooth, tubular geometry of blood vessels and their branching patterns. Thanks to our adaptive Gaussian regularizer, which enforces smoothness in regions far from vessel surfaces while preserving precise geometry near surface boundaries, we obtain accurate vessel reconstructions and eliminate common SDF artifacts such as floating segments. Our experimental results demonstrate that VesselSDF significantly outperforms existing methods and preserves vessel geometry and connectivity, enabling more reliable vascular analysis in clinical settings.
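A minimal sketch of what such an objective might look like (the exact loss and weighting here are our assumptions, not the paper's formulation): the regression term fits the predicted signed distance, while a Gaussian-derived weight switches the smoothness penalty off near the vessel surface and on far from it.

```python
import numpy as np

def sdf_loss(pred_sdf, true_sdf, grad_norm, sigma=2.0, lam=0.1):
    """Hypothetical SDF regression objective with an adaptive
    Gaussian regularizer (illustrative only).

    - data term: L1 error between predicted and true signed distance
    - smoothness term: penalizes deviation of the spatial gradient
      norm from the eikonal constraint |grad f| = 1, weighted so the
      penalty vanishes near the surface (true_sdf ~ 0) and grows with
      distance, discouraging floating segments far from vessels
    """
    data = np.abs(pred_sdf - true_sdf).mean()
    weight = 1.0 - np.exp(-true_sdf**2 / (2 * sigma**2))  # ~0 at surface
    smooth = (weight * (grad_norm - 1.0) ** 2).mean()
    return data + lam * smooth
```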
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning
Despite the recent successes of multi-agent reinforcement learning (MARL) algorithms, efficiently adapting to co-players in mixed-motive environments remains a significant challenge. One feasible approach is to hierarchically model co-players’ behavior based on inferring their characteristics. However, these methods often encounter difficulties in efficient reasoning and utilization of inferred information. To address these issues, we propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm that enables few-shot adaptation to unseen policies in mixed-motive environments. HOP is hierarchically composed of two modules: an opponent modeling module that infers others’ goals and learns corresponding goal-conditioned policies, and a planning module that employs Monte Carlo Tree Search (MCTS) to identify the best response. Our approach improves efficiency by updating beliefs about others’ goals both across and within episodes and by using information from the opponent modeling module to guide planning. Experimental results demonstrate that in mixed-motive environments, HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios. Furthermore, the emergence of social intelligence during our experiments underscores the potential of our approach in complex multi-agent environments.
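The within-episode belief update described above can be sketched as a plain Bayes rule over candidate goals (a generic illustration with hypothetical names, not HOP's actual code): the prior over an opponent's goals is reweighted by how likely each goal-conditioned policy makes the observed action.

```python
def update_belief(belief, likelihoods):
    """Bayesian update of a belief over an opponent's goals.

    belief: dict goal -> prior probability
    likelihoods: dict goal -> probability pi_g(a | s) that the
        goal-conditioned policy for goal g takes the observed action
    Returns the normalized posterior.
    """
    posterior = {g: belief[g] * likelihoods[g] for g in belief}
    z = sum(posterior.values())
    return {g: p / z for g, p in posterior.items()}

# Seeing a cooperative action shifts belief toward the "coop" goal:
b = update_belief({"coop": 0.5, "defect": 0.5},
                  {"coop": 0.9, "defect": 0.3})
print(round(b["coop"], 6))  # 0.75
```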
Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation
Scalp disorders are highly prevalent worldwide, yet remain underdiagnosed due to limited access to expert evaluation and the high cost of annotation. Although AI-based approaches hold great promise, their practical deployment is hindered by challenges such as severe data imbalance and the absence of pixel-level segmentation labels. To address these issues, we propose ScalpVision, an AI-driven system for the holistic diagnosis of scalp diseases. In ScalpVision, effective hair segmentation is achieved using pseudo image-label pairs and an innovative prompting method in the absence of traditional hair masking labels. Additionally, ScalpVision introduces DiffuseIT-M, a generative model adopted for dataset augmentation while maintaining hair information, facilitating improved predictions of scalp disease severity. Our experimental results affirm ScalpVision’s efficiency in diagnosing a variety of scalp conditions, showcasing its potential as a valuable tool in dermatological care. Our code is available at https://github.com/winston1214/ScalpVision.
FluoroSAM: A Language-promptable Foundation Model for Flexible X-ray Image Segmentation
Language promptable X-ray image segmentation would enable greater flexibility for human-in-the-loop workflows in diagnostic and interventional precision medicine. Prior efforts have contributed task-specific models capable of solving problems within a narrow scope, but expanding to broader use requires additional data, annotations, and training time. Recently, language-aligned foundation models (LFMs), machine learning models trained on large amounts of highly variable image and text data that enable broad applicability, have emerged as promising tools for automated image analysis. Existing foundation models for medical image analysis focus on scenarios and modalities where large, richly annotated datasets are available. However, the X-ray imaging modality features highly variable image appearance and applications, from diagnostic chest X-rays to interventional fluoroscopy, with varying availability of data. To pave the way toward an LFM for comprehensive and language-aligned analysis of arbitrary medical X-ray images, we introduce FluoroSAM, a language-promptable variant of the Segment-Anything Model, trained from scratch on 3M synthetic X-ray images from a wide variety of human anatomies, imaging geometries, and viewing angles. The dataset includes pseudo-ground-truth masks for 128 organ types and 464 tools with associated text descriptions. FluoroSAM is capable of segmenting myriad anatomical structures and tools based on natural language prompts, thanks to the novel incorporation of vector quantization (VQ) of text embeddings in the training process. We demonstrate FluoroSAM’s performance quantitatively on real X-ray images and showcase on several applications how FluoroSAM is a key enabler for rich human-machine interaction in the X-ray image acquisition and analysis context. Information on data, weights, and code is available at https://github.com/arcadelab/fluorosam.
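As a generic illustration of the vector-quantization step (not FluoroSAM's actual implementation), nearest-codebook VQ maps each continuous text embedding to its closest discrete code, so differently phrased prompts with similar meaning collapse onto shared codewords.

```python
import numpy as np

def vector_quantize(embeddings, codebook):
    """Nearest-codebook vector quantization (generic sketch).

    embeddings: (N, D) continuous text embeddings
    codebook:   (K, D) learned codebook vectors
    Returns the quantized embeddings and their code indices.
    """
    # pairwise squared distances between embeddings and codes: (N, K)
    d2 = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = d2.argmin(axis=1)          # index of the nearest code
    return codebook[codes], codes
```

In a trained VQ model the codebook itself is learned jointly with the encoder; here it is just a fixed array for illustration.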
PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging
Lensless cameras offer significant advantages in size, weight, and cost compared to traditional lens-based systems. Without a focusing lens, lensless cameras rely on computational algorithms to recover the scenes from multiplexed measurements. However, current algorithms struggle with inaccurate forward imaging models and insufficient priors to reconstruct high-quality images. To overcome these limitations, we introduce a novel two-stage approach for consistent and photorealistic lensless image reconstruction. The first stage of our approach ensures data consistency by focusing on accurately reconstructing the low-frequency content with a spatially varying deconvolution method that adjusts to changes in the Point Spread Function (PSF) across the camera's field of view. The second stage enhances photorealism by incorporating a generative prior from pre-trained diffusion models. By conditioning on the low-frequency content retrieved in the first stage, the diffusion model effectively reconstructs the high-frequency details that are typically lost in the lensless imaging process, while also maintaining image fidelity. Our method achieves a superior balance between data fidelity and visual quality compared to existing methods, as demonstrated with two popular lensless systems, PhlatCam and DiffuserCam.
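For context, the textbook single-PSF baseline that a spatially varying deconvolution generalizes is Wiener filtering in the Fourier domain (a generic sketch, not the paper's method): it recovers scene content from a multiplexed measurement under the assumption of one shift-invariant point spread function.

```python
import numpy as np

def wiener_deconvolve(measurement, psf, snr=100.0):
    """Single-PSF Wiener deconvolution (textbook baseline).

    Assumes a shift-invariant forward model y = h * x; the filter
    conj(H) / (|H|^2 + 1/snr) inverts H while regularizing
    frequencies where the PSF carries little energy.
    """
    H = np.fft.fft2(psf, s=measurement.shape)
    G = np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft2(np.fft.fft2(measurement) * G))
```

A spatially varying method instead lets the PSF change across the field of view, which is exactly where the shift-invariant assumption above breaks down for lensless cameras.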
Dynamical arrest in active nematic turbulence
Active fluids display spontaneous turbulentlike flows known as active turbulence. Recent work revealed that these flows have universal features, independent of the material properties and of the presence of topological defects. However, the differences between defect-laden and defect-free active turbulence remain largely unexplored. Here, by means of large-scale numerical simulations, we show that defect-free active nematic turbulence can undergo dynamical arrest. This state is characterized by an emergent network of nematic domain walls that channels coherent streams and suppresses chaotic flows. As the system evolves, the branched wall network produces a large-scale pattern with treelike topological properties. We find that flow alignment—the tendency of nematics to reorient under shear—enhances large-scale chaotic jets in contractile rodlike systems while promoting dynamical arrest in extensile systems. We further show that dynamical arrest arises regardless of whether defects are prohibited by construction or simply fail to form due to a high energy cost of defect cores. Taken together, our findings reveal a striking pattern-formation mechanism, with labyrinths emerging from active turbulence, and illuminate the rich transitional regime between defect-free and defect-laden dynamics. These behaviors call for the experimental realization of active nematics at vanishing or low defect densities, and underscore that, in extensile rodlike nematics, topological defects enable turbulence by preventing dynamical arrest.
Adversarial Attacks on Combinatorial Multi-Armed Bandits
We study reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB). We first provide a necessary and sufficient condition for the attackability of CMAB, a notion that captures the vulnerability and robustness of CMAB. The attackability condition depends on the intrinsic properties of the corresponding CMAB instance such as the reward distributions of super arms and outcome distributions of base arms. Additionally, we devise an attack algorithm for attackable CMAB instances. Contrary to prior understanding of multi-armed bandits, our work reveals a surprising fact that the attackability of a specific CMAB instance also depends on whether the bandit instance is known or unknown to the adversary. This finding indicates that adversarial attacks on CMAB are difficult in practice and a general attack strategy for any CMAB instance does not exist since the environment is mostly unknown to the adversary. We validate our theoretical findings via extensive experiments on real-world CMAB applications including probabilistic maximum covering problem, online minimum spanning tree, cascading bandits for online ranking, and online shortest path.
A Walsh Hadamard Derived Linear Vector Symbolic Architecture
Vector Symbolic Architectures (VSAs) are one approach to developing Neuro-symbolic AI, where two vectors in $\mathbb{R}^d$ are 'bound' together to produce a new vector in the same space. VSAs support the commutativity and associativity of this binding operation, along with an inverse operation, allowing one to construct symbolic-style manipulations over real-valued vectors. Most VSAs were developed before deep learning and automatic differentiation became popular and instead focused on efficacy in hand-designed systems. In this work, we introduce the Hadamard-derived linear Binding (HLB), which is designed for favorable computational efficiency, efficacy in classic VSA tasks, and strong performance in differentiable systems.
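The classic elementwise (Hadamard) product binding that such architectures build on can be sketched in a few lines; with bipolar ±1 vectors the operation is commutative, associative, and self-inverse (a generic VSA illustration, not HLB's exact formulation).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024
# Bipolar vectors: with +/-1 entries the Hadamard product is its own
# inverse, giving the commutative, associative bind/unbind pair VSAs need.
a = rng.choice([-1.0, 1.0], size=d)
b = rng.choice([-1.0, 1.0], size=d)

bound = a * b                  # bind: still a +/-1 vector in R^d
recovered = bound * a          # unbind: a * a = 1 elementwise
print(np.array_equal(recovered, b))  # True
```

Because binding is differentiable (an elementwise product), schemes of this family slot naturally into gradient-trained networks, which is the setting HLB targets.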
How Spurious Features are Memorized: Precise Analysis for Random and NTK Features
Deep learning models are known to overfit and memorize spurious features in the training dataset. While numerous empirical studies have aimed at understanding this phenomenon, a rigorous theoretical framework to quantify it is still missing. In this paper, we consider spurious features that are uncorrelated with the learning task, and we provide a precise characterization of how they are memorized via two separate terms: (i) the stability of the model with respect to individual training samples, and (ii) the feature alignment between the spurious pattern and the full sample. While the first term is well established in learning theory and it is connected to the generalization error in classical work, the second one is, to the best of our knowledge, novel. Our key technical result gives a precise characterization of the feature alignment for the two prototypical settings of random features (RF) and neural tangent kernel (NTK) regression. We prove that the memorization of spurious features weakens as the generalization capability increases and, through the analysis of the feature alignment, we unveil the role of the model and of its activation function. Numerical experiments show the predictive power of our theory on standard datasets (MNIST, CIFAR-10).
Beyond Shadows: Learning Physics-inspired Ultrasound Confidence Maps from Sparse Annotations
This paper introduces a novel user-centered approach for generating confidence maps in ultrasound imaging. Existing methods, relying on simplified models, often fail to account for the full range of ultrasound artifacts and are limited by arbitrary boundary conditions, making frame-to-frame comparisons challenging. Our approach integrates sparse binary annotations into a physics-inspired probabilistic graphical model that can estimate the likelihood of confidence maps. We propose to train convolutional neural networks to predict the most likely confidence map. This results in an approach that is fast, capable of dealing with various artifacts, temporally stable, and allows users to directly influence the algorithm’s behavior using annotations. We demonstrate our method’s ability to cope with a variety of challenging artifacts and evaluate it quantitatively on two downstream tasks, bone shadow segmentation and multi-modal image registration, outperforming the state of the art. We make our training code public.
Vector-Quantization-Driven Active Learning for Efficient Multi-Modal Medical Segmentation with Cross-Modal Assistance
Multi-modal medical image segmentation leverages complementary information across different modalities to enhance diagnostic accuracy, but faces two critical challenges: the requirement for extensive paired annotations and the difficulty in capturing complex inter-modality relationships. While Active Learning (AL) can reduce annotation burden through strategic sample selection, conventional methods suffer from unreliable uncertainty quantification. Meanwhile, Vector Quantization (VQ) offers a mechanism for encoding inter-modality relationships, yet existing implementations struggle with codebook misalignment across modalities. To address these limitations, we propose a novel Vector Quantization-Bimodal Entropy-Guided Active Learning (VQ-BEGAL) framework that employs a dual-encoder architecture with VQ to discretize continuous features into distinct codewords, effectively preserving modality-specific information while mitigating feature co-linearity. Unlike conventional AL methods that separate sample selection from model training, our approach integrates feature-level uncertainty estimation from cross-modal discriminator outputs into the training process, strategically allocating samples with different uncertainty characteristics to optimize specific network components and enhance both feature extraction stability and decoder robustness. Experiments on benchmark datasets demonstrate that our approach achieves state-of-the-art performance while requiring significantly fewer annotations, making it particularly valuable for real-world clinical applications where labeled data is scarce. The code is available at https://github.com/xf-DU/vq-begal.
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion
Generative Video Propagation
INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning
All-in-one medical image-to-image translation
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
RedDino: A foundation model for red blood cell analysis
Red blood cells (RBCs) are fundamental to human health, and precise morphological analysis is critical for diagnosing hematological disorders. Despite the potential of foundation models for medical diagnostics, comprehensive AI solutions for RBC analysis remain limited. We introduce RedDino, a self-supervised foundation model specifically designed for RBC image analysis. Leveraging an RBC-tailored version of the DINOv2 self-supervised learning framework, RedDino is trained on an extensive, meticulously curated dataset comprising 1.25 million RBC images from diverse acquisition modalities and sources. Comprehensive evaluations demonstrate that RedDino significantly outperforms existing state-of-the-art models on the RBC shape classification task. Through systematic assessments, including linear probing and nearest neighbor classification, we validate the model’s robust feature representation and strong generalization capabilities. Our key contributions are (1) a dedicated foundation model tailored for RBC analysis, (2) detailed ablation studies exploring DINOv2 configurations for RBC modeling, and (3) comprehensive generalization performance evaluation. We address key challenges in computational hematology by developing RedDino, a robust and generalizable model that captures nuanced morphological characteristics and represents a substantial advancement in developing reliable diagnostic tools. The source code and pretrained models for RedDino are available at https://anonymous.4open.science/r/RedDino-1F17.
Patient-specific radiomic feature selection with reconstructed healthy persona of knee MR images
Classical radiomic features (e.g., entropy, energy) have been designed to describe image appearance and intensity patterns. These features are directly interpretable and readily understood by radiologists. Compared with end-to-end deep learning (DL) models, lower-dimensional parametric models that use such radiomic features offer enhanced interpretability but lower comparative performance in clinical tasks. In this study, we propose an approach in which the performance of a standard logistic regression model is substantially improved by learning to select radiomic features for individual patients from a pool of candidate features. This approach has the potential to maintain the interpretability of such models while offering performance comparable to DL. In addition, we propose to expand the feature pool by generating a patient-specific healthy persona via mask-inpainting using a denoising diffusion model trained on healthy subjects. Such a pathology-free baseline feature set not only opens further opportunities for novel feature discovery but also improves condition classification. We demonstrate our method on multiple clinical tasks: classifying general abnormalities, anterior cruciate ligament tears, and meniscus tears. Experimental results demonstrate that our approach achieves performance comparable or even superior to state-of-the-art DL approaches while offering added interpretability through the use of radiomic features extracted from images and supplemented by generated healthy personas. Example clinical cases are discussed in depth to demonstrate interpretability-enabled utilities such as human-explainable feature discovery and patient-specific location/view selection. These findings highlight the potential of combining subject-specific feature selection with generative models to augment radiomic analysis for more interpretable decision-making. The code is available at: https://github.com/YaxiiC/RadiomicsPersona.git
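The two features named above have simple textbook definitions over an intensity histogram; a minimal sketch (our own, not the paper's feature pipeline; 'energy' is computed here as histogram uniformity, one common convention):

```python
import numpy as np

def radiomic_entropy_energy(image, bins=32):
    """First-order radiomic features from an intensity histogram.

    entropy = -sum p * log2(p)   (randomness of intensities)
    energy  =  sum p**2          (histogram uniformity convention;
                                  some definitions instead use the
                                  sum of squared raw intensities)
    """
    hist, _ = np.histogram(image, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins: 0*log(0) = 0
    entropy = -(p * np.log2(p)).sum()
    energy = (p ** 2).sum()
    return entropy, energy
```

A constant image has zero entropy and maximal energy, while a perfectly spread histogram maximizes entropy, which matches the intuition that these features summarize intensity heterogeneity.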
MeDi: Metadata-Guided Diffusion Models for Mitigating Biases in Tumor Classification
Deep learning models have made significant advances in histological prediction tasks in recent years. However, for adaptation in clinical practice, their lack of robustness to varying conditions such as staining, scanner, hospital, and demographics is still a limiting factor: if trained on overrepresented subpopulations, models regularly struggle with less frequent patterns, leading to shortcut learning and biased predictions. Large-scale foundation models have not fully eliminated this issue. Therefore, we propose a novel approach explicitly modeling such metadata into a Metadata-guided generative Diffusion model framework (MeDi). MeDi allows for a targeted augmentation of underrepresented subpopulations with synthetic data, which balances limited training data and mitigates biases in downstream models. We experimentally show that MeDi generates high-quality histopathology images for unseen subpopulations in TCGA, boosts the overall fidelity of the generated images, and enables improvements in performance for downstream classifiers on datasets with subpopulation shifts. Our work is a proof-of-concept towards better mitigating data biases with generative models.
Multi-Level Gated U-Net for Denoising TMR Sensor-Based MCG Signals
Tunnel magnetoresistance (TMR) sensors have been recognized as a cost-effective alternative for measuring magnetocardiography (MCG) signals. However, their relatively high noise levels and susceptibility to contamination limit their practical clinical applications. To address these challenges, we propose a novel Multi-Level Gated U-Net (MGU-Net) model specifically designed for denoising long sequential MCG signals obtained from TMR sensors. The MGU-Net leverages the U-Net architecture to learn hierarchical representations, integrated with a novel Gated Linear Unit (GLU) module to capture the periodic pattern of the QRS complex (Q, R, and S waves) in MCG. This design enhances periodic cardiac signatures and suppresses irregular noise components through adaptive gating mechanisms. We have developed a TMR-based MCG system and collected both simulated and real MCG data in a magnetically shielded environment. The results show that our method improves the signal-to-noise ratio (SNR) from -2.142 dB to 10.505 dB on the simulated MCG dataset and from 3.958 dB to 14.514 dB on the real dataset, surpassing other state-of-the-art methods. Our model successfully recovers subtle P-wave and T-wave features from noisy signals, illustrating a promising direction for TMR-based systems in practical clinical applications.
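The reported gains use the usual decibel definition of SNR against a clean reference, which can be sketched as follows (our own helper; the paper may compute it differently):

```python
import numpy as np

def snr_db(clean, denoised):
    """SNR in decibels of a denoised trace against a clean reference.

    SNR_dB = 10 * log10( signal power / residual noise power ),
    where the residual noise is the difference between the denoised
    output and the clean reference signal.
    """
    noise = denoised - clean
    return 10 * np.log10((clean ** 2).sum() / (noise ** 2).sum())
```

Under this definition, going from -2.142 dB to 10.505 dB corresponds to shrinking the residual noise power by more than an order of magnitude.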
Hierarchical Part-based Generative Model for Realistic 3D Blood Vessel
Advancements in 3D vision have increased the impact of blood vessel modeling on medical applications. However, accurately representing the complex geometry and topology of blood vessels remains a challenge due to their intricate branching patterns, curvatures, and irregular shapes. In this study, we propose a hierarchical part-based framework for 3D vessel generation that separates the global binary tree-like topology from local geometric details. Our approach proceeds in three stages: (1) key graph generation to model the overall hierarchical structure, (2) vessel segment generation conditioned on geometric properties, and (3) hierarchical vessel assembly by integrating the local segments according to the global key graph. We validate our framework on real-world datasets, demonstrating superior performance over existing methods in modeling complex vascular networks. This work marks the first successful application of a part-based generative approach for 3D vessel modeling, setting a new benchmark for vascular data generation. The code is available at: https://github.com/CybercatChen/PartVessel.git.
Prompt-DAS: Annotation-Efficient Prompt Learning for Domain Adaptive Semantic Segmentation of Electron Microscopy Images
Domain adaptive segmentation (DAS) of numerous organelle instances from large-scale electron microscopy (EM) is a promising way to enable annotation-efficient learning. Inspired by SAM, we propose a promptable multitask framework, namely Prompt-DAS, which is flexible enough to utilize any number of point prompts during the adaptation training stage and testing stage. Thus, with varying prompt configurations, Prompt-DAS can perform unsupervised domain adaptation (UDA) and weakly supervised domain adaptation (WDA), as well as interactive segmentation during testing. Unlike the foundation model SAM, which necessitates a prompt for each individual object instance, Prompt-DAS is only trained on a small dataset and can utilize full points on all instances, sparse points on partial instances, or even no points at all, facilitated by the incorporation of an auxiliary center-point detection task. Moreover, a novel prompt-guided contrastive learning is proposed to enhance discriminative feature learning. Comprehensive experiments conducted on challenging benchmarks demonstrate the effectiveness of the proposed approach over existing UDA, WDA, and SAM-based approaches.
Explainable ADHD Diagnostic Framework Using Weakly-Supervised Action Recognition
The clinical diagnosis of Attention Deficit Hyperactivity Disorder (ADHD) primarily relies on scale questionnaires, clinical interviews, and executive function tests, which face challenges including limited medical resources, low diagnostic efficiency, and high dependence on clinicians’ subjective experience. Existing AI-assisted diagnostic approaches based on behavioral analysis lack sufficient result interpretability, hindering their integration with conventional diagnostic workflows and practical clinical application. This paper proposes EDWAR, an Explainable ADHD Diagnostic Framework Using Weakly-Supervised Action Recognition, which establishes a collaborative diagnostic mechanism integrating behavioral analysis with traditional test records. By employing weakly-supervised action recognition methodology requiring only diagnostic labels and video-level annotations of abnormal behaviors, our framework not only achieves high diagnostic accuracy but also provides transparent interpretation through both video-level and timestep-wise anomaly action recognition. Experimental results demonstrate that EDWAR attains superior diagnostic performance while offering convincing and explainable evidence.
LiteTracker: Leveraging Temporal Causality for Accurate Low-latency Tissue Tracking
Tissue tracking plays a critical role in various surgical navigation and extended reality (XR) applications. While current methods trained on large synthetic datasets achieve high tracking accuracy and generalize well to endoscopic scenes, their runtime performances fail to meet the low-latency requirements necessary for real-time surgical applications. To address this limitation, we propose LiteTracker, a low-latency method for tissue tracking in endoscopic video streams. LiteTracker builds on a state-of-the-art long-term point tracking method, and introduces a set of training-free runtime optimizations. These optimizations enable online, frame-by-frame tracking by leveraging a temporal memory buffer for efficient feature reuse and utilizing prior motion for accurate track initialization. LiteTracker demonstrates significant runtime improvements, being around 7× faster than its predecessor and 2× faster than the state of the art. Beyond its primary focus on efficiency, LiteTracker delivers high-accuracy tracking and occlusion prediction, performing competitively on both the STIR and SuPer datasets. We believe LiteTracker is an important step toward low-latency tissue tracking for real-time surgical applications in the operating room. Our code is publicly available at https://github.com/ImFusionGmbH/lite-tracker.
Regularized Low-Rank Adaptation for Few-Shot Organ Segmentation
Parameter-efficient fine-tuning (PEFT) of pre-trained foundation models is increasingly attracting interest in medical imaging due to its effectiveness and computational efficiency. Among these methods, Low-Rank Adaptation (LoRA) is a notable approach based on the assumption that the adaptation inherently occurs in a low-dimensional subspace. While it has shown good performance, its implementation requires a fixed and unalterable rank, which might be challenging to select given the unique complexities and requirements of each medical imaging downstream task. Inspired by advancements in natural image processing, we introduce a novel approach for medical image segmentation that dynamically adjusts the intrinsic rank during adaptation. Viewing the low-rank representation of the trainable weight matrices as a singular value decomposition, we introduce an l1 sparsity regularizer to the loss function, and tackle it with a proximal optimizer. The regularizer can be viewed as a penalty on the decomposition rank; hence, its minimization finds task-adapted ranks automatically. Our method is evaluated in a realistic few-shot fine-tuning setting, where we compare it first to standard LoRA and then to several other PEFT methods across two distinct tasks: base organs and novel organs. Our extensive experiments demonstrate the significant performance improvements driven by our method, highlighting its efficiency and robustness against suboptimal rank initialization. Our code is publicly available: https://github.com/ghassenbaklouti/ARENA.
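The proximal step for an l1 penalty is soft-thresholding; applied to the (diagonal) singular values of a LoRA update, it drives small ones exactly to zero, so the effective rank adapts to the task. A minimal sketch (our own, not the paper's exact optimizer):

```python
import numpy as np

def prox_l1(s, lam):
    """Proximal operator of lam * ||s||_1: elementwise soft-thresholding.

    Values with magnitude below lam are set exactly to zero, the rest
    are shrunk toward zero by lam. Applied to singular values, the
    count of nonzeros after the step is the effective rank.
    """
    return np.sign(s) * np.maximum(np.abs(s) - lam, 0.0)

s = np.array([2.0, 0.6, 0.1, -0.05])     # singular values of a LoRA update
shrunk = prox_l1(s, 0.2)                  # small entries vanish exactly
effective_rank = int((shrunk != 0).sum())
print(effective_rank)                     # 2
```

In a proximal (forward-backward) loop, a gradient step on the task loss alternates with this thresholding step, which is what makes the rank selection automatic rather than a fixed hyperparameter.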
Hybrid Graph Mamba: Unlocking Non-Euclidean Potential for Accurate Polyp Segmentation
Colorectal polyp segmentation can assist doctors in screening colonoscopy images, which is crucial for the prevention of colorectal cancer. Although deep learning has significantly advanced polyp segmentation, three issues remain: (1) Most polyp segmentation methods only extract Euclidean features such as shape and texture, while neglecting non-Euclidean features, such as the geometric topology between the polyp and its surrounding tissue; (2) Non-Euclidean features vary across different regions, but most feature fusion methods overlook both the non-Euclidean topological structures and the differences between internal, edge, and background regions; (3) Low-level features are not fully exploited, and the differences between low- and high-level features are not effectively addressed. To resolve these issues, we propose Hybrid Graph Mamba (HGM) based on Mamba and Graph Convolutional Network (GCN). Our model first uses the pyramid vision transformer to extract features at different levels. Next, we propose hybrid graph Mamba modules to process low-level features from multiple directions using quad-directional Mamba and extract non-Euclidean features with GCN. A boundary discrimination fusion module is also designed to handle high-level features, extracting semantic information for the interior, edges, and background to improve the fusion of low- and high-level features. Finally, a bidirectional Mamba decoder combines bidirectional Mamba and dilated convolutions to aggregate multi-scale features, minimizing information loss and producing the final prediction. Extensive experiments on five benchmark datasets demonstrate that HGM significantly outperforms eight state-of-the-art models. Our code is publicly available at https://github.com/YueyueZhu/HGM.
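The GCN component mixes features over the graph's non-Euclidean neighborhood structure; a minimal numpy sketch of one standard GCN propagation step (purely illustrative of the mechanism, not the paper's implementation):

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One standard GCN propagation step,
        H' = ReLU( D^{-1/2} (A + I) D^{-1/2} H W ),
    which averages each node's features with its neighbors' using the
    symmetrically normalized adjacency. This is the generic way graph
    convolutions extract non-Euclidean (topological) structure."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # D^{-1/2} diagonal
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)   # ReLU activation

# Tiny path graph 0-1-2 with one-hot features and identity weights.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
out = gcn_layer(adj, np.eye(3), np.eye(3))
```

With identity features and weights, the output is just the normalized adjacency itself, which makes the neighborhood-mixing behavior easy to inspect.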
SOO-Bench: Benchmarks for Evaluating the Stability of Offline Black-Box Optimization
Towards Generalizable 3D Human Pose Estimation via Ensembles on Flat Loss Landscapes
Compute-Constrained Data Selection
Single Image Test-Time Adaptation via Multi-View Co-Training
Test-time adaptation enables a trained model to adjust to a new domain during inference, making it particularly valuable in clinical settings where such on-the-fly adaptation is required. However, existing techniques depend on large target domain datasets, which are often impractical and unavailable in medical scenarios that demand per-patient, real-time inference. Moreover, current methods commonly focus on two-dimensional images, failing to leverage the volumetric richness of medical imaging data. Bridging this gap, we propose a Patch-Based Multi-View Co-Training method for Single Image Test-Time adaptation. Our method enforces feature and prediction consistency through uncertainty-guided self-training, enabling effective volumetric segmentation in the target domain with only a single test-time image. Validated on three publicly available breast magnetic resonance imaging datasets for tumor segmentation, our method achieves performance close to the upper bound supervised benchmark while also outperforming all existing state-of-the-art methods by an average Dice Similarity Coefficient margin of 3.75%. We will publicly share our accessible codebase, readily integrable with the popular nnUNet framework, at https://github.com/smriti-joshi/muvi.git.
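Uncertainty-guided self-training typically keeps a pseudo-label only where predictions agree across views; a small numpy sketch of that filtering idea (thresholds, shapes, and the variance-as-uncertainty choice are illustrative, not the paper's settings):

```python
import numpy as np

def uncertainty_guided_pseudolabels(view_probs, tau=0.05):
    """Given per-view foreground probabilities for the same voxels
    (shape: n_views x n_voxels), keep a pseudo-label only where the
    views agree, i.e. where the across-view variance is below tau.
    Returns (labels, confidence mask)."""
    mean_p = view_probs.mean(axis=0)
    var_p = view_probs.var(axis=0)          # disagreement as uncertainty
    mask = var_p < tau                      # confident voxels only
    labels = (mean_p > 0.5).astype(np.int64)
    return labels, mask

# Three simulated views of four voxels: the views agree on the first
# two voxels and disagree on the last two.
views = np.array([
    [0.95, 0.05, 0.9, 0.2],
    [0.90, 0.10, 0.1, 0.8],
    [0.92, 0.08, 0.5, 0.5],
])
labels, mask = uncertainty_guided_pseudolabels(views)
```

Only the first two voxels survive the confidence mask, so the disagreeing voxels would be excluded from the self-training loss.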
CENet: Context Enhancement Network for Medical Image Segmentation
Medical image segmentation, particularly in multi-domain scenarios, demands precise preservation of anatomical structures across diverse representations. While deep learning has advanced this field, existing models often struggle with boundary representation, variability in organ morphology, and information loss during downsampling, limiting their accuracy and robustness. To address these challenges, we propose the Context Enhancement Network (CENet), a novel segmentation framework featuring two key innovations. First, the Dual Selective Enhancement Block (DSEB) integrated into skip connections enhances boundary details and improves the detection of smaller organs in a context-aware manner. Second, the Context Feature Attention Module (CFAM) in the decoder employs a multi-scale design to maintain spatial integrity, reduce feature redundancy, and mitigate overly enhanced representations. Extensive evaluations on both radiology and dermoscopic datasets demonstrate that CENet outperforms state-of-the-art (SOTA) methods in multi-organ segmentation and boundary detail preservation, offering a robust and accurate solution for complex medical image analysis tasks. The source code is publicly available at https://github.com/xmindflow/cenet.
Temporal Atlas-Guided Generation of Longitudinal Data via Geometric Latent Embeddings
The spatiotemporal change of a developing anatomical structure is a dynamic process, and quantifying this process within a population and between populations is a fundamental yet challenging task in medical image analysis. Central to this task is the availability of longitudinal imaging data for 4D statistical shape analysis. Unfortunately, this type of longitudinal data is expensive, time-consuming, and difficult to collect. Practically, the majority of imaging data are 3D cross-sectional data, which are inadequate in describing the dynamic shape changes of anatomical structures. In this paper, we introduce a novel temporal atlas-guided deep learning model for longitudinal data generation. Unlike existing methods that directly generate longitudinal data from input images or sequences, we characterize distinctive geometric shape representations in both cross-sectional and longitudinal latent spaces of diffeomorphisms, while optimizing the quality of both atlas and longitudinal data generation. To the best of our knowledge, this is the first deep learning approach that leverages temporal atlas-based representation for longitudinal data generation. The innovative nature of our framework lies in its ability to jointly perform within-age and cross-age shape registration, thus maximizing registration performance while maintaining desirable deformation qualities. Our work’s ability to model spatiotemporal dynamics makes it highly versatile and applicable to a wide range of domains, including modeling the normal and abnormal development of anatomical structures for improved clinical diagnosis and treatment planning. The code of this work is available at https://github.com/wushaoju/TAG-GLE.
Multi-Tube-Voltage vBMD Measurement via Dual-Branch Frequency Balancing and Asymmetric Channel Attention
Phantom-less volumetric bone mineral density (vBMD) measurement using computed tomography (CT) presents a cost-effective alternative to conventional phantom-based approaches, yet faces accuracy challenges across varying tube voltages. Current deep learning-based phantom-less solutions frequently overlook the critical role of frequency variance—a crucial factor for precise BMD measurement and cross-voltage generalization. We present a lightweight CT-based phantom-free vBMD measurement framework that addresses critical limitations in cross-voltage generalization. Core innovations include: (1) Frequency-balancing feature modulation with multi-band fusion, preserving spectral measurement cues; (2) A dual-branch architecture combining domain-specific convolutions with cross-frequency interaction; and (3) Asymmetric channel attention, which allocates attention weights based on frequency characteristics, enabling adaptive emphasis on critical low- and high-frequency components. Comprehensive evaluations across 80, 1
The Refining of Brain Connectivity Features on Residual Posterior Patterns
In conjunction with graph neural networks (GNNs), functional connectivity analysis based on fMRI data can provide insights into the interaction and communication patterns in brain networks, which has gained increasing attention in the diagnosis of neuropsychiatric disorders. However, traditional GNN-based models focus primarily on brain regions, with limited attention given to changes in brain connectivity induced by diseases, and often lack specific methods to address noise and outliers. To accurately preserve and analyze connections in brain networks and retain the structure information of the original graph over message passing, we propose a Residual-Posterior Line Graph Network (RP-LGN). RP-LGN innovatively re-models each edge as a node to highlight functional connectivity information. Subsequently, it integrates residual blocks and a single-pass, low-variance Bayesian variational inference method to approximate the true posterior distribution. The Bayesian variational posterior facilitates the quantification of uncertainty in model predictions and enhances model robustness in the presence of noise and anomalous data, ultimately promoting more accurate clinical decision-making. The performance of RP-LGN was validated on the ABIDE and ADHD-200 datasets, where it achieved significant accuracy improvements over other models and revealed significant site-specific differences and unique disease-associated connection patterns.
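Re-modelling each edge as a node is the classical line-graph construction: two edge-nodes are linked when the original edges share an endpoint. A short sketch (illustrative only; RP-LGN's actual graph building may differ in details):

```python
import numpy as np

def line_graph(edges):
    """Build the line-graph adjacency: every original edge becomes a
    node, and two such nodes are connected when the original edges
    share an endpoint. This puts the connections themselves, rather
    than the brain regions, at the center of message passing."""
    n = len(edges)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if set(edges[i]) & set(edges[j]):   # shared endpoint?
                adj[i, j] = adj[j, i] = 1.0
    return adj

# Toy connectivity: a path 0-1-2-3 gives three edges, hence three
# edge-nodes; consecutive edges share an endpoint.
lg = line_graph([(0, 1), (1, 2), (2, 3)])
```

For the path graph, the line graph is itself a path on the three edges, which matches the intuition that adjacent connections influence each other.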
One-shot active learning for vessel segmentation
Vessel segmentation is crucial for analyzing brain vasculature and understanding cerebral functions and disease mechanisms. Current deep-learning models for segmenting blood vessels within brain images are supervised and depend on extensive labeled data, which requires expert annotation and is both time-consuming and resource-intensive. To address these challenges, we propose Vessel-Dictionary Selection Net (V-DiSNet), a one-shot active learning (OSAL) framework specifically designed for vessels that can be used to select a small, representative set of informative and diverse samples for expert annotation and training, given an unlabeled dataset in a single iteration. The selection process involves sampling from a latent space designed by leveraging the recurrent properties of brain vessel patterns. Specifically, we combine dictionary learning with k-means clustering to learn a latent representation integrating fundamental basis elements representing recurrent vessel features such as shape, connectivity, and structures. We experimentally demonstrate the effectiveness of our method on three publicly available 3D Magnetic Resonance Angiography datasets, showing that V-DiSNet consistently outperforms random sampling and other state-of-the-art OSAL methods in terms of standard vessel segmentation metrics. Our code is available at github.com/i-vesseg/V-DiSNet.
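The selection step, clustering latent codes and then picking one representative sample per cluster, can be sketched as follows (numpy only, with a deterministic farthest-point initialization for reproducibility; in V-DiSNet the latent codes would come from dictionary learning on vessel patches, so everything here is a stand-in):

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic farthest-point initialization for the toy k-means
    below (keeps the example reproducible)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min(
            np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=-1),
            axis=1,
        )
        centers.append(X[d.argmax()])
    return np.array(centers, dtype=float)

def kmeans_assign(X, k, iters=20):
    """Minimal Lloyd-style k-means returning centers and assignments."""
    centers = farthest_point_init(X, k)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign

def select_one_shot_budget(codes, k):
    """One representative sample index per cluster: the latent code
    closest to its centroid, capturing the 'informative and diverse'
    selection idea in a single pass over the unlabeled set."""
    centers, assign = kmeans_assign(codes, k)
    picks = []
    for j in range(k):
        idx = np.where(assign == j)[0]
        if len(idx):
            d = np.linalg.norm(codes[idx] - centers[j], axis=1)
            picks.append(int(idx[d.argmin()]))
    return sorted(picks)

# Toy latent codes: two well-separated groups of patch encodings.
codes = np.vstack([np.zeros((5, 3)), np.full((5, 3), 10.0)])
picked = select_one_shot_budget(codes, k=2)
```

With a budget of two, one sample is chosen from each group, which is the diversity property a single annotation round relies on.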
Anatomical Structure Few-Shot Detection Utilizing Enhanced Human Anatomy Knowledge in Ultrasound Images
Deep learning-based models have significantly advanced clinical ultrasound tasks by detecting anatomical structures within vast ultrasound image datasets. However, their remarkable performance inherently requires extensive training on annotated medical datasets. Few-shot learning addresses the challenge of limited labeled data for model training. Currently, few-shot learning in the field of medical image analysis mainly focuses on classification and semantic segmentation, with relatively few studies on object detection. In this paper, we propose a novel few-shot anatomical structure detection method in ultrasound images called TRR-CCM, which consists of Circular Channel Mamba (CCM) and Topological Relationship Reasoning (TRR) based on human anatomy knowledge. CCM, as a new Mamba variant, performs contextual modeling of anatomical structures and captures long- and short-term dependencies. TRR learns spatial topological relationships between human anatomical structures to further improve the accuracy of detection and localization. Experimental results on two fetal ultrasound datasets demonstrate that TRR-CCM outperforms 9 state-of-the-art baseline methods.
Cross-Modal Brain Graph Transformer via Function-Structure Connectivity Network for Brain Disease Diagnosis
Multi-modal brain networks represent the complex connectivity between different brain regions from both functional and structural perspectives, which is of great significance for brain disease diagnosis. However, existing methods are limited to information fusion in the feature dimension, failing to fully exploit the complementary information between functional and structural connectivity networks. To address these issues, this paper proposes a cross-modal brain graph transformer (CBGT) method for brain disease diagnosis, which also provides an in-depth analysis of coupled functional-structural connectivity networks. Specifically, CBGT consists of two main modules: the cross-modal Transformer module enhances the attention mechanism by utilizing structural connectivity features extracted through machine learning methods, capturing long-range dependencies in the cross-modal brain network. The cross-modal topK pooling module combines information from both functional and structural connectivity networks to select significant regions of interest (ROIs) during the reconstruction of the pooled graph, aiming to retain as much effective information as possible. Experiments conducted on the ABIDE and ADNI datasets demonstrate that the proposed method outperforms state-of-the-art approaches. Interpretation analysis reveals that the proposed method can identify multi-modal biomarkers associated with brain diseases.
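Cross-modal topK pooling amounts, at its core, to ranking regions of interest by a score that combines functional and structural importance and keeping the top k; a minimal sketch (the additive score and the toy values are assumptions for illustration, not CBGT's learned scoring):

```python
import numpy as np

def cross_modal_topk(func_scores, struct_scores, k):
    """Keep the k ROIs with the highest combined functional + structural
    importance score; the surviving indices would define the pooled
    graph in a topK pooling layer."""
    combined = func_scores + struct_scores
    return np.sort(np.argsort(combined)[::-1][:k])

# Hypothetical per-ROI importance from each modality (four ROIs).
func = np.array([0.9, 0.1, 0.5, 0.3])
struct = np.array([0.2, 0.8, 0.6, 0.1])
kept = cross_modal_topk(func, struct, k=2)
```

ROI 1 scores highly on structure alone but is dropped because its combined score loses to ROIs 0 and 2, which is exactly the complementarity the module is meant to exploit.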
Counterfactual Explanations for Conformal Prediction Sets
Pre-to-Post Operative MRI Generation with Retrieval-based Visual In-Context Learning
Glioblastoma is an aggressive brain tumor requiring precise treatment planning. Magnetic resonance imaging (MRI) is essential for pre-operative assessment, surgical resection planning, and post-operative monitoring. Therefore, generating post-operative MRI from pre-operative MRI can assist neurosurgeons in many ways, such as predicting surgical outcomes and guiding treatment planning. However, generating post-operative MRI from pre-operative MRI is challenging, as the resection extent depends on tumor location and infiltration to minimize potential complications, necessitating consideration of surgical outcomes based on tumor location and shape. Furthermore, post-operative MRI differs significantly from pre-operative MRI due to structural and visual changes, such as tissue shift, edema, hemorrhage, and the resection region. To address these challenges, we propose a novel post-operative MRI generation method that generates post-operative MRI from pre-operative MRI using tumor-aware visual in-context learning. Specifically, we provide explicit visual instruction for generating post-operative MRI from pre-operative MRI, improving the capture of structural changes. To consider tumor-specific post-operative outcomes, we propose tumor-guided retrieval, which retrieves the tumor case most similar to the query pre-operative MRI, and a tumor-aware prompt adapter that integrates tumor resection and anatomical structure information. Our proposed method achieves superior performance on a publicly available dataset and is the first to generate post-operative MRI from pre-operative MRI, introducing a new approach to improving patient prognosis.
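Tumor-guided retrieval reduces, at its core, to nearest-neighbor search over tumor descriptors; a minimal cosine-similarity sketch (the 2-D descriptors and bank are hypothetical; real features would encode tumor location and shape):

```python
import numpy as np

def retrieve_most_similar(query_feat, bank_feats):
    """Return the index of the stored case whose descriptor has the
    highest cosine similarity to the query descriptor. The retrieved
    case would then serve as the visual in-context example."""
    q = query_feat / np.linalg.norm(query_feat)
    b = bank_feats / np.linalg.norm(bank_feats, axis=1, keepdims=True)
    return int((b @ q).argmax())

# Hypothetical 2-D tumor descriptors for three stored cases.
bank = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.6, 0.4]])
best = retrieve_most_similar(np.array([0.9, 0.1]), bank)
```

The query is closest in direction to the first stored case, so that case would be placed in the prompt as the in-context example.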
Radar-Based Imaging for Sign Language Recognition in Medical Communication
Ensuring equitable access to medical communication is crucial for deaf and hard-of-hearing individuals, especially in clinical settings where effective patient-doctor interaction is essential. In this work, we present a novel radar-based imaging framework for Sign Language recognition (with a focus on the Italian Sign Language, LIS), specifically designed for medical communication. Our method leverages 60 GHz mm-wave radar to capture motion features while ensuring anonymity by avoiding the use of personally identifiable visual data. Our approach performs sign language classification through a two-stage pipeline: first, a residual autoencoder processes Range Doppler Maps (RDM) and moving-target indications (MTI), compressing them into compact latent representations; then, a Transformer-based classifier learns temporal dependencies to recognize signs across varying durations. By relying on radar-derived motion imaging, our method not only preserves privacy but also establishes radar as a viable tool for analyzing human motion in medical applications beyond sign language, including neurological disorders and other movement-related conditions. We carried out experiments on a new large-scale dataset containing 126 LIS signs - 100 medical terms and 26 alphabet letters. Our method achieves 93.6% accuracy, 87.9% sensitivity, 99.3% specificity, and an 87.7% F1 score, surpassing existing approaches, including an RGB-based baseline. These results underscore the potential of radar imaging for real-time human motion monitoring, paving the way for scalable, privacy-compliant solutions in both sign language recognition and broader clinical applications. The code is available at https://github.com/IngRaffaeleMineo/SignRadarClassification_MICCAI2025 and the dataset will be released publicly.
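Moving-target indication (MTI) in its simplest form is a two-pulse canceller: subtracting consecutive range-Doppler frames along slow time suppresses static clutter while preserving moving reflectors. A toy numpy sketch of that idea (the paper's actual preprocessing may differ):

```python
import numpy as np

def moving_target_indication(rdm_frames):
    """Simple frame-differencing MTI over a stack of range-Doppler maps
    (shape: n_frames x range_bins x doppler_bins): static returns cancel
    out, moving returns survive as magnitude differences."""
    return np.abs(np.diff(rdm_frames, axis=0))

# Two toy 4x4 frames: constant clutter, plus a mover on the diagonal
# that appears only in the second frame.
frames = np.stack([
    np.full((4, 4), 3.0),                 # static clutter only
    np.full((4, 4), 3.0) + np.eye(4),     # clutter + moving target
])
mti = moving_target_indication(frames)
```

The constant background vanishes entirely in the output, leaving only the diagonal of the moving target, which is why MTI features isolate motion such as hand and arm movement during signing.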
Learning to increase matching efficiency in identifying additional b-jets in the process
Hybrid Boundary Physics-Informed Neural Networks for Solving Navier-Stokes Equations with Complex Boundary
Physics-informed neural networks (PINN) have achieved notable success in solving partial differential equations (PDE), yet solving the Navier-Stokes equations (NSE) with complex boundary conditions remains a challenging task. In this paper, we introduce a novel Hybrid Boundary PINN (HB-PINN) method that combines a pretrained network for efficient initialization with a boundary-constrained mechanism. The HB-PINN method features a primary network focused on inner domain points and a distance metric network that enhances predictions at the boundaries, ensuring accurate solutions for both boundary and interior regions. Comprehensive experiments have been conducted on the NSE under complex boundary conditions, including the 2D cylinder wake flow and the 2D blocked cavity flow with a segmented inlet. The proposed method achieves state-of-the-art (SOTA) performance on these benchmark scenarios, demonstrating significantly improved accuracy over existing PINN-based approaches.
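A boundary-constrained mechanism of this kind can be illustrated with the classical distance-weighted ansatz u(x) = u_b(x) + d(x)·N(x), where u_b supplies the boundary data and d vanishes exactly on the boundary; a 1-D numpy sketch with hand-picked stand-ins for the learned networks (these stand-ins are assumptions for illustration, not HB-PINN's architecture):

```python
import numpy as np

def boundary_value(x):
    """Prescribed Dirichlet data on the boundary of [0, 1]:
    u(0) = 1, u(1) = 0 (illustrative choice)."""
    return 1.0 - x

def distance_metric(x):
    """Smooth function that vanishes exactly at x = 0 and x = 1;
    in HB-PINN this role is played by a learned distance metric
    network rather than a closed-form expression."""
    return x * (1.0 - x)

def primary_network(x):
    """Stand-in for the inner-domain network (any smooth map works
    for demonstrating the construction)."""
    return np.sin(3.0 * x)

def hybrid_prediction(x):
    """Composite ansatz: boundary term plus distance-weighted
    correction. By construction it satisfies the boundary
    conditions exactly, whatever the primary network outputs."""
    return boundary_value(x) + distance_metric(x) * primary_network(x)
```

Because the correction term is multiplied by a function that is zero on the boundary, the boundary conditions hold exactly for any primary-network weights, so training only has to fit the interior residual.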
Wavelet-driven Decoupling and Physics-informed Mapping Network for Accelerated Multi-parametric MR Imaging
Multi-parametric magnetic resonance imaging (MRI) is an advanced MRI technique that can provide multiple quantitative maps simultaneously based on acquired multi-echo images. However, the lengthy scan time often limits its application. Accelerated multi-parametric MRI using deep learning is of great interest. The existing studies have two limitations: 1) inefficient use of the multi-echo information; 2) lack of physical prior for parametric mapping. To address these issues, in this work, we propose a novel decoupling-driven and physics-informed reconstruction network for accelerated multi-parametric MRI. Specifically, to better align and integrate multi-echo information, we propose a novel decoupling technique consisting of a wavelet-driven decoupling module together with contrastive and echo-dependent decoupling losses, such that the multi-echo features can be effectively decoupled into echo-dependent and echo-independent components. Only the echo-independent features are fused across multiple echoes. Besides, Bloch equations are incorporated as physical priors to guide the parametric mapping network. Experimental results on our in-house data (12-echo sequence) show that our method outperforms the state-of-the-art methods by 1.54% in average SSIM and 1.70 dB in average PSNR for 4× acceleration, which significantly pushes the performance limit for multi-parametric MRI. Our code is available at https://github.com/IDEARL23/WDPM-Net.
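The simplest Bloch-derived signal model for multi-echo data is the mono-exponential decay S(TE) = S0·exp(−TE/T2); a standalone log-linear fit illustrating that forward model (the paper embeds such physics as a prior inside the network rather than fitting per voxel, so this is only a sketch of the physics, not the method):

```python
import numpy as np

def fit_t2_log_linear(te, signal):
    """Log-linear least-squares fit of the mono-exponential decay
    S(TE) = S0 * exp(-TE / T2): taking logs gives
    log S = log S0 - TE / T2, a straight line in TE."""
    A = np.vstack([np.ones_like(te), -te]).T        # [1, -TE] design
    coef, *_ = np.linalg.lstsq(A, np.log(signal), rcond=None)
    s0 = np.exp(coef[0])
    t2 = 1.0 / coef[1]
    return s0, t2

# Simulated noiseless acquisition with 12 echo times (ms), matching
# the in-house sequence only in echo count.
te = np.arange(5.0, 65.0, 5.0)              # 5, 10, ..., 60 ms
signal = 100.0 * np.exp(-te / 40.0)         # ground truth: S0=100, T2=40 ms
s0, t2 = fit_t2_log_linear(te, signal)
```

On noiseless data the fit recovers S0 and T2 exactly, which is what makes the Bloch model a usable constraint: any candidate parameter map must reproduce the acquired echoes through this forward model.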
Static Posts
Platform notes and research posts are also linked here for crawl coverage.
Translation Flow for Global Research Posts
Designing a pipeline that keeps English originals and localized summaries aligned across multiple languages.
Ranking Features for Isomorphic Search
Signals for combining topic similarity, deadline urgency, and editorial priority in a single result set.