We introduce Arches, a cycle-level hardware simulation framework for exploring and evaluating massively parallel ray-tracing architectures. Arches captures detailed performance metrics, including computational throughput, on-chip data movement between processors and caches, and off-chip communication through an accurate memory system model. The framework is modular: processor cores, caches, and custom hardware units can be configured and interconnected flexibly, which makes it easy to explore diverse hardware architectures. Arches supports high-performance parallel execution and can simulate complex ray-tracing workloads to image completion. It leverages the GNU toolchain, so users can write C++ software that targets both the simulated architecture and native execution for debugging, with support for custom instructions that control specialized hardware components. Comprehensive performance instrumentation exposes time-varying statistics across all modules and helps identify performance bottlenecks. Our evaluations show that Arches produces performance estimates that closely match real hardware and that it is faster and more accurate than existing open-source hardware simulators. Its modularity also makes it a valuable tool for exploring alternative parallel computing strategies for high-performance ray tracing, and its extensibility allows it to be adapted to other workloads or general-purpose computation.
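
To make the idea of cycle-level, modular simulation concrete, the following is a minimal C++ sketch and not Arches code. Every name in it (`FixedLatencyMemory`, `BlockingCore`, `simulate`, `Stats`) is hypothetical, and the model is deliberately simplified: one core that blocks on each load, connected to a memory unit with a fixed latency. It only illustrates how independently defined units can be ticked once per simulated cycle while statistics such as stall cycles are collected.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical request record: the address and the cycle at which data is ready.
struct Request {
    uint64_t addr;
    uint64_t ready_cycle;
};

// Hypothetical memory unit with a fixed access latency (assumption for this sketch).
struct FixedLatencyMemory {
    uint64_t latency;
    std::deque<Request> in_flight;
    uint64_t served = 0;

    // Accept a new request issued at cycle `now`.
    void accept(uint64_t addr, uint64_t now) {
        in_flight.push_back({addr, now + latency});
    }

    // Advance one cycle; returns true if a response completes this cycle.
    bool tick(uint64_t now) {
        if (!in_flight.empty() && in_flight.front().ready_cycle <= now) {
            in_flight.pop_front();
            ++served;
            return true;
        }
        return false;
    }
};

// Hypothetical core that issues one load at a time and stalls until it returns.
struct BlockingCore {
    uint64_t loads_remaining;
    bool waiting = false;
    uint64_t stall_cycles = 0;

    void tick(uint64_t now, FixedLatencyMemory& mem, bool response) {
        if (response) waiting = false;
        if (waiting) {            // instrumentation: count cycles lost to memory
            ++stall_cycles;
            return;
        }
        if (loads_remaining > 0) {
            mem.accept(loads_remaining * 64, now);  // arbitrary 64-byte stride
            --loads_remaining;
            waiting = true;
        }
    }

    bool done() const { return loads_remaining == 0 && !waiting; }
};

struct Stats {
    uint64_t cycles;
    uint64_t stall_cycles;
};

// Tick every unit once per simulated cycle until the workload completes.
Stats simulate(uint64_t num_loads, uint64_t latency) {
    assert(latency >= 1 && "this sketch assumes at least one cycle of latency");
    FixedLatencyMemory mem{latency};
    BlockingCore core{num_loads};
    uint64_t cycle = 0;
    while (!core.done()) {
        bool response = mem.tick(cycle);
        core.tick(cycle, mem, response);
        ++cycle;
    }
    return {cycle, core.stall_cycles};
}
```

Under these assumptions, `simulate(2, 3)` reports 7 cycles, 4 of which the core spends stalled on memory. A real cycle-level framework would replace the fixed latency with a detailed memory model and connect many such units, but the per-cycle tick loop and the per-unit counters follow the same pattern.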