Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers
Shuning Shang, Xuran Meng, Yuan Cao, Difan Zou
Preprint. (arXiv)
Benign Overfitting in Single-Head Attention
Roey Magen*, Shuning Shang*, Zhiwei Xu, Spencer Frei, Wei Hu, Gal Vardi
Preprint. (arXiv)