I am a fifth-year student in the master’s-doctoral combined program at the University of Chinese Academy of Sciences (2020–present). My research interests include multimodal large model reasoning, multimodal agent reasoning, and knowledge distillation.
Feel free to contact me via email for academic discussions or internship opportunities! I expect to graduate in June 2026 and am open to job opportunities.
🔥 News
- 2025.08: 🎉🎉 Our collaborative technical report with the Seed Team, StructVRM, has been released! This work focuses on deep thinking in large models and achieves remarkable performance in mathematical and scientific reasoning. Paper link, Results link.
- 2025.07: 🎉🎉 One paper on Multimodal Scientific Reasoning has been accepted by ACM Multimedia 2025.
- 2025.05: 🎉🎉 One paper on Multimodal Chain-of-Thought Verification has been accepted by ACL 2025.
- 2025.04: 🎉🎉 One paper on Sketch-to-Diagram has been accepted by IJCAI 2025.
- 2025.04: 🎉🎉 One paper on Chinese Multimodal Attribute Extraction has been accepted by ICIC 2025.
- 2025.04: 🎉🎉 One paper on Document Retrieval has been accepted by ICIC 2025.
- 2025.04: 🎉🎉 One paper on EEG-to-Text has been accepted by ICIC 2025.
- 2025.02: 🎉🎉 One paper on Text-to-Diagram has been accepted by CVPR 2025 (highlight).
- 2024.07: 🎉🎉 One paper on Multimodal Chain-of-Thought has been accepted by ECCV 2024.
- 2024.07: 🎉🎉 One paper on Multimodal Chain-of-Thought has been accepted by NCAA.
- 2024.06: 🎉🎉 One survey paper on Multimodal Large Model Applications has been accepted by CIBM.
- 2024.05: 🎉🎉 One paper on Interpretable and Generalizable Spatiotemporal Learning has been accepted by ECML-PKDD 2024.
- 2024.04: 🎉🎉 One paper on Knowledge Distillation has been accepted by IJCAI 2024.
📝 Publications

From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing
Jingxuan Wei, Cheng Tan, Qi Chen, Gaowei Wu, Siyuan Li, Zhangyang Gao, Linzhuang Sun, Bihui Yu, Ruifeng Guo
@inproceedings{wei2024words, title={From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing}, author={Wei, Jingxuan and Tan, Cheng and Chen, Qi and Wu, Gaowei and Li, Siyuan and Gao, Zhangyang and Sun, Linzhuang and Yu, Bihui and Guo, Ruifeng}, booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, year={2025} }

Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan, Jingxuan Wei, Zhangyang Gao, Linzhuang Sun, Siyuan Li, Ruifeng Guo, Bihui Yu, Stan Z. Li
@inproceedings{tan2024boosting, title={Boosting the power of small multimodal reasoning models to match larger models with self-consistency training}, author={Tan, Cheng and Wei, Jingxuan and Gao, Zhangyang and Sun, Linzhuang and Li, Siyuan and Guo, Ruifeng and Yu, Bihui and Li, Stan Z}, booktitle={European Conference on Computer Vision}, pages={305--322}, year={2024}, organization={Springer} }

Enhancing Human-like Multimodal Reasoning: A New Challenging Dataset and Comprehensive Framework
Jingxuan Wei, Cheng Tan, Zhangyang Gao, Linzhuang Sun, Siyuan Li, Bihui Yu, Ruifeng Guo, Stan Z. Li
@article{wei2024enhancing, title={Enhancing human-like multimodal reasoning: a new challenging dataset and comprehensive framework}, author={Wei, Jingxuan and Tan, Cheng and Gao, Zhangyang and Sun, Linzhuang and Li, Siyuan and Yu, Bihui and Guo, Ruifeng and Li, Stan Z}, journal={Neural Computing and Applications}, volume={36}, number={33}, pages={20849--20861}, year={2024}, publisher={Springer} }

A Survey on Advancements in Image-Text Multimodal Models: From General Techniques to Biomedical Implementations
Ruifeng Guo, Jingxuan Wei, Linzhuang Sun, Bihui Yu, Guiyong Chang, Dawei Liu, Sibo Zhang, Zhengbing Yao, Mingjun Xu, Liping Bu
@article{guo2024survey, title={A survey on advancements in image-text multimodal models: From general techniques to biomedical implementations}, author={Guo, Ruifeng and Wei, Jingxuan and Sun, Linzhuang and Yu, Bihui and Chang, Guiyong and Liu, Dawei and Zhang, Sibo and Yao, Zhengbing and Xu, Mingjun and Bu, Liping}, journal={Computers in Biology and Medicine}, pages={108709}, year={2024}, publisher={Elsevier} }

Interpretable and Generalizable Spatiotemporal Predictive Learning with Disentangled Consistency
Jingxuan Wei, Cheng Tan, Zhangyang Gao, Linzhuang Sun, Bihui Yu, Ruifeng Guo, Stan Li
@inproceedings{wei2024interpretable, title={Interpretable and Generalizable Spatiotemporal Predictive Learning with Disentangled Consistency}, author={Wei, Jingxuan and Tan, Cheng and Gao, Zhangyang and Sun, Linzhuang and Yu, Bihui and Guo, Ruifeng and Li, Stan}, booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases}, pages={3--20}, year={2024}, organization={Springer} }

Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation
Jingxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo
@inproceedings{ijcai2024p722, title = {Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation}, author = {Wei, Jingxuan and Sun, Linzhuang and Leng, Yichong and Tan, Xu and Yu, Bihui and Guo, Ruifeng}, booktitle = {Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, {IJCAI-24}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, editor = {Kate Larson}, pages = {6531--6540}, year = {2024}, month = {8}, note = {Main Track}, doi = {10.24963/ijcai.2024/722}, url = {https://doi.org/10.24963/ijcai.2024/722}, }
- arXiv: MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification. Sun, Linzhuang and Liang, Hao and Wei, Jingxuan et al.
- SMC: Faster and More Efficient Subject Image Generation for Text-to-Image Diffusion Models. Yu, Bihui and Yao, Zhengbing and Wei, Jingxuan et al.
- SMC: SAM-Wav2lip++: Enhancing Behavioral Realism in Synthetic Agents Through Audio-Driven Speech and Action Refinement. Yu, Bihui and Liu, Dawei and Wei, Jingxuan et al.
- ADMA: TED-CS: Textual Enhanced Sensitive Video Detection with Common Sense Knowledge. Yu, Bihui and Sun, Linzhuang and Wei, Jingxuan et al.
- Computers & Electrical Engineering: Feature-guided Multimodal Sentiment Analysis Towards Industry 4.0. Yu, Bihui and Wei, Jingxuan et al.
🎖 Honors and Awards
- 2018.09 Inner Mongolia Autonomous Region Merit Student
- 2019.09 National Scholarship
- 2020.09 Inner Mongolia Autonomous Region Merit Student
- 2021.09 University of Chinese Academy of Sciences Merit Student
- 2022.09 University of Chinese Academy of Sciences Merit Student
- 2022.09 National Scholarship
📖 Education
- 2023.03-present Ph.D., University of Chinese Academy of Sciences. Supervisors: Prof. Ruifeng Guo and Prof. Bihui Yu.
- 2020.09-2022.12 M.S., University of Chinese Academy of Sciences. Supervisor: Prof. Bihui Yu.
- 2016.09-2020.06 B.S., Inner Mongolia University of Science and Technology. Ranked first in the major and college.
🛠 Services
Program committee member | Reviewer
- Annual Meeting of the Association for Computational Linguistics (ACL)
- Conference on Empirical Methods in Natural Language Processing (EMNLP)
- IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- International Conference on Learning Representations (ICLR)
- International Conference on Machine Learning (ICML)
- Conference and Workshop on Neural Information Processing Systems (NeurIPS)
- ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)
- International Journal of Computer Vision (IJCV)