• Data-Augmented Machine Learning for In Vitro Starch Digestibility Prediction and SHAP-Based Interpretation of Structure–Rheology Features
  • Yeonsong Nam*,#, Sehyeon Jin**,#, Yerin Hyun*, Cheng Li***, Ji Hun Park****, *****, ******, Jongbin Lim*, *******,†, and Yun Am Seo**,†

  • *Department of Food Bioengineering, Jeju National University, Jeju 63243, Korea
    **Department of Data Science, Jeju National University, Jeju-si, 63243, Korea
    ***Food & Nutritional Sciences Programme, School of Life Sciences, The Chinese University of Hong Kong, Shatin 999077, Hong Kong
    ****Department of Science Education, Ewha Womans University, Seoul 03760, Korea
    *****Institute for Multiscale Matter and Systems, Ewha Womans University, Seoul 03760, Korea
    ******Ecogear Inc. Jeju Factory, Jeju 63359, Korea
    *******Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju 63243, Korea

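  • Prediction of In Vitro Starch Digestibility Using Data-Augmented Machine Learning and SHAP-Based Importance Interpretation of Structure–Rheology Features
  • Yeonsong Nam*,# · Sehyeon Jin**,# · Yerin Hyun* · Cheng Li*** · Ji Hun Park****, *****, ****** · Jongbin Lim*, *******,† · Yun Am Seo**,†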

  • Reproduction, storage in a retrieval system, or transmission in any form of any part of this publication is permitted only with written permission from the Polymer Society of Korea.

Abstract

Starch is a polysaccharide biopolymer whose structure–rheology relationships influence its functional behavior, including in vitro digestibility; however, small datasets often limit the accuracy of quantitative predictions. Here, the in vitro digestibility (0–1) of ten starch samples was modeled using molecular features (A- and B1-chain fractions and amylose content) and pasting/rheological features. Four tabular data-augmentation methods (FastML preset, Gaussian copula, tabular variational autoencoder, and conditional tabular generative adversarial network) were benchmarked using quality metrics, and the best-performing method was used to generate 200 synthetic samples for model training. Five models (random forest, support vector regression, XGBoost, LightGBM, and a neural network) were optimized through grid search. Among these, the neural network demonstrated the best predictive performance (R² = 0.907). SHapley Additive exPlanations (SHAP) analysis was then applied to interpret the trained model; it identified consistency index, setback viscosity, and peak viscosity as the dominant contributors, highlighting the roles of gel strength and viscosity recovery. This framework offers a data-driven tool for the rapid screening and design of starch-based materials from small-sample experiments.
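
The model-selection step described in the abstract (grid search scored by R²) can be illustrated with a minimal sketch. It assumes leave-one-out cross-validation and uses a small k-nearest-neighbour regressor as a stand-in for the five models in the study; the abstract does not state the validation scheme, and the data, the helper functions, and the hyperparameter grid below are all hypothetical.

```python
from itertools import product


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_x, train_y, x, k, power):
    """Inverse-distance-weighted mean of the k nearest training targets."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)) ** 0.5, target)
        for row, target in zip(train_x, train_y)
    )[:k]
    # power = 0 gives uniform weights; larger values favour closer neighbours.
    weights = [1.0 / (dist + 1e-9) ** power for dist, _ in nearest]
    return sum(w * t for w, (_, t) in zip(weights, nearest)) / sum(weights)


def loo_r2(xs, ys, k, power):
    """R² of leave-one-out predictions for one hyperparameter setting."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(knn_predict(train_x, train_y, xs[i], k, power))
    return r2_score(ys, preds)


def grid_search(xs, ys, grid):
    """Try every combination in the grid and keep the highest-scoring one."""
    best_score, best_params = None, None
    for k, power in product(grid["k"], grid["power"]):
        score = loo_r2(xs, ys, k, power)
        if best_score is None or score > best_score:
            best_score, best_params = score, {"k": k, "power": power}
    return best_score, best_params


# Made-up values for illustration only; NOT data from the study.
X = [[0.10, 0.80], [0.20, 0.70], [0.35, 0.65], [0.40, 0.50],
     [0.55, 0.45], [0.60, 0.30], [0.75, 0.25], [0.90, 0.10]]
y = [0.82, 0.75, 0.70, 0.61, 0.55, 0.44, 0.38, 0.25]
GRID = {"k": [1, 2, 3], "power": [0, 1, 2]}  # hypothetical grid

best_score, best_params = grid_search(X, y, GRID)
print(best_params, round(best_score, 3))
```

Scoring each candidate on held-out predictions rather than on the training fit matters most when data are scarce, since a flexible model can otherwise reach a high R² by memorizing the samples.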


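Starch is a representative polysaccharide biopolymer whose structure–rheology relationships influence its functional behavior, particularly its in vitro digestion properties, but limited data make quantitative prediction difficult. In this study, a machine learning model was developed to predict the in vitro digestibility (0–1) of ten starches from molecular features, such as A- and B1-chain fractions and amylose content, together with pasting/rheological features. Four data-augmentation methods (FastML preset, Gaussian copula, tabular variational autoencoder, and conditional tabular generative adversarial network) were compared using quality metrics, and the best method was then used to generate 200 synthetic samples for training. Five machine learning algorithms (random forest, support vector regression, XGBoost, LightGBM, and neural network) were optimized by grid search, and the neural network model showed the best performance (R² = 0.907). Subsequent SHapley Additive exPlanations (SHAP) analysis identified consistency index, setback viscosity, and peak viscosity as the main contributing factors, suggesting that gel strength and viscosity recovery play important roles in digestion properties. This framework can be used to predict and screen the digestion properties of starch-based materials, and to guide their design, in small-sample experimental settings.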
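
The interpretation step rests on Shapley values, which split the difference between a prediction and a baseline prediction among the input features. The sketch below computes them exactly by enumerating feature subsets for a toy three-feature function. It is not the authors' pipeline: the SHAP library typically estimates these quantities against a background dataset, whereas this example assumes a single baseline vector, and `toy_model`, its coefficients, and the input values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(z):
    # Arbitrary coefficients with one interaction term; NOT fitted to any data.
    return 2.0 * z[0] + 1.0 * z[1] + z[0] * z[2]


x = [1.0, 2.0, 3.0]          # hypothetical sample to explain
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point

phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the additivity property the method's name refers to. Exact enumeration scales as 2^n in the number of features, so it is practical only for small feature sets.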


Keywords: starch biopolymer, starch digestibility, data augmentation, machine learning, Shapley additive explanations.

  • Polymer(Korea) 폴리머
  • Frequency : Bimonthly(odd)
    ISSN 2234-8077(Online)
    Abbr. Polym. Korea
  • 2024 Impact Factor : 0.6
  • Indexed in SCIE

This Article

  • 2026; 50(3): 467-476

    Published online May 25, 2026

  • 10.7317/pk.2026.50.3.467
  • Received on Jan 19, 2026
  • Revised on Feb 12, 2026
  • Accepted on Feb 14, 2026

Correspondence to

  • Jongbin Lim*, *******, Yun Am Seo**
  • *Department of Food Bioengineering, Jeju National University, Jeju 63243, Korea
    **Department of Data Science, Jeju National University, Jeju-si, 63243, Korea
    *******Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju 63243, Korea

  • E-mail: jongbinlim@jejunu.ac.kr, seoya@jejunu.ac.kr