Presentation Information
[POS-62] A Data-Driven Approach to Interpreting Bacterial Responses to Environmental Conditions in Food
*Junpei Hosoe1, Shige Koseki1, Kento Koyama1 (1. Hokkaido University (Japan))
Keywords:
data mining, machine learning, food-borne illness, SHAP
In food microbiology, mathematical models have been used to predict various bacterial behaviors in foods, such as population growth, inactivation, and growth/no-growth boundaries, to ensure microbiological food safety. Although bacterial population behavior has been investigated in a variety of foods over the past 40 years, it is difficult to obtain the desired information merely by placing experimental datasets side by side. We predicted changes in bacterial numbers and interpreted the effects of pH, water activity (aw), and temperature using a data mining approach.
Population growth and inactivation data on eight pathogenic and food spoilage bacteria were obtained from ComBase, covering three food categories (culture medium, beef, and poultry) and temperatures ranging from 0°C to 25°C. Extreme gradient boosting (XGBoost) was used to predict population behavior. Nine types of explanatory variables were included: “Time (h),” “Temperature (°C),” “pH,” “aw,” “Initial cell number or not,” “Initial cell number (log CFU/g),” “Food category,” “Food name.” The data included both numerical and categorical variables. “Time,” “Temperature,” “pH,” “aw,” and “Initial cell number” were numerical variables and were used without modification for model development. The viable cell concentration at 0 h was used as the initial cell number for each record ID. Because food category and food name are categorical variables, they were replaced with dummy variables. To evaluate model accuracy, we calculated the root mean squared error (RMSE). Feature importance, obtained from the model development process, was calculated to interpret the developed model. To interpret bacterial responses to environmental conditions, we used the SHAP framework proposed by Lundberg and Lee (2017). All analyses were conducted using Python (version 3.12.7).
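The preprocessing and evaluation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the column names, the toy records, and the helper functions `add_initial_cell_number`, `build_features`, and `rmse` are all hypothetical, and the “Initial cell number or not” variable is omitted because the abstract does not define it.

```python
import numpy as np
import pandas as pd

# Hypothetical records; column names and values are illustrative only
# and are not taken from ComBase or from the study.
records = pd.DataFrame({
    "record_id": ["A", "A", "A", "B", "B"],
    "time_h": [0.0, 24.0, 48.0, 0.0, 12.0],
    "temperature_c": [10.0, 10.0, 10.0, 25.0, 25.0],
    "ph": [6.0, 6.0, 6.0, 5.5, 5.5],
    "aw": [0.99, 0.99, 0.99, 0.97, 0.97],
    "log_cfu": [3.0, 4.2, 5.9, 2.5, 4.0],
    "food_category": ["beef", "beef", "beef", "poultry", "poultry"],
    "food_name": ["ground beef", "ground beef", "ground beef",
                  "chicken breast", "chicken breast"],
})


def add_initial_cell_number(df):
    """Attach the viable count at 0 h to every row of the same record ID."""
    initial = df[df["time_h"] == 0].set_index("record_id")["log_cfu"]
    out = df.copy()
    out["initial_log_cfu"] = out["record_id"].map(initial)
    return out


def build_features(df):
    """Keep numerical columns as-is and replace categorical ones with dummies."""
    numeric = df[["time_h", "temperature_c", "ph", "aw", "initial_log_cfu"]]
    dummies = pd.get_dummies(df[["food_category", "food_name"]], dtype=float)
    return pd.concat([numeric, dummies], axis=1)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


prepared = add_initial_cell_number(records)
X = build_features(prepared)   # feature matrix for a tree-based regressor
y = prepared["log_cfu"]        # target: viable count (log CFU/g)
```

A feature matrix of this shape could then be passed to a gradient-boosted regressor such as `xgboost.XGBRegressor` and explained with `shap.TreeExplainer`; the abstract does not state which interfaces or hyperparameters were actually used.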
The machine learning models developed with our data-mining approach showed generally consistent prediction accuracy across the microorganisms and food categories examined. For each organism, the RMSE was below 1.35, which is comparable to or better than values reported in previous studies. These results indicate that the developed models adapt to a range of environmental conditions even when the amount of available data differs. The SHAP values for features related to environmental conditions showed reasonable trends that align with established knowledge in food microbiology.
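To make concrete what a SHAP value attributes to each environmental factor, the sketch below computes exact Shapley values by enumerating feature subsets for a small hypothetical model. Everything here is an assumption made for illustration: `toy_growth_model` and its coefficients are invented and are not the study's fitted XGBoost model, and absent features are replaced by a single baseline condition rather than averaged over a background dataset as the SHAP library does.

```python
from itertools import combinations
from math import factorial


def toy_growth_model(temperature_c, ph, aw):
    """Hypothetical stand-in for a fitted model (not from the study).

    Returns an illustrative change in log count, with an interaction
    between water activity and temperature.
    """
    return (0.1 * temperature_c
            + 0.5 * (ph - 4.0)
            + 20.0 * (aw - 0.90) * (temperature_c / 25.0))


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to a single `baseline`."""
    names = list(instance)
    n = len(names)

    def value(subset):
        # Features in `subset` take the instance value, the rest the baseline.
        args = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return model(**args)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


instance = {"temperature_c": 25.0, "ph": 6.5, "aw": 0.99}
baseline = {"temperature_c": 5.0, "ph": 5.0, "aw": 0.95}
phi = shapley_values(toy_growth_model, instance, baseline)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that allows per-feature SHAP values to be read as effects of temperature, pH, and aw on a prediction.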
Overall, we successfully extracted meaningful insights into the relationships between bacterial growth and environmental factors, taking into account variations in both bacterial species and food types.