OBJECTIVES: The body mass index (BMI) provides essential medical information related to body weight for the treatment and prognosis prediction of diseases such as cardiovascular disease, diabetes, and stroke. We propose a method for predicting normal, overweight, and obese classes based only on a combination of voice features associated with BMI status, independently of weight and height measurements.

MATERIALS AND METHODS: A total of 1568 subjects were divided into 4 groups according to age and gender. We performed statistical analyses using analysis of variance (ANOVA) and the Scheffé test to identify significant features in each group. We predicted BMI status (normal, overweight, and obese) using logistic regression and two ensemble classification algorithms (bagging and random forests), with the statistically significant features as inputs.

RESULTS: In the Female-2030 group (females aged 20-40 years), classification experiments on the imbalanced (original) data set gave area under the receiver operating characteristic curve (AUC) values of 0.569-0.731 with logistic regression, whereas experiments on a balanced data set gave AUC values of 0.893-0.994 with random forests. For the Female-4050 (females aged 41-60 years), Male-2030 (males aged 20-40 years), and Male-4050 (males aged 41-60 years) groups, logistic regression on the imbalanced data gave AUC values of 0.585-0.654, 0.581-0.614, and 0.557-0.653, respectively. On the balanced data, the AUC values for the Female-4050, Male-2030, and Male-4050 groups were 0.629-0.893 (bagging), 0.707-0.916 (random forests), and 0.695-0.854 (bagging), respectively. In each group, we found discriminatory features that differed significantly among the normal, overweight, and obese classes.
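The feature screening in the methods (ANOVA followed by Scheffé pairwise comparisons) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes each voice feature is screened separately within one age/gender group, with one list of feature values per BMI class. The function names `anova_components` and `scheffe_pair_statistic` and the toy numbers are hypothetical. Turning the statistics into p-values requires the F distribution (for example SciPy's `scipy.stats.f_oneway` for the overall test), which the standard library does not provide.

```python
from statistics import fmean


def anova_components(groups):
    """One-way ANOVA for one feature across k classes.

    `groups` is a list of lists, e.g. [normal, overweight, obese], each
    holding values of one (hypothetical) voice feature.
    Returns (F statistic, within-group mean square, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: spread of class means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own class mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, ms_within, df_between, df_within


def scheffe_pair_statistic(groups, i, j):
    """Scheffé statistic for the pairwise contrast between classes i and j.

    The pair differs significantly at level alpha if this value exceeds
    (k - 1) * F_crit(alpha; k - 1, n - k).
    """
    _, ms_within, _, _ = anova_components(groups)
    diff = fmean(groups[i]) - fmean(groups[j])
    return diff ** 2 / (ms_within * (1 / len(groups[i]) + 1 / len(groups[j])))
```

For the toy groups `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, the sketch gives F = 27 with a within-group mean square of 1, and a Scheffé statistic of 54 for the first-versus-last class. A feature would be kept only if the overall F test is significant and, under this assumed workflow, at least one class pair passes the Scheffé threshold.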
The results showed that, on the imbalanced data, the classification models built with logistic regression outperformed those built with the other two algorithms, and that the significant features differed by age and gender group.

CONCLUSION: Our results could support the development of BMI diagnosis tools for real-time monitoring. Such tools could help improve automated BMI status diagnosis in remote healthcare or telemedicine and are expected to find applications in forensic and medical science.
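The model comparisons above rest on AUC values for a three-class problem. One common way to obtain such values is one-vs-rest: each BMI class in turn is treated as the positive class and the other two as negative, and the binary AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation). The abstract does not state how its AUC ranges were computed, so treating them as per-class one-vs-rest values is an assumption; `binary_auc`, `one_vs_rest_aucs`, and the column order of the probability rows are hypothetical.

```python
def binary_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. `labels` are truthy for the positive class.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_aucs(prob_rows, true_classes,
                     classes=("normal", "overweight", "obese")):
    """Per-class AUC, treating each class in turn as positive.

    `prob_rows[i][c]` is a model's predicted probability that subject i
    belongs to classes[c] (hypothetical column order).
    """
    aucs = {}
    for c_idx, c in enumerate(classes):
        scores = [row[c_idx] for row in prob_rows]
        labels = [t == c for t in true_classes]
        aucs[c] = binary_auc(scores, labels)
    return aucs
```

For example, `binary_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75: three of the four positive-negative pairs are ranked correctly. The pairwise loop costs O(n_pos × n_neg), which is fine for illustration; scikit-learn's `roc_auc_score` computes the same binary AUC more efficiently and, with `multi_class="ovr"`, averages one-vs-rest values across classes.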


