UBEP Ethical Assessment Tool
Explainability
The functioning of the model is completely hidden, similar to a 'black box' (A deep learning model without any interpretability tooling.)
The model provides limited insights, leaving users mostly in the dark (A neural network with rudimentary metrics on the importance of features.)
The model is semi-transparent and offers some clarity on its decisions (A decision tree too complex to be fully understood.)
The model is clear on most of its decisions, with only small opaque areas (A random forest model with well-documented feature importance.)
The model is completely transparent and describes in detail every aspect of its decision-making process (A linear regression model with clear coefficients.)
Not evaluated (-)
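The top rating above cites a linear regression with clear coefficients. A minimal sketch (illustrative synthetic data, not from the UBEP tool itself) of why such a model counts as fully transparent: every fitted coefficient states exactly how a feature drives the prediction.

```python
import numpy as np

# Illustrative only: generate data with known effects, then recover them.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.01, size=200)

# Ordinary least squares: the coefficients are directly inspectable,
# so the model's decision process is fully explainable.
coef, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(200)]), y, rcond=None)
print(coef)  # coefficients recovered near [3.0, -1.5], intercept near 0
```

A deep network trained on the same data would fit it just as well, but would offer no analogous per-feature statement of its reasoning.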
Generalizability
The model has extremely poor performance on new data (A cardiac diagnostic tool trained on elderly patients completely fails to identify common heart diseases in pediatric patients.)
The model struggles with new data, often making errors (A nutrition model trained on data from urban centers underperforms when predicting malnutrition trends in rural areas.)
The model manages decently with new data, but makes occasional errors (A thoracic model, although trained on a diverse dataset, occasionally misclassifies rare conditions when applied to data from a new epidemiological study.)
The model fits new data well, with errors being the exception (A vascular model trained on multicentric data successfully classifies most vascular conditions but occasionally struggles with edge cases from certain regions.)
The model adapts well to all new data it encounters (A pediatric model trained across various global multicentric studies provides accurate predictions for pediatric conditions regardless of the region of data origin.)
Not evaluated (-)
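The levels above hinge on how performance degrades under a population shift. A hedged sketch (synthetic data, illustrative of the elderly-vs-pediatric example rather than any real diagnostic tool): compare error on data resembling the training population against error on a shifted one.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(center, n=300):
    # The "population" is characterised by where its feature values cluster.
    x = rng.normal(loc=center, size=n)
    y = np.sin(x)  # the true underlying relationship
    return x, y

x_tr, y_tr = make_data(center=0.0)    # e.g. the training population
x_new, y_new = make_data(center=3.0)  # e.g. a new, shifted population

# A straight-line fit tracks the curve near the training centre
# but degrades badly once the population shifts.
coef = np.polyfit(x_tr, y_tr, deg=1)
mse_tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
mse_new = np.mean((np.polyval(coef, x_new) - y_new) ** 2)
print(mse_tr, mse_new)  # error grows on the shifted population
```

The gap between the two errors, rather than either number alone, is what separates the rubric's lowest and highest ratings.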
Open data
The data is locked and cannot be accessed (A proprietary dataset used without any sharing mechanism.)
Access to data is highly restricted, granted only under specific conditions (Restricted access on request with non-disclosure agreements.)
Data is shared with certain reservations or restrictions (Partial dataset shared for research purposes.)
Data is mostly open, with minimal conditions (Data is shared openly, with some features reduced for privacy reasons.)
Complete openness, with data freely available to all (An open-source dataset with clear documentation.)
Not evaluated (-)
Risk of bias
The model consistently shows strong bias (A cardiac risk prediction tool trained on Western populations that consistently misclassifies risk in non-Western populations due to unrepresented genetic and lifestyle factors.)
The model regularly shows bias, even though there are attempts to correct it (A nutrition study that, while aiming for diversity, still occasionally underrepresents the dietary habits of certain minority groups, leading to skewed recommendations.)
Biases are present, but there are clear attempts to mitigate them (An epidemiological model predicting disease spread that has incorporated corrections for urban bias, but still occasionally mispredicts in rural settings.)
The model rarely shows bias and actively tries to avoid it (A multicentric pediatric study that, after rigorous data harmonization, occasionally shows minor discrepancies in patient outcomes from certain centers.)
The model is almost free of bias, with strong mechanisms ensuring fairness (A vascular treatment outcome model, trained on a global multicentric dataset, that has been meticulously adjusted and validated to ensure near-universal applicability across diverse patient groups.)
Not evaluated (-)
Impact of wrong predictions
Errors lead to serious consequences (A cardiac risk assessment tool that wrongly classifies a patient as low-risk leading to undetected heart failure.)
Errors lead to significant negative impacts, but some safeguards exist (A drug dosage prediction model that occasionally recommends a slightly higher dosage, but with regular monitoring in place to adjust.)
Errors have a moderate impact, but mitigation measures exist (An epidemiological model that underestimates the spread of a seasonal flu, leading to brief vaccine shortages, but with rapid response teams in place.)
Erroneous predictions have minor impacts due to preventive mechanisms (A predictive model for patient hospital readmission that occasionally overestimates, leading to slightly increased hospital resource allocation, but without straining the system.)
The impact of errors is almost negligible due to strong safeguard mechanisms (A nutrition recommendation tool that occasionally suggests a less optimal meal choice, but the overall dietary plan remains balanced and nutritious.)
Not evaluated (-)
Transparency
The design and implementation of the model are hidden, and users are unaware of its operation (A proprietary model used without any documentation.)
Transparency is minimal and leaves many questions unanswered (A model with limited documentation on data and training methods.)
There is a fair level of transparency, although some areas could be clearer (A model with clear documentation but no details on hyperparameter tuning.)
The model is mostly transparent, with only minor areas left uncovered (A model with thorough documentation and some insights into its decision-making process.)
Complete transparency in every aspect from design to implementation (An open-source model with documentation, training data, and clear methods.)
Not evaluated (-)
Data Privacy
Data is used without any protective measures (Storage of user data in plain text without encryption.)
Some protection measures exist, but they leave significant vulnerabilities (Data stored with basic encryption, but with known security flaws.)
Data is adequately protected, although there is room for improvement (Data encrypted and stored securely, but without regular security checks.)
Data is well protected and has only minor potential risks (Data is stored with state-of-the-art encryption and regular security checks.)
Data protection is top-notch and adheres to best practices (Data is managed with strong encryption, regular audits, and strict access controls.)
Not evaluated (-)
Reproducibility
The model is a 'black box' with no details for replication (A proprietary model distributed without associated data or parameters, and with no clarity on the R version used.)
Only partial details are provided, making replication a challenging task (A model shared on GitHub without `renv` integration, leaving it uncertain which R packages and versions were used during development.)
Most details are shared, allowing decent reproducibility with some variations (A model shared with training data and a `testthat` suite for unit testing, but lacking details on hyperparameters and without a Dockerfile to ensure OS-level reproducibility.)
Almost all necessary details are provided, ensuring high reproducibility (A model with clear documentation, training data, integrated with `targets` for pipeline management, and `renv` for package versioning, but not consistently tested across multiple OSs.)
Every detail is shared, ensuring that the model is consistently reproducible (An open-source model hosted on GitHub with complete documentation, data, parameter details, `testthat` for automated testing, `renv` for R package versioning, a specified R version, and a Docker container ensuring consistent environment across all OSs.)
Not evaluated (-)
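The highest rating combines a pinned R version, `renv` for package versioning, and a Docker container. A minimal sketch of how those pieces fit together (file names and versions here are illustrative, not taken from the tool):

```dockerfile
# Pin the R version via a versioned rocker base image.
FROM rocker/r-ver:4.3.1

WORKDIR /app

# Restore the exact package versions recorded in renv's lockfile.
COPY renv.lock renv/activate.R ./
RUN R -e "install.packages('renv'); renv::restore()"

# Copy the project and run the targets pipeline on container start.
COPY . .
CMD ["R", "-e", "targets::tar_make()"]
```

With the OS, R version, and package versions all fixed, anyone building this image reruns the pipeline in the same environment the authors used.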
Fairness
The model consistently discriminates (A pediatric treatment recommendation model consistently favors treatments more commonly prescribed to boys, overlooking effective treatments for girls.)
The model often shows bias, with limited measures of fairness (A cardiac risk assessment tool, despite attempts at calibration, sometimes gives misleading risk scores to certain ethnicities.)
Equity is considered, but improvements can be made (A nutrition recommendation system, designed with diversity in mind, occasionally suggests diets that may not align with certain cultural or regional preferences.)
The model is mostly fair, with only rare instances of bias (An epidemiological tool to predict flu risk factors is largely unbiased but might occasionally overlook certain minor ethnic or regional disparities.)
The model is designed to be completely fair, treating all groups equally (A clinical diagnosis tool for thoracic conditions, after rigorous testing and iteration, shows no discernible bias across age, gender, or ethnicity.)
Not evaluated (-)
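One simple check behind ratings like these is comparing a model's error rate across subgroups. A hedged sketch (group names and records are invented for illustration; real fairness audits use richer metrics):

```python
from collections import defaultdict

records = [
    # (group, true_label, predicted_label) -- illustrative data only
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 0),
]

# Collect per-group error indicators, then average them.
errors = defaultdict(list)
for group, truth, pred in records:
    errors[group].append(int(truth != pred))

rates = {g: sum(e) / len(e) for g, e in errors.items()}
print(rates)  # a large gap between groups flags a fairness problem
```

A model at the rubric's top level would show near-equal rates across all relevant groups; a persistent gap like the one in this toy data points toward the lower levels.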
Accountability
There is a complete lack of accountability mechanisms (The model was implemented without any form of supervision or review.)
Few accountability measures exist, leaving many gaps (A model with occasional reviews but no structured accountability framework.)
Accountability measures are in place, but they are not comprehensive (A model with regular reviews and some oversight mechanisms.)
Accountability is taken seriously, though not infallible (A model with clear oversight, regular reviews, but occasional lapses.)
Robust mechanisms ensure full accountability (A model with strong oversight, continuous audits, and feedback loops.)
Not evaluated (-)
Stakeholder inclusiveness
Stakeholders are completely excluded from the development of the model (A cardiac risk prediction model developed without input from cardiologists or patients.)
Few stakeholders are consulted, leading to limited inclusiveness (A pediatric nutrition model formulated with feedback only from nutritionists, without considering pediatricians or parents.)
There is a fair level of stakeholder involvement, but it is not comprehensive (An epidemiological tool designed with input from public health experts and some local communities, but not all affected regions.)
Stakeholder inclusiveness is high, with only small gaps (A thoracic treatment recommendation system developed after consultations with thoracic surgeons, radiologists, and a few patient groups, but missing input from nursing staff.)
All key stakeholders have been consulted, ensuring full inclusiveness (A vascular health prediction tool developed after comprehensive feedback sessions involving vascular surgeons, general practitioners, patients, and nursing staff.)
Not evaluated (-)
Financial Impact
Mistakes lead to severe financial losses (A pediatric drug dosage model that, due to miscalculations, leads to expensive lawsuits from adverse events.)
Financial risk is high due to errors, with few safeguards (A cardiac surgery equipment procurement model that results in frequent over-purchasing of expensive devices not used.)
Errors cause moderate financial impact, but there are preventive measures (An epidemiological intervention budgeting tool that sometimes underestimates funds needed for a region, requiring additional emergency funds.)
Financial risks from errors are low due to preventive mechanisms (A nutrition program's budgeting tool for schools that occasionally overestimates food quantities, but has mechanisms to redirect surplus to other needs.)
Financial repercussions from errors are almost nonexistent, thanks to robust safeguard mechanisms (A thoracic surgery costing model with advanced error-checking, ensuring that hospital billing for procedures is accurate and fair.)
Not evaluated (-)