Topics: On the Opportunities and Risks of Foundation Models — Evaluation
Session Lead: Ming Jiang
Time: 1 pm – 2 pm, Tuesday, 2021-09-07
Why this paper?
Drawing on the expertise of over 100 AI researchers from diverse backgrounds, this survey report provides a comprehensive overview of the opportunities and risks of foundation models (i.e., “any model that is trained on broad data at scale and can be adapted (e.g., fine-tuned) to a wide range of downstream tasks”). Rather than focusing solely on the technical properties of these prominent AI systems, the report considers the ecosystem of foundation models within society, aiming to “highlight all the interdisciplinary connections” of foundation models, improve understanding of their sociotechnical nature, and offer suggestions for future research.
Given that the complete report runs over 200 pages, a deep discussion of all aspects of foundation models within an hour would be difficult. In the coming reading group, I would therefore like to focus the discussion on model evaluation. I selected this part because evaluation is critical to understanding how these models actually work, and familiarity with existing evaluation methods and their limitations helps us stay aware of the uncertainties in how foundation models’ properties are perceived.
- Beyond the three roles of evaluation summarized in this report (tracking progress, understanding, and documentation), are there any other benefits that the evaluation process can bring?
- Given the wide range of applications of foundation models in both academia and industry, the paper notes that there are many stakeholders, such as model developers, practitioners, auditors, and policymakers, each with their own concerns about foundation models. Which aspects or properties of foundation models would you suggest considering in model evaluation?
- Following Q2, given the two main types of evaluation (intrinsic and extrinsic), which type would you prefer to use to investigate the model property you are interested in? Why?
- In the paper, the authors raise an open question: how should intrinsic evaluation be implemented? Do you have any ideas on how this might be addressed?
- For learning-based techniques, one key factor influencing model performance is always the data. However, existing evaluations are generally model-centric. Should we also examine the quality and influence of the data used to pre-train, fine-tune, or evaluate models? If so, are there any insights into how such data evaluation could be done?