Houjun Liu

Training Data Sourcing

Finding training data for AI is hard. In practice, training data tends to come from one of two sources:

Intentional training data

  • Curated specifically to serve as training data
  • Its creators spent time thinking about bias, control, etc.

Training set of convenience

  • A dataset that simply happens to be available
  • Problematic: it can accidentally introduce bias into the data. For example, Googling images of CEOs is convenient, but for a while the results showed only white men.