Chatbot

Two main dialogue system architectures:

frame-based systems: converse with users to accomplish specific tasks
LLM-based systems: reason and act as agents

Dialogue Systems vs. Chatbots

Previously, when we said "chatbot" we meant task-based systems.

humans and chat

Humans tend to think of dialogue systems as human-like even when they know the systems are not. This makes users more willing to share private information and less worried about its disclosure.

ELIZA

see ELIZA

LLM Chatbots

Training Corpus

C4: Colossal Clean Crawled Corpus

patents, Wikipedia, news sites

Chatbots

EmpatheticDialogues
SaFeRDialogues
Pseudo-conversations: Reddit, Twitter, Weibo

Fine-Tuning

quality: making responses more sensible and interesting
safety: preventing the system from suggesting harmful actions

IFT: one option is to add positive examples of the desired behavior (sensible, interesting, safe responses) as fine-tuning data during the instruction fine-tuning step.

Filtering: train a classifier that labels responses as safe/unsafe (or along other dimensions) and filter out the responses it flags.
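
A minimal sketch of the filtering idea, assuming a hypothetical `unsafe_score` function in place of a trained safety classifier. Here it is only a toy keyword check, and the blocklist and threshold are illustrative. Candidate responses are scored and only those at or below the threshold are kept.

```python
from typing import Callable, List

# Toy stand-in for a trained safe/unsafe classifier (hypothetical).
# A real system would use a learned model, not a keyword list.
UNSAFE_WORDS = {"poison", "weapon", "steal"}

def unsafe_score(response: str) -> float:
    """Return the fraction of words in the response that are on the blocklist."""
    words = response.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!?") in UNSAFE_WORDS for w in words) / len(words)

def filter_responses(candidates: List[str],
                     scorer: Callable[[str], float] = unsafe_score,
                     threshold: float = 0.0) -> List[str]:
    """Keep only candidates whose unsafe score does not exceed the threshold."""
    return [c for c in candidates if scorer(c) <= threshold]

candidates = [
    "You could try a warm cup of tea.",
    "You could poison the neighbor's dog.",
]
print(filter_responses(candidates))  # ['You could try a warm cup of tea.']
```

Passing the scorer as a parameter keeps the filtering step independent of whichever classifier is plugged in.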

Retrieval Augmented Generation

call a search engine
get back retrieved passages
insert them into the prompt
“Based on these texts, answer:”
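
The retrieve-then-prompt steps can be sketched as follows. This assumes a toy word-overlap `retrieve` function standing in for the search engine. A real system would query a search engine or a learned retriever and then send the prompt to an LLM, which is not shown here.

```python
from typing import List

def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Toy retriever (hypothetical): rank passages by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(passages,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_rag_prompt(query: str, passages: List[str], k: int = 2) -> str:
    """Retrieve passages, put them into the prompt, then ask the question."""
    retrieved = retrieve(query, passages, k)
    context = "\n".join(f"- {p}" for p in retrieved)
    return f"{context}\n\nBased on these texts, answer: {query}"

corpus = [
    "ELIZA was written by Joseph Weizenbaum.",
    "C4 is a crawled web corpus.",
    "Paris is the capital of France.",
]
prompt = build_rag_prompt("who wrote ELIZA", corpus, k=1)
print(prompt)
```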

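We can make chatbots use RAG by adding “pseudo-participants” (e.g., a search engine) to the conversation: the system sends queries to the pseudo-participant, and its replies, containing the retrieved passages, become part of the dialogue context.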
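
A minimal sketch of a pseudo-participant, assuming the dialogue is stored as (speaker, text) turns. The speaker labels `Bot -> Search` and `Search` and the `fake_search` stub are hypothetical conventions, not the format of any particular system. The rendered transcript is the context the LLM would condition on when generating its next turn.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, text)

def add_search_turn(dialogue: List[Turn], query: str,
                    search: Callable[[str], str]) -> List[Turn]:
    """Append the bot's query to the pseudo-participant, then the
    pseudo-participant's reply containing the retrieved text."""
    return dialogue + [("Bot -> Search", query), ("Search", search(query))]

def render(dialogue: List[Turn]) -> str:
    """Flatten the dialogue into the text context the LLM conditions on."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in dialogue)

def fake_search(query: str) -> str:
    """Hypothetical search-engine stub that always returns one passage."""
    return "ELIZA was written by Joseph Weizenbaum."

dialogue: List[Turn] = [("User", "Who wrote ELIZA?")]
dialogue = add_search_turn(dialogue, "ELIZA author", fake_search)
print(render(dialogue))
```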

Evaluation

task-based systems: measure task performance
chatbots: measure how enjoyable humans find them

We evaluate chatbots with human judges: either a participant, who talks to the chatbot and then assigns a score, or an observer, a third party who reads a transcript of the conversation and assigns a score.

participants scoring

the participant interacts with the chatbot for 6 turns, then scores it on:

avoiding repetition
interestingness
making sense
fluency
listening
inquisitiveness
humanness
engagingness
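
Participant ratings can be summarized per dimension. A minimal sketch, assuming each dimension gets a numeric rating (the scale and the two participants' ratings are hypothetical), that averages each dimension over all participants:

```python
from statistics import mean
from typing import Dict, List

# Dimensions from the list above; the numeric rating scale is an assumption.
DIMENSIONS = ["avoiding repetition", "interestingness", "making sense", "fluency",
              "listening", "inquisitiveness", "humanness", "engagingness"]

def average_scores(ratings: List[Dict[str, int]]) -> Dict[str, float]:
    """Average each dimension over all participants who rated the chatbot."""
    return {d: mean(r[d] for r in ratings) for d in DIMENSIONS}

# Two hypothetical participants' ratings.
p1 = {d: 3 for d in DIMENSIONS}
p2 = {d: 4 for d in DIMENSIONS}
p2["fluency"] = 2
averages = average_scores([p1, p2])
print(averages["fluency"], averages["humanness"])  # 2.5 3.5
```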

ACUTE-EVAL: observers read two conversations side by side and choose which speaker they would rather talk to
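
Pairwise judgments like these are often summarized as a win rate. A minimal sketch, assuming each judgment is recorded as the label of the preferred system (the labels `"A"`/`"B"` and the judgments are hypothetical):

```python
from typing import List

def win_rate(choices: List[str], system: str) -> float:
    """Fraction of pairwise judgments in which `system` was preferred."""
    if not choices:
        raise ValueError("no judgments")
    return sum(c == system for c in choices) / len(choices)

# Each entry is the system an observer preferred in one side-by-side comparison.
judgments = ["A", "B", "A", "A"]
print(win_rate(judgments, "A"))  # 0.75
```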

adversarial evaluation

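train a classifier to distinguish human responses from chatbot responses; the more often the chatbot fools it (i.e., the lower the classifier's "machine" score on the chatbot's responses), the better the chatbot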
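
A minimal sketch of turning the classifier's output into a chatbot score, assuming a hypothetical `p_machine` function that returns the classifier's probability that a response is machine-generated. The `toy_classifier` stub only illustrates the interface. The score is 1 minus the average probability, so higher means the chatbot fools the classifier more.

```python
from typing import Callable, List

def adversarial_score(bot_responses: List[str],
                      p_machine: Callable[[str], float]) -> float:
    """1 minus the classifier's average probability that the chatbot's
    responses are machine-generated: higher means it is fooled more."""
    if not bot_responses:
        raise ValueError("no responses")
    probs = [p_machine(r) for r in bot_responses]
    return 1.0 - sum(probs) / len(probs)

def toy_classifier(response: str) -> float:
    """Hypothetical stub: treats one canned phrase as machine-like."""
    return 0.9 if "as an ai" in response.lower() else 0.2

responses = ["As an AI, I cannot say.", "Haha, same here!"]
print(round(adversarial_score(responses, toy_classifier), 2))  # 0.45
```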

task evaluation

measure overall task success, or measure slot error rate
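
One common definition of slot error rate counts inserted, deleted, and substituted slots relative to the number of slots in the reference. A minimal sketch, assuming slots are stored as name-to-value dictionaries. The slot names and values are a hypothetical example, with the TIME slot filled incorrectly.

```python
from typing import Dict

def slot_error_rate(reference: Dict[str, str], hypothesis: Dict[str, str]) -> float:
    """(inserted + deleted + substituted slots) / number of reference slots."""
    if not reference:
        raise ValueError("reference has no slots")
    deleted = sum(1 for s in reference if s not in hypothesis)
    inserted = sum(1 for s in hypothesis if s not in reference)
    substituted = sum(1 for s in reference
                      if s in hypothesis and hypothesis[s] != reference[s])
    return (inserted + deleted + substituted) / len(reference)

# "Make an appointment with Chris at 10:30 in Gates 104"
reference = {"PERSON": "Chris", "TIME": "10:30 a.m.", "ROOM": "Gates 104"}
hypothesis = {"PERSON": "Chris", "TIME": "11:30 a.m.", "ROOM": "Gates 104"}
print(round(slot_error_rate(reference, hypothesis), 3))  # 0.333
```

Overall task success for this example would simply be whether the hypothesis matches the reference exactly (here it does not).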

dialogue system design

Don’t build Frankenstein: consider safety (e.g., make sure the system doesn’t lead users to crash their cars), representational harm (don’t demean social groups), and privacy

study users and task

what are their values? how do they interact?

build simulations

Wizard-of-Oz study: observe users interacting with a human who is pretending to be the chatbot

test the design

test on users

info leakage

accidental leakage (e.g., a microphone picking up something the user did not mean to share)
intentional leakage (e.g., sharing user data for advertising)