Houjun Liu

sentence segmentation

Common algorithm

  1. tokenization
  2. classier for whether a period token is a part of a word or the sentence boundary