Houjun Liu

MLlib

MLlib is a machine learning library built on top of Spark.

from pyspark.mllib.clustering import KMeans

KMeans.train(rdd, k)

where rdd is a PySpark RDD of feature vectors that you pass to MLlib, and k is the number of clusters to fit