pyspark.RDD.distinct

RDD.distinct(numPartitions=None)

    Return a new RDD containing the distinct elements in this RDD.

    New in version 0.7.0.

    Parameters
        numPartitions : int, optional
            the number of partitions in new RDD

    Returns
        RDD
            a new RDD containing the distinct elements

    See also
        RDD.countApproxDistinct()

    Examples
        >>> sorted(sc.parallelize([1, 1, 2, 3]).distinct().collect())
        [1, 2, 3]
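
To illustrate what numPartitions controls, here is a minimal pure-Python sketch. It assumes that distinct behaves like a hash-partitioned shuffle followed by deduplication within each output partition. It is a model for intuition only, not Spark code: `local_distinct` is a hypothetical helper, partitions are modelled as plain lists, and the default partition count is an assumption made for this sketch.

```python
# Pure-Python model of distinct(); an illustration, not Spark code.
# `local_distinct` is a hypothetical name and is not part of PySpark.

def local_distinct(partitions, num_partitions=None):
    """Deduplicate elements spread across a list of partitions (lists)."""
    if num_partitions is None:
        # Assumption for this sketch: reuse the number of input partitions.
        num_partitions = max(len(partitions), 1)

    # "Shuffle" step: route each element to a bucket chosen by its hash, so
    # equal elements always meet in the same output partition.
    buckets = [dict() for _ in range(num_partitions)]
    for part in partitions:
        for x in part:
            # Dict keys are unique, so repeated elements collapse here.
            buckets[hash(x) % num_partitions].setdefault(x, None)

    # Keep only the keys; each bucket becomes one output partition.
    return [list(b) for b in buckets]


parts = [[1, 1, 2], [3, 2, 1]]  # two input partitions with duplicates
result = local_distinct(parts, num_partitions=2)
flat = sorted(x for p in result for x in p)
print(flat)  # [1, 2, 3]
```

In this model the elements must be hashable, and the order of the result is not defined, which is why the example sorts before printing, just as the doctest above sorts the collected output.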