rdd.py 文件源码

python
阅读 33 收藏 0 点赞 0 评论 0

项目:MIT-Thesis 作者: alec-heif 项目源码 文件源码
def fullOuterJoin(self, other, numPartitions=None):
        """
        Perform a right outer join of C{self} and C{other}.

        For each element (k, v) in C{self}, the resulting RDD will either
        contain all pairs (k, (v, w)) for w in C{other}, or the pair
        (k, (v, None)) if no elements in C{other} have key k.

        Similarly, for each element (k, w) in C{other}, the resulting RDD will
        either contain all pairs (k, (v, w)) for v in C{self}, or the pair
        (k, (None, w)) if no elements in C{self} have key k.

        Hash-partitions the resulting RDD into the given number of partitions.

        >>> x = sc.parallelize([("a", 1), ("b", 4)])
        >>> y = sc.parallelize([("a", 2), ("c", 8)])
        >>> sorted(x.fullOuterJoin(y).collect())
        [('a', (1, 2)), ('b', (4, None)), ('c', (None, 8))]
        """
        return python_full_outer_join(self, other, numPartitions)

    # TODO: add option to control map-side combining
    # portable_hash is used as default, because builtin hash of None is different
    # cross machines.
评论列表
文章目录


问题


面经


文章

微信
公众号

扫码关注公众号