TensorFlow matmul computation is slower on the GPU than on the CPU

Posted 2021-01-29 16:47:35

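This is my first attempt at GPU computing, and I was of course hoping for a large speed-up. However, a basic TensorFlow example actually runs slower on the GPU:

On cpu:0, each of the ten runs takes 2 seconds on average. gpu:0 takes 2.7 seconds, and gpu:1 takes 3 seconds, which is 50% worse than cpu:0.

Here is the code: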

```
import tensorflow as tf
import numpy as np
import time
import random

for _ in range(10):
    with tf.Session() as sess:
        start = time.time()
        with tf.device('/gpu:0'):  # swap for 'cpu:0' or whatever
            # Inputs are built from 10^6 Python-level random.random() calls each
            # (range replaces the Python 2-only xrange)
            a = tf.constant([random.random() for _ in range(1000 * 1000)], shape=[1000, 1000], name='a')
            b = tf.constant([random.random() for _ in range(1000 * 1000)], shape=[1000, 1000], name='b')
            c = tf.matmul(a, b)
            d = tf.matmul(a, c)
            e = tf.matmul(a, d)
            f = tf.matmul(a, e)
            for _ in range(1000):
                sess.run(f)
        end = time.time()
        print(end - start)
```

What am I observing here? Is the run time dominated by copying data between RAM and the GPU?

1 Answer

面试哥 2021-01-29

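The way you generate the data runs on the CPU (random.random() is a regular Python function, not a TensorFlow one). Also, calling it 10^6 times is slower than requesting 10^6 random numbers in a single call. Change the code to: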
```
a = tf.random_uniform([1000, 1000], name='a')
b = tf.random_uniform([1000, 1000], name='b')
```
This way, the data is generated in parallel on the GPU, and no time is wasted transferring it from RAM to the GPU.
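To check the effect of this change, the benchmark from the question could be restructured roughly as follows. This is only a sketch under a few assumptions that are not in the original post: the helper name `benchmark` and its parameters are hypothetical, the TF1-style API is imported through `tensorflow.compat.v1` (so TensorFlow 1.14 or later, or 2.x, is assumed), soft placement is enabled so an unavailable device does not raise an error, graph construction and one warm-up run are kept outside the timed region, and `f.op` is run instead of `f` so that the result is not copied back to host memory on every step.

```python
import time

import tensorflow.compat.v1 as tf  # TF1-style graph API (assumes TF >= 1.14 or 2.x)


def benchmark(device, size=1000, steps=1000, repeats=10):
    """Return one wall-clock time per repeat for `steps` runs of the matmul chain."""
    graph = tf.Graph()
    with graph.as_default(), tf.device(device):
        # Inputs come from TensorFlow ops placed on the chosen device,
        # not from a Python loop on the host.
        a = tf.random_uniform([size, size], name='a')
        b = tf.random_uniform([size, size], name='b')
        c = tf.matmul(a, b)
        d = tf.matmul(a, c)
        e = tf.matmul(a, d)
        f = tf.matmul(a, e)

    # Assumption: fall back to another device if `device` is not available.
    config = tf.ConfigProto(allow_soft_placement=True)
    timings = []
    with tf.Session(graph=graph, config=config) as sess:
        sess.run(f.op)  # warm-up run, excluded from the timings
        for _ in range(repeats):
            start = time.time()
            for _ in range(steps):
                # Running f.op executes the chain without fetching the
                # result tensor back into host memory.
                sess.run(f.op)
            timings.append(time.time() - start)
    return timings
```

Calling `benchmark('/cpu:0')` and `benchmark('/gpu:0')` and comparing the returned lists then measures only the repeated `sess.run` calls. With the default sizes the CPU run may take a while, so it can be worth starting with a smaller `steps` value.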
