import tensorflow as tf

# shape_list and get_timing_signal are helpers assumed to be defined elsewhere
# in the same module; they are not shown in this snippet.


def add_timing_signal(x, min_timescale=1, max_timescale=1e4, num_timescales=16):
  """Adds a bunch of sinusoids of different frequencies to a Tensor.

  This allows attention to learn to use absolute and relative positions.
  The timing signal should be added to some precursor of both the source
  and the target of the attention.

  The use of relative position is possible because sin(x+y) and cos(x+y) can be
  expressed in terms of y, sin(x) and cos(x).

  In particular, we use a geometric sequence of timescales starting with
  min_timescale and ending with max_timescale. For each timescale, we
  generate the two sinusoidal signals sin(timestep/timescale) and
  cos(timestep/timescale). All of these sinusoids are concatenated in
  the depth dimension, padded with zeros to be the same depth as the input,
  and added to the input.

  Args:
    x: a Tensor with shape [?, length, ?, depth]
    min_timescale: a float
    max_timescale: a float
    num_timescales: an int <= depth/2

  Returns:
    a Tensor the same shape as x.
  """
  length = shape_list(x)[1]
  depth = shape_list(x)[3]
  # signal has shape [length, 2 * num_timescales]: sines first, then cosines.
  signal = get_timing_signal(length, min_timescale, max_timescale,
                             num_timescales)
  # Zero-pad the depth axis so the signal matches the input depth.
  padded_signal = tf.pad(signal, [[0, 0], [0, depth - 2 * num_timescales]])
  # Broadcast over the first and third axes of x.
  return x + tf.reshape(padded_signal, [1, length, 1, depth])
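
The steps described in the docstring (geometric timescales, sine and cosine per timescale, concatenation, zero padding, broadcast add) can be sketched in plain NumPy. This is a minimal illustration that follows the docstring's wording, not necessarily the body of `get_timing_signal`, which is not shown here. The names `timing_signal_sketch` and `add_timing_signal_sketch` are hypothetical.

```python
import math

import numpy as np


def timing_signal_sketch(length, min_timescale=1.0, max_timescale=1.0e4,
                         num_timescales=16):
  """Hypothetical stand-in: returns an array of shape [length, 2 * num_timescales]."""
  positions = np.arange(length, dtype=np.float64)
  # Geometric sequence: timescales[0] == min_timescale, timescales[-1] == max_timescale.
  log_increment = (math.log(max_timescale / min_timescale) /
                   max(num_timescales - 1, 1))
  timescales = min_timescale * np.exp(np.arange(num_timescales) * log_increment)
  # scaled_time[t, i] = t / timescales[i]
  scaled_time = positions[:, None] / timescales[None, :]
  # Sines fill the first num_timescales channels, cosines the next num_timescales.
  return np.concatenate([np.sin(scaled_time), np.cos(scaled_time)], axis=1)


def add_timing_signal_sketch(x, min_timescale=1.0, max_timescale=1.0e4,
                             num_timescales=16):
  """Hypothetical NumPy analogue for x of shape [batch, length, ?, depth]."""
  length, depth = x.shape[1], x.shape[3]
  signal = timing_signal_sketch(length, min_timescale, max_timescale,
                                num_timescales)
  # Requires num_timescales <= depth / 2, otherwise the pad width is negative.
  padded = np.pad(signal, [(0, 0), (0, depth - 2 * num_timescales)])
  # Reshape to [1, length, 1, depth] so the signal broadcasts over the other axes.
  return x + padded.reshape(1, length, 1, depth)


x = np.zeros((2, 5, 3, 40))
out = add_timing_signal_sketch(x)
```

With a zero input, position 0 receives 0 in the sine channels and 1 in the cosine channels, and the last `depth - 2 * num_timescales` channels are left untouched by the zero padding.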