TensorFlow Learning Notes (2)

Implementing the First Neural Network

Implementing a Simple Neural Network

Forward Propagation

Assume a three-layer neural network with two input nodes $x_1$ and $x_2$, a hidden layer of three nodes, and a single output node $y$:

Forward propagation can be expressed as matrix multiplication. Organize the inputs $x_1$ and $x_2$ into a $1\times2$ matrix, and $W^{(1)}$ into a $2\times3$ matrix:

$$
W^{(1)}=\begin{bmatrix}
W^{(1)}_{1,1} & W^{(1)}_{1,2} & W^{(1)}_{1,3}\\
W^{(1)}_{2,1} & W^{(1)}_{2,2} & W^{(1)}_{2,3}
\end{bmatrix}
$$

The vector formed by the three hidden-layer nodes is then obtained by matrix multiplication:


$$
a^{(1)}=[a_{11},a_{12},a_{13}]=xW^{(1)}=[x_1,x_2]\begin{bmatrix}
W^{(1)}_{1,1} & W^{(1)}_{1,2} & W^{(1)}_{1,3}\\
W^{(1)}_{2,1} & W^{(1)}_{2,2} & W^{(1)}_{2,3}
\end{bmatrix}
=[W^{(1)}_{1,1}x_1+W^{(1)}_{2,1}x_2,\ W^{(1)}_{1,2}x_1+W^{(1)}_{2,2}x_2,\ W^{(1)}_{1,3}x_1+W^{(1)}_{2,3}x_2]
$$

Similarly, the output layer can be written as:

$$
y=a^{(1)}W^{(2)}=[a_{11},a_{12},a_{13}]\begin{bmatrix}
W^{(2)}_{1,1}\\
W^{(2)}_{2,1}\\
W^{(2)}_{3,1}
\end{bmatrix}
=[W^{(2)}_{1,1}a_{11}+W^{(2)}_{2,1}a_{12}+W^{(2)}_{3,1}a_{13}]
$$

This expresses the forward-propagation algorithm entirely in terms of matrix multiplication. TensorFlow provides a function, tf.matmul, that performs exactly this kind of matrix computation.

a = tf.matmul(x, w1)
b = tf.matmul(a, w2)
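
To check that the matrix form really matches the element-wise formulas above, here is a minimal NumPy sketch; the concrete weight and input values are made up purely for illustration:

import numpy as np

x = np.array([[0.7, 0.9]])                      # 1x2 input vector
W1 = np.array([[0.2, 0.1, 0.4],
               [0.3, -0.5, 0.2]])               # 2x3 hidden-layer weights
W2 = np.array([[0.6], [0.1], [-0.2]])           # 3x1 output-layer weights

a = x.dot(W1)                                   # 1x3 hidden activations, a^{(1)} = x W^{(1)}
y = a.dot(W2)                                   # 1x1 output, y = a^{(1)} W^{(2)}

# The first hidden unit computed by hand matches the matrix product:
# W1[0,0]*x1 + W1[1,0]*x2 = 0.2*0.7 + 0.3*0.9 = 0.41
print(a[0, 0])                                  # 0.41
print(y)                                        # approximately [[0.116]]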

Neural Network Parameters

  • Here is how to declare a $2\times3$ matrix variable in TensorFlow:
weights = tf.Variable(tf.random_normal([2, 3], stddev=2))
# tf.random_normal([2, 3], stddev=2) produces random values with mean 0 and standard deviation 2.
# The mean can be set explicitly with the mean parameter; it defaults to 0.

# A random seed can be set with the seed parameter so that every run produces the same values:
# tf.random_normal([2, 3], stddev=2, seed=1)

tf.random_normal: normal distribution; parameters: mean, standard deviation, dtype.

Official API: tf.random_normal

tf.truncated_normal: normal distribution, but any value more than two standard deviations from the mean is re-drawn.

Official API: tf.truncated_normal

tf.random_uniform: uniform distribution; parameters: minimum value, maximum value, dtype.

Official API: tf.random_uniform

tf.random_gamma: Gamma distribution; parameters: shape parameter alpha, scale parameter beta, dtype.

Official API: tf.random_gamma
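
A quick way to get a feel for these functions is to sample from each one and print the values. A minimal sketch; the shapes and distribution parameters below are chosen arbitrarily for illustration:

import tensorflow as tf

normal    = tf.random_normal([2, 3], mean=0.0, stddev=1.0, seed=1)
truncated = tf.truncated_normal([2, 3], mean=0.0, stddev=1.0, seed=1)  # re-draws values beyond 2 stddev
uniform   = tf.random_uniform([2, 3], minval=-1.0, maxval=1.0, seed=1)
gamma     = tf.random_gamma([2, 3], alpha=0.1, beta=1.0, seed=1)

with tf.Session() as sess:
    for name, t in [("normal", normal), ("truncated", truncated),
                    ("uniform", uniform), ("gamma", gamma)]:
        print(name)
        print(sess.run(t))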

  • Biases in a neural network are usually initialized with a constant:
biases = tf.Variable(tf.zeros([3]))
  • TensorFlow also supports initializing a new variable from another variable's initial value:
w2 = tf.Variable(weights.initialized_value())
w3 = tf.Variable(weights.initialized_value() * 2)
  • The forward-propagation process written as code:
import tensorflow as tf
# Declare weights1 and weights2
weights1 = tf.Variable(tf.truncated_normal([2, 3], stddev=2, seed=1))
weights2 = tf.Variable(tf.truncated_normal([3, 1], stddev=2, seed=1))
# Assume the input vector is a 1x2 constant:
x = tf.constant([0.7, 0.9], shape=[1, 2])
# Forward propagation
a = tf.matmul(x, weights1)
y = tf.matmul(a, weights2)

sess = tf.Session()
# y cannot be evaluated directly; the variables must be initialized first
sess.run(weights1.initializer)
sess.run(weights2.initializer)
print(sess.run(y))
sess.close()

'''
Output
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
[[ 11.53417969]]
'''

If there are many variables, initializing each one individually becomes cumbersome. TensorFlow provides a convenient way to initialize them all at once:

init_w = tf.global_variables_initializer()
sess.run(init_w)
# init_w = tf.initialize_all_variables() is the pre-1.0 API

Simple Supervised Learning

Setting neural network parameters through supervised learning requires a labeled training dataset.

So far all the variable values have been random, but in practice we need a better way to set the parameters, which calls for an optimization algorithm. The most commonly used optimization algorithm for neural networks is backpropagation.

Backpropagation is an iterative process. At the start of each iteration, a small portion of the training data, called a batch, is selected. The examples in this batch are run through forward propagation to produce predictions; because the training data is labeled, the gap between the predicted values and the true values can be computed. Backpropagation then updates the network parameters so that the model's predictions on this batch move closer to the true answers.
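
The batch selection described above is just index arithmetic over the training set; the complete example at the end of these notes uses exactly this pattern. A minimal sketch, with batch_size and dataset_size chosen arbitrarily for illustration:

batch_size = 8
dataset_size = 128
for i in range(3):
    # each step picks the next slice of batch_size examples, wrapping around the dataset
    start = (i * batch_size) % dataset_size
    end = min(start + batch_size, dataset_size)
    print(start, end)   # prints 0 8, then 8 16, then 16 24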

Note that if the data selected in each iteration were represented by constants, the TensorFlow computation graph would grow too large, because every constant adds a node to the graph. A typical neural network is trained over a very large number of iterations, which would produce an enormous and inefficient graph. To avoid this, TensorFlow provides the placeholder mechanism for supplying input data: the data only needs to be fed into the computation graph through the placeholder.

import tensorflow as tf

w1 = tf.Variable(tf.random_normal([2, 3], stddev=1))
w2 = tf.Variable(tf.random_normal([3, 1], stddev=1))

x = tf.placeholder(tf.float32, shape=(1, 2), name="input")
a = tf.matmul(x, w1)
y = tf.matmul(a, w2)
sess = tf.Session()
init_w = tf.global_variables_initializer()
sess.run(init_w)
# This line raises an error
print(sess.run(y))
# Output
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'input' with dtype float and shape [1,2]
[[Node: input = Placeholder[dtype=DT_FLOAT, shape=[1,2], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

# This runs correctly and prints the result
print(sess.run(y, feed_dict={x: [[0.7, 0.9]]}))
# Output
[[-1.25927997]]

'''
Here the placeholder replaces the input x that was previously defined as a constant. In the new program,
a feed_dict must be supplied to specify the value of x when computing the forward pass.
feed_dict is a dictionary that gives a value for every placeholder; if a required placeholder
is not fed a value, the program raises an error at runtime.
'''

If you need to compute the forward-propagation results for several examples at once:

x = tf.placeholder(tf.float32, shape=(3, 2), name="input")
print(sess.run(y, feed_dict={x: [[0.7, 0.9], [0.1, 0.4], [0.5, 0.8]]}))

Adjusting Parameters with a Loss Function and Backpropagation
This will be covered in detail later.

# Loss function
cross_entropy = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
# Learning rate
learning_rate = 0.001
# Backpropagation (optimization) algorithm
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)
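
A side note on tf.clip_by_value in the loss above: it keeps every element of y inside [1e-10, 1.0], which protects tf.log from being evaluated at 0. A minimal sketch of the clipping behavior; the input values are made up:

import tensorflow as tf

v = tf.constant([[0.0, 0.5, 2.0]])
# values below 1e-10 are raised to 1e-10, values above 1.0 are lowered to 1.0
clipped = tf.clip_by_value(v, 1e-10, 1.0)

with tf.Session() as sess:
    print(sess.run(clipped))   # [[1e-10  0.5  1.0]]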

A Complete Neural Network Example

  • A complete program that trains a binary classifier:
import tensorflow as tf
from numpy.random import RandomState
# Size of one training batch
batch_size = 8
# Define the neural network parameters
weights1 = tf.Variable(tf.truncated_normal([2, 3], stddev=2, seed=1))
weights2 = tf.Variable(tf.truncated_normal([3, 1], stddev=2, seed=1))
# None in the shape makes it easy to change the batch size: during training the data is split
# into small batches, while testing can use the whole dataset at once. This is convenient when
# the dataset is small, but with a large dataset, feeding all the data as one batch may exhaust memory.
x = tf.placeholder(tf.float32, shape=(None, 2), name="x-input")
y_ = tf.placeholder(tf.float32, shape=(None, 1), name="y-input")

a = tf.matmul(x, weights1)
y = tf.matmul(a, weights2)

cross_entropy = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
learning_rate = 0.001
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)

# Generate a simulated dataset
rdm = RandomState(1)
dataset_size = 128
X = rdm.rand(dataset_size, 2)
# Label x1 + x2 < 1 as a positive example (1) and everything else as a negative example (0)
Y = [[int(x1 + x2 < 1)] for (x1, x2) in X]

with tf.Session() as sess:
    init_w = tf.global_variables_initializer()
    sess.run(init_w)
    print(sess.run(weights1))
    print(sess.run(weights2))

    # Number of training steps
    step = 5000
    for i in range(step):
        # Select batch_size examples for this training step
        start = (i * batch_size) % dataset_size
        end = min(start + batch_size, dataset_size)
        # Train on the selected examples and update the parameters
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
        # Every 1000 steps, print the cross entropy on the whole dataset
        if i % 1000 == 0:
            total_cross_entropy = sess.run(cross_entropy,
                                           feed_dict={x: X, y_: Y})
            print("After %d training steps, cross entropy on all data is %g" %
                  (i, total_cross_entropy))

    # Print the network parameters after training
    print(sess.run(weights1))
    print(sess.run(weights2))


'''
Output
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
[[-1.62263644 2.96919751 0.13065873]
[ 0.1984968 1.27939415 3.22174239]]
[[-1.62263644]
[ 2.96919751]
[ 0.13065873]]
After 0 training steps, cross entropy on all data is 0.00922738
After 1000 training steps, cross entropy on all data is 0.00651872
After 2000 training steps, cross entropy on all data is 0.0046435
After 3000 training steps, cross entropy on all data is 0.00296163
After 4000 training steps, cross entropy on all data is 0.00143799
You can see that the cross entropy keeps decreasing, which means the predictions are getting closer to the true values.

[[-2.77095127 4.07181215 1.52858329]
[-0.9112885 2.34183121 4.59349394]]
[[-2.86135674]
[ 4.07957268]
[ 1.23969376]]

'''
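
After training, the same session can also run the forward pass on new inputs through the placeholder. A minimal sketch, assuming it is added inside the with tf.Session() block above, right after the training loop; the inputs here are made up for illustration:

    # predict on two new examples using the trained parameters
    new_x = [[0.2, 0.3], [0.8, 0.9]]
    print(sess.run(y, feed_dict={x: new_x}))
    # one raw score per example; thresholding this score gives the predicted class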