Implementing Convolutional Neural Networks with TensorFlow

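Convolutional Neural Networks

Convolutional neural networks (CNNs) were originally designed for problems such as image recognition, but today they are applied well beyond images, including video, audio, and text.

[Figure: schematic of a simple convolutional neural network]

A typical convolutional neural network is built from several convolutional layers, and each convolutional layer usually performs the following operations.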

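The image is filtered by several different convolution kernels, and a bias is added, to extract local features; each kernel maps the input to a new 2D image.
The filter output of the kernels is passed through a nonlinear activation function. The most common choice today is ReLU.
The activation output is then pooled. Max pooling is the usual choice; it keeps the most salient features and makes the model more robust.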
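The three operations above can be sketched in plain Python on a toy example. This is only an illustration, not how TensorFlow implements them: the helper names (toy_conv2d, toy_relu, toy_max_pool_2x2), the 5x5 input, and the 2x2 edge kernel are all made up for this sketch, and the convolution uses no padding for simplicity.

```python
def toy_conv2d(image, kernel, bias):
    """Slide `kernel` over `image` (stride 1, no padding) and add `bias`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw)) + bias
             for c in range(out_w)]
            for r in range(out_h)]


def toy_relu(fmap):
    """Replace every negative value with 0."""
    return [[max(0.0, v) for v in row] for row in fmap]


def toy_max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling (stride 2); an odd last row/column is dropped."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]


# Hypothetical 5x5 "image" with a vertical edge between columns 1 and 2,
# and a hypothetical 2x2 kernel that responds to dark-to-bright edges.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1],
          [-1, 1]]

feature_map = toy_conv2d(image, kernel, bias=0.0)  # 4x4 feature map
activated = toy_relu(feature_map)                  # nonlinearity
pooled = toy_max_pool_2x2(activated)               # 2x2 after pooling
print(pooled)  # [[2.0, 0.0], [2.0, 0.0]]
```

The strong response survives pooling in the left column, which is where the edge sits, while the flat region pools to 0.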

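These steps make up the most common kind of convolutional layer. A widely used addition is a batch normalization layer; in practice, networks that use batch normalization are more robust to poor initial values.

In short, the key ideas of a convolutional neural network are local connections, weight sharing, and down-sampling in the pooling layers. Local connections and weight sharing reduce the number of parameters, which greatly lowers training complexity and reduces overfitting. Weight sharing also gives the network tolerance to translation, which improves generalization.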
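To get a feel for how much local connections and weight sharing save, here is a rough parameter count. The sizes are assumptions chosen for illustration: a 28x28 single-channel image, a fully connected layer with 1024 units wired to every pixel, and a convolutional layer with 32 kernels of size 5x5. The helper names are hypothetical.

```python
def dense_param_count(n_inputs, n_outputs):
    """One weight per (input, output) pair plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


def conv_param_count(kernel_h, kernel_w, in_channels, out_channels):
    """Each kernel is shared across all image positions; one bias per kernel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


dense = dense_param_count(28 * 28, 1024)  # every pixel connected to every unit
conv = conv_param_count(5, 5, 1, 32)      # 32 shared 5x5 kernels

print(dense)  # 803840
print(conv)   # 832
```

The convolutional layer's parameter count does not depend on the image size at all, which is exactly what weight sharing buys.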

Implementing a Simple Convolutional Neural Network with TensorFlow

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# As before, load the MNIST dataset from a local path
mnist = input_data.read_data_sets(
    "/Users/wuyong/Desktop/网课的作业笔记/pythondemo/TensorFlowdemo/mnist/data",
    one_hot=True)
sess = tf.InteractiveSession()


# Define initialization functions so they can be reused.
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


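# The weights need some random noise to break symmetry, e.g. truncated normal noise
# with a standard deviation of 0.1. Because we use ReLU, the biases also get a small positive value to avoid dead neurons.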
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


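# The convolution and pooling layers are reused as well. tf.nn.conv2d is TensorFlow's
# 2D convolution function. x is the input and W holds the convolution parameters, e.g. [5, 5, 1, 32]:
# the first two numbers (5, 5) are the kernel size and the third is the number of input channels.
# MNIST images are grayscale, so it is 1; for RGB color images it would be 3.
# The last number is the number of kernels, i.e. how many kinds of features this layer extracts.
# strides is the step of the sliding window; all ones means every position of the image is visited.
# padding controls how borders are handled; 'SAME' pads the borders so that the output has the same size as the input.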
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


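# tf.nn.max_pool is TensorFlow's max pooling function. We use 2x2 max pooling, which reduces
# each 2x2 block of pixels to a single pixel by keeping the largest value in the block, i.e. the most salient feature.
# Because we want to shrink the feature map, the pooling strides are set to 2 in both the horizontal and vertical directions.
# With a stride of 1 the output would keep the same size as the input.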
def max_pool_2x2(x):
    return tf.nn.max_pool(
        x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


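# Define the input placeholders: x holds the features and y_ the true labels.
# A CNN exploits spatial structure, so the 1D input vector must be reshaped into a 2D image,
# i.e. from 1*784 back to the original 28*28. There is only one color channel, so the final shape is [-1, 28, 28, 1].
# -1 means the number of samples is not fixed; the trailing 1 is the number of color channels.
# tf.reshape is the tensor reshaping function used here.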
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])

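# Define the first convolutional layer, using the helper functions above to initialize the weights and bias.
# [5, 5, 1, 32] means a 5*5 kernel, 1 color channel, and 32 different kernels.
# conv2d performs the convolution, the bias is added, and ReLU applies the nonlinearity.
# Finally, max_pool_2x2 pools the output of the convolution.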
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Define the second convolutional layer. The difference is that it has 64 kernels, so this layer extracts 64 kinds of features.
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

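# After two 2x2 poolings the image side has shrunk from 28 to 7, and the second layer has 64 kernels, so the output tensor is 7*7*64.
# Use tf.reshape to flatten the output of the second convolutional layer into a 1D vector,
# then attach a fully connected layer with 1024 hidden units and ReLU.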
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

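# To reduce overfitting, add a dropout layer controlled by keep_prob, which is fed in through a placeholder.
# During training some activations are randomly dropped; at prediction time everything is kept (keep_prob=1.0) for the best predictive performance.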
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Finally, connect the dropout output to a softmax layer to get the class probabilities.
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# Define the loss function cross_entropy; the optimizer is Adam with a learning rate of 1e-4.
cross_entropy = tf.reduce_mean(-tf.reduce_sum(
    y_ * tf.log(y_conv), reduction_indices=[1]))

train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# Define the accuracy.
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

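# Start training: initialize all parameters and set the dropout keep_prob to 0.5 during training.
# Use mini-batches of 50 for 20000 training iterations, i.e. 1,000,000 training samples in total.
# Every 100 iterations, evaluate the accuracy on the current batch.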
tf.global_variables_initializer().run()
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(
            feed_dict={x: batch[0],
                       y_: batch[1],
                       keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

# After training, evaluate on the full test set to get the overall classification accuracy.
print("test accuracy %g" % accuracy.eval(
    feed_dict={x: mnist.test.images,
               y_: mnist.test.labels,
               keep_prob: 1.0}))

'''Output:
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
step 0, training accuracy 0.1
step 100, training accuracy 0.88
step 200, training accuracy 0.94
step 300, training accuracy 0.88
step 400, training accuracy 0.98
step 500, training accuracy 0.92
step 600, training accuracy 1
step 700, training accuracy 1
step 800, training accuracy 0.9
step 900, training accuracy 1
step 1000, training accuracy 0.96
step 1100, training accuracy 0.98
step 1200, training accuracy 1
step 1300, training accuracy 0.94
step 1400, training accuracy 0.98
step 1500, training accuracy 0.94
step 1600, training accuracy 0.98
step 1700, training accuracy 1
step 1800, training accuracy 0.96
step 1900, training accuracy 1
step 2000, training accuracy 0.98
... (many steps omitted) ...
step 19500, training accuracy 1
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
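As you can see, the training accuracy ends up at essentially 1 on every batch, which does not mean the model is perfectly correct.
The final result is above 99.5%.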
'''
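The 7 * 7 * 64 input size of the first fully connected layer follows from how 'SAME' padding sizes its output: the spatial size becomes ceil(input / stride). The sketch below replays that arithmetic for the MNIST network above. The helper name same_padding_output_size is made up for this illustration and is not a TensorFlow function.

```python
import math


def same_padding_output_size(size, stride):
    """Spatial output size of a 'SAME'-padded conv or pool: ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))


size = 28                                  # MNIST images are 28x28
size = same_padding_output_size(size, 1)   # conv1, stride 1 -> 28
size = same_padding_output_size(size, 2)   # pool1, stride 2 -> 14
size = same_padding_output_size(size, 1)   # conv2, stride 1 -> 14
size = same_padding_output_size(size, 2)   # pool2, stride 2 -> 7

flat_dim = size * size * 64                # 64 feature maps from conv2
samples_seen = 20000 * 50                  # iterations * mini-batch size

print(size)          # 7
print(flat_dim)      # 3136
print(samples_seen)  # 1000000
```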

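Implementing a More Advanced Convolutional Network

Now that we have played with MNIST enough, let's move on to something more advanced.

The dataset used here is CIFAR-10.

First, download the TensorFlow models repository.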

git clone https://github.com/tensorflow/models.git

# Cloning from GitHub can be slow, so I mirrored the repository on coding.net, which is a bit faster
git clone https://git.coding.net/Acks/tensorflow_Models.git

# Then
cd models/tutorials/image/cifar10

# Import the libraries
import tensorflow as tf
import numpy as np
import time
import cifar10, cifar10_input

Download the CIFAR-10 dataset

cifar10.maybe_download_and_extract()

'''
The dataset is fairly large (about 163 MB), so be patient.
Download finished:
>> Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
'''

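Once the dataset is downloaded, we can implement the more advanced convolutional neural network.

Here is a brief outline of the approach.

Load the data and apply data augmentation, then build a convolutional neural network, and finally test it.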
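To make the data augmentation step concrete, here is a minimal pure-Python sketch of three of the operations involved: a random crop, a random horizontal flip, and per-image standardization. It is a simplified stand-in, not the code in cifar10_input: the function names are hypothetical, the image is a single-channel 2D list rather than a 3-channel tensor, and the standardization only guards against a zero standard deviation.

```python
import random


def random_crop(image, size, rng):
    """Cut a random size x size window out of a 2D list of pixels."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]


def random_flip_left_right(image, rng):
    """Mirror the image horizontally with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image


def standardize(image):
    """Subtract the mean and divide by the standard deviation."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / float(len(pixels))
    variance = sum((p - mean) ** 2 for p in pixels) / float(len(pixels))
    std = max(variance ** 0.5, 1e-8)  # avoid dividing by zero on flat images
    return [[(p - mean) / std for p in row] for row in image]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical 32x32 grayscale image filled with a simple pattern.
image = [[(r * 32 + c) % 256 for c in range(32)] for r in range(32)]

augmented = standardize(
    random_flip_left_right(random_crop(image, 24, rng), rng))
print(len(augmented), len(augmented[0]))  # 24 24
```

Because the crop position and the flip are random, one source image yields many different 24x24 training images, which is how augmentation effectively enlarges the dataset.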

The network structure is: conv1->pool1->norm1->conv2->norm2->pool2->fc3->fc4->logits
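To see what this structure implies for tensor shapes, here is a small sketch. It assumes the settings used in the code in this section: 24x24 inputs, 'SAME' padding everywhere (so each stage outputs ceil(size / stride)), stride 1 for the convolution and LRN stages, stride 2 for both pooling stages, and 64 output channels from conv2. The layers list is an illustrative description, not a TensorFlow object.

```python
import math

# (stage name, stride) pairs for the spatial part of the network.
layers = [("conv1", 1), ("pool1", 2), ("norm1", 1),
          ("conv2", 1), ("norm2", 1), ("pool2", 2)]

size = 24      # side length after the 24x24 crop
shapes = []
for name, stride in layers:
    size = int(math.ceil(size / float(stride)))  # 'SAME' padding rule
    shapes.append((name, size))

flat_dim = size * size * 64          # flattened input to the first dense layer
fc3_weights = flat_dim * 384 + 384   # weights plus biases of the 384-unit layer

for name, side in shapes:
    print("%s: %dx%d" % (name, side, side))
print(flat_dim)     # 2304
print(fc3_weights)  # 885120
```

Most of the model's parameters sit in the first fully connected layer, which is one reason the listing applies L2 regularization to the fully connected weights.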

The complete code with explanations follows:

import tensorflow as tf
import numpy as np
import time
# import sys
# This is where I downloaded the models repository; change it to your own path
# sys.path.append('/root/tensorflowdemo/cifar10/data/tensorflow_Models/tutorials/image/cifar10')
import cifar10, cifar10_input

# Define batch_size and the number of training steps max_steps
max_steps = 1000  # depends on your machine; even 1000 steps took a long time on my server
batch_size = 128
data_dir = 'cifar_dataset/cifar10_data/cifar-10-batches-bin'


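# Define the weight initialization function, using tf.truncated_normal (a truncated normal distribution).
# An L2 loss is attached to the weights, which amounts to L2 regularization.
# L1 regularization produces sparse features: the weights of most useless features are driven to 0.
# L2 regularization keeps weights from growing too large, so the weights stay fairly even.
# w1 controls the size of the L2 loss; tf.nn.l2_loss computes the L2 loss of the weights,
# and tf.multiply scales it by w1 to give the final weight loss.
# tf.add_to_collection stores every weight loss in one collection ('losses').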
def variable_with_weight_loss(shape, stddev, w1):
    var = tf.Variable(tf.truncated_normal(shape, stddev=stddev))
    if w1 is not None:
        weight_loss = tf.multiply(tf.nn.l2_loss(var), w1, name='weight_loss')
        tf.add_to_collection('losses', weight_loss)
    return var


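# distorted_inputs in the cifar10_input module produces the training data,
# i.e. features and their labels. It returns ready-made tensors; each run yields one batch of batch_size samples.
# Data augmentation is already applied; see cifar10_input.distorted_inputs for the implementation.
# The augmentation includes random horizontal flips (tf.image.random_flip_left_right),
# random crops of 24*24 (tf.image.random_crop),
# random brightness and contrast (tf.image.random_brightness, tf.image.random_contrast),
# and standardization (tf.image.per_image_whitening, named tf.image.per_image_standardization from TensorFlow 1.0 on: subtract the mean and divide by the standard deviation, giving zero mean and unit variance).
# One original image can thus become many images, which effectively enlarges the dataset.
# Augmentation is CPU-intensive, so distorted_inputs uses 16 separate threads to speed it up.
# The function creates the thread pool internally, and the work is scheduled through TensorFlow queues when needed.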
images_train, labels_train = cifar10_input.distorted_inputs(
    data_dir=data_dir, batch_size=batch_size)

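# cifar10_input.inputs produces the test data. No flipping or brightness/contrast changes are needed here,
# but the central 24*24 region of each image is cropped and the data is standardized.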
images_test, labels_test = cifar10_input.inputs(
    eval_data=True, data_dir=data_dir, batch_size=batch_size)

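# Create the input placeholders for features and labels.
# batch_size is used later when defining the network, so the first dimension cannot be None as before.
# The images are 24*24 with 3 color channels.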
image_holder = tf.placeholder(tf.float32, [batch_size, 24, 24, 3])
label_holder = tf.placeholder(tf.int32, [batch_size])

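# Use variable_with_weight_loss, defined above, to create and initialize the kernel weights.
# The first convolution uses 5*5 kernels, 3 input channels, and 64 kernels, with an initializer standard deviation of 0.05.
# No L2 regularization is applied to the first convolution's weights, so w1=0.0.
# tf.nn.conv2d then convolves the input image_holder,
# with stride 1 and 'SAME' padding. The bias is initialized to 0 and added, and ReLU applies the nonlinearity.
# Next comes a max pooling layer with a 3*3 window and 2*2 stride; because the window is larger than the stride, the windows overlap, which enriches the data.
# tf.nn.lrn then applies local response normalization (LRN) to the pooled output.
# LRN first appeared in Alex Krizhevsky's paper on using a CNN in the ImageNet competition. It mimics the "lateral inhibition" mechanism of biological neural systems:
# it creates competition among local neuron activities, so that larger responses become relatively larger while neurons with smaller responses are suppressed.
# This improves generalization; the ImageNet experiments showed that LRN lowers the CNN's top-1 error rate by 1.4%.
# LRN is useful with activations that have no upper bound, such as ReLU, because it picks out the larger responses among nearby kernels,
# but it is not a good fit for activations like sigmoid, which are bounded and already suppress large values.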
weight1 = variable_with_weight_loss(shape=[5, 5, 3, 64], stddev=5e-2, w1=0.0)
kernel1 = tf.nn.conv2d(image_holder, weight1, [1, 1, 1, 1], padding='SAME')
bias1 = tf.Variable(tf.constant(0.0, shape=[64]))
conv1 = tf.nn.relu(tf.nn.bias_add(kernel1, bias1))
pool1 = tf.nn.max_pool(
    conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)

# The second convolutional layer applies LRN first and then pooling
weight2 = variable_with_weight_loss(shape=[5, 5, 64, 64], stddev=5e-2, w1=0.0)
kernel2 = tf.nn.conv2d(norm1, weight2, [1, 1, 1, 1], padding='SAME')
bias2 = tf.Variable(tf.constant(0.1, shape=[64]))
conv2 = tf.nn.relu(tf.nn.bias_add(kernel2, bias2))
norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
# Pool the LRN output (norm2), so the order is conv2->norm2->pool2
pool2 = tf.nn.max_pool(
    norm2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')

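# A fully connected layer.
# tf.reshape flattens each sample into a 1D vector, and get_shape
# gives the length of the flattened data. variable_with_weight_loss then initializes the layer's weights,
# with 384 hidden units, a standard deviation of 0.04, and biases initialized to 0.1.
# To keep this fully connected layer from overfitting, it gets a nonzero weight loss coefficient of 0.004,
# so all of its weights are constrained by L2 regularization. ReLU applies the nonlinearity at the end.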
reshape = tf.reshape(pool2, [batch_size, -1])
dim = reshape.get_shape()[1].value
weight3 = variable_with_weight_loss(shape=[dim, 384], stddev=0.04, w1=0.004)
bias3 = tf.Variable(tf.constant(0.1, shape=[384]))
local3 = tf.nn.relu(tf.matmul(reshape, weight3) + bias3)

# A second fully connected layer.
# The number of hidden units is halved; the other hyperparameters are unchanged.
weight4 = variable_with_weight_loss(shape=[384, 192], stddev=0.04, w1=0.004)
bias4 = tf.Variable(tf.constant(0.1, shape=[192]))
local4 = tf.nn.relu(tf.matmul(local3, weight4) + bias4)

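# The last layer.
# Create its weights with a normal-distribution standard deviation equal to the reciprocal of the previous layer's hidden unit count,
# without L2 regularization. Unlike before, softmax is not applied here, because it is folded into the loss computation.
# The classification result can be read off without softmax (the largest logit is the predicted class);
# softmax is mainly needed to compute the loss, so it makes sense to integrate it there.
# 1 / 192.0 keeps the division in floating point even under Python 2.
weight5 = variable_with_weight_loss(shape=[192, 10], stddev=1 / 192.0, w1=0.0)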
bias5 = tf.Variable(tf.constant(0.0, shape=[10]))
logits = tf.add(tf.matmul(local4, weight5), bias5)


# Build the CNN loss. The softmax and the cross entropy loss are computed together
# by tf.nn.sparse_softmax_cross_entropy_with_logits.
# tf.reduce_mean averages the cross entropy, and tf.add_to_collection
# adds it to the 'losses' collection.
# Finally tf.add_n sums everything in the 'losses' collection to give the total loss,
# which includes the cross entropy loss and the L2 weight losses of the two fully connected layers.
def loss(logits, labels):
    labels = tf.cast(labels, tf.int64)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits, labels=labels, name='cross_entropy_per_example')
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    tf.add_to_collection('losses', cross_entropy_mean)
    return tf.add_n(tf.get_collection('losses'), name='total_loss')


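# Pass the logits node and label_holder to the loss function to get the final loss
# (note that this rebinds the name loss from the function to the resulting tensor)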
loss = loss(logits, label_holder)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

# Use tf.nn.in_top_k to compute top-k accuracy; k=1 here, i.e. whether the highest-scoring class is the correct one
top_k_op = tf.nn.in_top_k(logits, label_holder, 1)

# Initialize all model parameters
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

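# Start the queue-runner threads for the image augmentation described above; 16 threads are used in total.
# Note that without starting these threads, the inference and training operations below cannot begin.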
tf.train.start_queue_runners()

# Start training
for step in range(max_steps):
    # Record the start time to measure how long each step takes
    start_time = time.time()
    image_batch, label_batch = sess.run([images_train, labels_train])
    _, loss_value = sess.run(
        [train_op, loss],
        feed_dict={image_holder: image_batch,
                   label_holder: label_batch})
    duration = time.time() - start_time
    # Every 10 steps, print the current loss, the samples trained per second, and the time spent on one batch
    if step % 10 == 0:
        examples_per_sec = batch_size / duration
        sec_per_batch = float(duration)

        format_str = ('step %d,loss=%.2f (%.1f examples/sec; %.3f sec/batch)')
        print(format_str % (step, loss_value, examples_per_sec, sec_per_batch))

# Evaluate on the test set
num_examples = 10000  # 10000 test samples
import math
# float() keeps the division exact under Python 2 as well
num_iter = int(math.ceil(num_examples / float(batch_size)))
true_count = 0
total_sample_count = num_iter * batch_size
step = 0
while step < num_iter:
    image_batch, label_batch = sess.run([images_test, labels_test])
    # Count how many samples in this batch are predicted correctly at top 1
    predictions = sess.run(
        [top_k_op],
        feed_dict={image_holder: image_batch,
                   label_holder: label_batch})
    true_count += np.sum(predictions)
    step += 1

# Print the accuracy
precision = float(true_count) / total_sample_count
print('precision @ 1 = %.3f' % precision)

'''
I did not want to tie up my own machine,
so I ran this on a server without a GPU (an Alibaba Cloud student instance),
which is fairly slow.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
step 0,loss=4.68 (9.4 examples/sec; 13.604 sec/batch)
step 10,loss=3.75 (81.8 examples/sec; 1.565 sec/batch)
step 20,loss=3.13 (83.9 examples/sec; 1.527 sec/batch)
step 30,loss=2.83 (81.4 examples/sec; 1.573 sec/batch)
step 40,loss=2.63 (81.4 examples/sec; 1.573 sec/batch)
step 50,loss=2.48 (81.2 examples/sec; 1.577 sec/batch)
step 60,loss=2.17 (81.4 examples/sec; 1.573 sec/batch)
step 70,loss=2.22 (81.5 examples/sec; 1.570 sec/batch)
step 80,loss=2.00 (82.8 examples/sec; 1.545 sec/batch)
step 90,loss=2.03 (83.1 examples/sec; 1.541 sec/batch)
(many steps omitted; by around step 3000 the loss is about 1)
step 2900,loss=1.22 (69.8 examples/sec; 1.835 sec/batch)
step 2910,loss=1.07 (69.6 examples/sec; 1.838 sec/batch)
step 2920,loss=1.06 (70.2 examples/sec; 1.824 sec/batch)
step 2930,loss=1.23 (69.3 examples/sec; 1.847 sec/batch)
step 2940,loss=0.97 (68.7 examples/sec; 1.863 sec/batch)
step 2950,loss=0.99 (69.4 examples/sec; 1.845 sec/batch)
step 2960,loss=1.11 (69.7 examples/sec; 1.836 sec/batch)
step 2970,loss=1.04 (67.8 examples/sec; 1.887 sec/batch)
step 2980,loss=1.15 (69.7 examples/sec; 1.837 sec/batch)
step 2990,loss=0.82 (69.2 examples/sec; 1.849 sec/batch)
'''
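'''
The test-set result below comes from a model trained for 1000 steps.
I accidentally misspelled "image" in the evaluation code, so the result of the 3000-step run was not saved. That was frustrating.
Saving models and TensorBoard will be covered later.
So I reran 1000 steps to get the test result.
'''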
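One detail of the evaluation loop is worth spelling out. With 10000 test samples and a batch size of 128, the number of batches is rounded up, so the denominator of the precision is slightly larger than 10000. The sketch below reproduces that arithmetic and the accumulation logic in plain Python. The precision_at_1 helper and the fake_batches data are invented for illustration; in the real loop each batch of hits comes from running top_k_op.

```python
import math

num_examples = 10000
batch_size = 128

num_iter = int(math.ceil(num_examples / float(batch_size)))
total_sample_count = num_iter * batch_size
print(num_iter)            # 79
print(total_sample_count)  # 10112, i.e. 112 more than the test set size


def precision_at_1(batches):
    """Fraction of correct top-1 predictions over a sequence of boolean batches."""
    true_count = 0
    total = 0
    for hits in batches:
        true_count += sum(1 for hit in hits if hit)
        total += len(hits)
    return true_count / float(total)


# Two made-up batches of per-sample "was the top-1 prediction right?" flags.
fake_batches = [[True, False, True, True],
                [False, True, True, True]]
print(precision_at_1(fake_batches))  # 0.75
```

Dividing by the number of samples actually evaluated, as precision_at_1 does, keeps the numerator and denominator consistent regardless of how the batch count is rounded.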