loc - みる会図書館

    # TensorFlow 1.x API; `tf` is `import tensorflow as tf`.
    def output_layers(prev, training=False):
        outputs = []
        for i in range(6):
            # Shared 3x3 convolution for this scale.
            prev = tf.layers.conv2d(prev, 256, [3, 3],
                                    [2, 2],
                                    padding='SAME',
                                    activation=tf.nn.relu,
                                    use_bias=True,
                                    trainable=training,
                                    name='conv_out%d' % i)
            # Box coordinates: 4 channels, tanh activation.
            loc_output = tf.layers.conv2d(prev, 4, [1, 1],
                                          [1, 1],
                                          padding='VALID',
                                          activation=tf.nn.tanh,
                                          use_bias=True,
                                          trainable=training,
                                          name='loc_output%d' % i)
            # Confidence: NUM_CLASSES channels, sigmoid activation.
            pred_output = tf.layers.conv2d(prev, NUM_CLASSES, [1, 1],
                                           [1, 1],
                                           padding='VALID',
                                           activation=tf.nn.sigmoid,
                                           use_bias=True,
                                           trainable=training,
                                           name='pred_output%d' % i)
            output = tf.concat([loc_output, pred_output], axis=3,
                               name='output%d' % i)
            print(output.get_shape())
            outputs.append(output)
        return outputs

3.4 Re-creating the dataset

The dataset has to be re-created to match the modified model.

62 Chapter 3: Struggles with Object Recognition
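As a rough illustration of how each of the six outputs is laid out, the sketch below mimics the final tf.concat with NumPy. The batch size, the 21 x 21 spatial size, and NUM_CLASSES = 1 are assumptions chosen for the example, and the random arrays only stand in for the tanh and sigmoid head outputs.

```python
import numpy as np

NUM_CLASSES = 1  # assumption: a single "face" class


def concat_heads(loc_output, pred_output):
    """Join box coordinates and confidences along the channel axis (axis=3)."""
    return np.concatenate([loc_output, pred_output], axis=3)


rng = np.random.default_rng(0)
batch, height, width = 2, 21, 21  # assumed sizes, for illustration only

# tanh keeps the coordinates in [-1, 1]; sigmoid keeps confidences in (0, 1).
loc_output = np.tanh(rng.normal(size=(batch, height, width, 4)))
pred_logits = rng.normal(size=(batch, height, width, NUM_CLASSES))
pred_output = 1.0 / (1.0 + np.exp(-pred_logits))

output = concat_heads(loc_output, pred_output)
print(output.shape)  # (2, 21, 21, 5)
```

Because the coordinates come first in the concatenation, channels 0 to 3 of each output hold the box and the remaining channel holds the confidence.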

2. TensorFlowはじめました 3

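            ...
            lower=0.2)
        return distorted_image

The function load_ssd_dataset reads a file in TFRecord format and then converts the shape of the Tensor (tf.reshape) so that it is easy to handle in later processing. It also performs the other required steps: resizing the image, converting it to grayscale, and standardization.

Defining the loss function

The loss function computes how far the model's output for a given input is from the expected output (the training data). We use MSE (Mean Squared Error) as the loss function (Listing 2.11).

Listing 2.11: train_ssd.py

    # TensorFlow 1.x API; `tf` is `import tensorflow as tf`.
    def _loss(logits, box_labels, boxes):
        # `...` keeps all leading axes; the last axis holds
        # 4 box coordinates followed by the confidence.
        logits_label = logits[..., 4:]
        logits_loc = logits[..., :4]

        label_losses = tf.squared_difference(logits_label, box_labels)
        label_loss = tf.reduce_mean(label_losses)

        loc_losses = tf.squared_difference(logits_loc, boxes)
        loc_loss = tf.reduce_mean(loc_losses)

        loss = label_loss + loc_loss
        return loss

The model output ([21, 21, 5]) contains both coordinates and a confidence. We therefore first use slices to take elements 0 up to 4 (indices 0 to 3) as the coordinates and the elements from index 4 upward as the confidence. We then compute the loss for each part and add the two to obtain the total loss.

Setting up the optimization algorithm

The optimization algorithm updates the model parameters so that the loss computed by the loss function becomes smaller. The function _init_optimizer sets AdamOptimizer as the optimization algorithm (Listing 2.12).

Chapter 2: Grid-Based Object Detection 43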
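To make the arithmetic of this loss concrete, here is a minimal NumPy sketch of the same slicing and averaging. It is not the book's code: mse_loss is a hypothetical name, and the single-image batch with one face cell is made up for illustration.

```python
import numpy as np


def mse_loss(logits, box_labels, boxes):
    """Mean squared error of the confidences plus that of the coordinates."""
    logits_loc = logits[..., :4]    # channels 0-3: box coordinates
    logits_label = logits[..., 4:]  # channel 4 onward: confidence
    label_loss = np.mean((logits_label - box_labels) ** 2)
    loc_loss = np.mean((logits_loc - boxes) ** 2)
    return label_loss + loc_loss


# Hypothetical batch: one image, 21 x 21 grid, 5 channels, all predictions 0.
logits = np.zeros((1, 21, 21, 5))
box_labels = np.zeros((1, 21, 21, 1))
boxes = np.zeros((1, 21, 21, 4))
box_labels[0, 10, 10, 0] = 1.0  # exactly one cell is labelled as a face

loss = mse_loss(logits, box_labels, boxes)
print(loss)
```

With an all-zero prediction, only the one labelled cell contributes, so the confidence term is one squared error of 1 averaged over 441 cells and the coordinate term is 0.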

3. TensorFlowはじめました 3

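        label_losses = tf.squared_difference(logits_label, box_labels)
        label_loss = _calc_loss_with_hard_negative_mining(
            label_losses, positive_labels, positive_count_in_batch)

        loc_losses = _loc_loss(logits_loc, boxes)
        # Keep the coordinate loss of Positive boxes only.
        loc_losses = loc_losses * positive_labels
        loc_loss = tf.reduce_sum(loc_losses / positive_count_in_batch)

        loss = label_loss + loc_loss
        return loss

The confidence loss (label_losses) is computed with Hard Negative Mining. The coordinate loss (loc_losses), on the other hand, is multiplied by positive_labels so that it is computed for Positive boxes only. The author chose this because coordinates are needed only when the confidence is high, and saw no need for the region of a Negative box (whose offset from the box is [0, 0, 0, 0]) to be output correctly.

Positive : Negative

The paper states that the ratio between the number of Positive samples and the number of Negative samples counted toward the loss is 1:3. In this experiment, however, training was unstable under that condition, and it progressed only once the number of Negative samples was set to "three times the number of Positive samples present in the mini-batch".

The paper runs its experiments on datasets with many Positive samples, such as "VOC2007". In the author's dataset, by contrast, each image has only 1 to 3 Positive samples. The author therefore believes that the condition given in the paper depends on the dataset and has to be changed for each dataset used.

Figure 3.14 graphs the loss when training under the conditions in Table 3.1.

Table 3.1: Training conditions

    Condition                        Value
    Learning rate (learning_rate)    0.0001
    Batch size (batch_size)          32
    Number of steps (max_step)       100000

68 Chapter 3: Struggles with Object Recognition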
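The masking of the coordinate loss can be sketched in NumPy as follows. This is only an illustration under simplifying assumptions: masked_loc_loss is a hypothetical name, each box carries a single scalar loss, and the guard against a zero Positive count stands in for the tf.reduce_max call used in the listing.

```python
import numpy as np


def masked_loc_loss(loc_losses, positive_labels):
    """Sum the per-box coordinate loss over Positive boxes only,
    divided by the number of Positive boxes in the mini-batch."""
    # Use at least 1 so that a batch without Positive boxes does not divide by zero.
    positive_count = max(1.0, float(positive_labels.sum()))
    return float((loc_losses * positive_labels).sum() / positive_count)


# Hypothetical mini-batch: 2 images x 4 boxes, one loss value per box.
loc_losses = np.array([[0.5, 0.2, 0.1, 0.4],
                       [0.3, 0.6, 0.9, 0.7]])
positive_labels = np.array([[1.0, 0.0, 0.0, 0.0],
                            [0.0, 0.0, 1.0, 0.0]])

loc_loss = masked_loc_loss(loc_losses, positive_labels)
print(loc_loss)
```

Only the two Positive boxes (losses 0.5 and 0.9) contribute; the six Negative boxes are zeroed out by the mask however large their coordinate error is.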

4. TensorFlowはじめました 3

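[Figure 3.14: Change in loss. Panels: model1_label_loss and model1_loc_loss.]

The values are larger than in Chapter 2, most likely because Hard Negative Mining is taking effect. The confidence loss (model1_label_loss) moves sideways after 20,000 steps, whereas the coordinate loss (model1_loc_loss) looks as if it could drop a little further even beyond 100,000 steps.

3.6 Validation

Figure 3.15 shows the result of applying the trained model to the validation images.

Chapter 3: Struggles with Object Recognition 69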

5. TensorFlowはじめました 3

            ...
            padding='VALID',
            activation=tf.nn.relu,
            use_bias=True,
            trainable=training,
            name='conv3_2')
        return conv3


    def output_layers(prev, training=False):
        prev = tf.layers.conv2d(prev, 256, [3, 3],
                                [2, 2],
                                padding='SAME',
                                activation=tf.nn.relu,
                                use_bias=True,
                                trainable=training,
                                name='conv_out')
        # Box coordinates: 4 channels, tanh activation.
        loc_output = tf.layers.conv2d(prev, 4, [1, 1],
                                      [1, 1],
                                      padding='VALID',
                                      activation=tf.nn.tanh,
                                      use_bias=True,
                                      trainable=training,
                                      name='loc_output')
        # Confidence: NUM_CLASSES channels, sigmoid activation.
        pred_output = tf.layers.conv2d(prev, NUM_CLASSES, [1, 1],
                                       [1, 1],
                                       padding='VALID',
                                       activation=tf.nn.sigmoid,
                                       use_bias=True,
                                       trainable=training,
                                       name='pred_output')
        output = tf.concat([loc_output, pred_output], axis=3,
                           name='output')
        print(output.get_shape())
        return [output]

Model output

We treat the output, [21, 21, 5], as a grid made up of 21 cells vertically and 21 cells horizontally, 441 cells in total ...

Chapter 2: Grid-Based Object Detection 25
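The grid interpretation can be sketched with NumPy as below. The channel order (four coordinates, then the confidence) follows the tf.concat call in the listing; the prediction values and the chosen cell are made up for illustration.

```python
import numpy as np

GRID = 21  # cells per side, from the [21, 21, 5] output shape

output = np.zeros((GRID, GRID, 5))
# Hypothetical prediction in row 3, column 7: 4 coordinates, then confidence.
output[3, 7] = [0.1, -0.2, 0.3, 0.4, 0.9]

cells = output.reshape(-1, 5)       # one row per grid cell
print(cells.shape)                  # (441, 5)

best = int(np.argmax(cells[:, 4]))  # flat index of the most confident cell
row, col = divmod(best, GRID)       # back to grid coordinates
print(row, col)                     # 3 7
```

Flattening the grid this way makes it easy to rank all 441 cells by confidence while still being able to recover the row and column of each cell.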

6. TensorFlowはじめました 3

... selects (Hard Negative) boxes at a fixed ratio.

Listing 3.6 is the function _calc_loss_with_hard_negative_mining, which applies Hard Negative Mining to the loss.

Listing 3.6: train_ssd.py

    # TensorFlow 1.x API; `tf` is `import tensorflow as tf`.
    NEGATIVE_COUNT_RATE = 3


    def _calc_loss_with_hard_negative_mining(losses, positive_mask,
                                             positive_count):
        batch_size = losses.get_shape()[0].value
        negative_count = positive_count * NEGATIVE_COUNT_RATE

        positive_losses = losses * positive_mask
        negative_losses = losses - positive_losses

        # The second dimension is illegible in the source; -1 is assumed here.
        negative_losses = tf.reshape(negative_losses, [batch_size, -1])
        top_negative_losses, _ = tf.nn.top_k(
            negative_losses, k=tf.cast(negative_count, tf.int32))

        loss = (tf.reduce_sum(positive_losses / positive_count)
                + tf.reduce_sum(top_negative_losses / negative_count))
        return loss

positive_mask indicates, as 1.0 or 0.0, whether a face region is assigned to each box. Multiplying losses by it sets the loss of every box other than the Positive ones to 0. Next, subtracting the Positive loss positive_losses from losses leaves only the Negative loss. tf.nn.top_k is then applied to the resulting negative_losses to pick out the samples with the largest loss.

Listing 3.7 is the function _loss, which computes the loss.

Listing 3.7: train_ssd.py

    def _loss(logits, positive_labels, box_labels, boxes):
        # `...` keeps all leading axes; the last axis holds
        # 4 box coordinates followed by the confidence.
        logits_label = logits[..., 4:]
        logits_loc = logits[..., :4]

        positive_count_in_batch = tf.reduce_sum(positive_labels)
        # Use at least 1 to avoid dividing by zero.
        positive_count_in_batch = tf.reduce_max([1, positive_count_in_batch])

Chapter 3: Struggles with Object Recognition 67
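The selection step may be easier to follow in a small NumPy sketch. It is a simplification rather than the book's implementation: hard_negative_mining_loss is a hypothetical name, the losses are flattened so that the top-k runs over the whole mini-batch, k is capped at the number of boxes, and positive_count is assumed to be at least 1.

```python
import numpy as np

NEGATIVE_COUNT_RATE = 3


def hard_negative_mining_loss(losses, positive_mask, positive_count):
    """Keep every Positive loss plus only the largest Negative losses."""
    negative_count = int(positive_count * NEGATIVE_COUNT_RATE)
    positive_losses = losses * positive_mask
    negative_losses = (losses - positive_losses).reshape(-1)
    # Assumption: top-k over the flattened batch, capped at the number of boxes.
    k = min(negative_count, negative_losses.size)
    top_negative_losses = np.sort(negative_losses)[::-1][:k]
    return float(positive_losses.sum() / positive_count
                 + top_negative_losses.sum() / negative_count)


# Hypothetical per-box losses: the first box is Positive, the rest Negative.
losses = np.array([0.9, 0.1, 0.8, 0.05, 0.3, 0.02, 0.6, 0.01])
positive_mask = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

loss = hard_negative_mining_loss(losses, positive_mask, positive_count=1)
print(loss)
```

With one Positive box, only the three hardest Negatives (0.8, 0.6, 0.3) enter the loss; the easy Negatives are ignored, which keeps the many background boxes from swamping the few Positive ones.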

7. TensorFlowはじめました 3

    with tf.Session() as sess:
        print(sess.run(result))

    [6 7 8 9]

1. This assumes training, and recognition (evaluation) with a trained model.
2. The word "operation" tends to suggest something limited to performing an action, but in TensorFlow constants and variables are also regarded as "operations that output a value".
3. When processing with a Convolutional Neural Network, however, the data must be converted to tf.float32.
4. https://www.tensorflow.org/programmers_guide/dims_types
5. https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

Chapter 1: TensorFlow Basics 21

8. TensorFlowはじめました 3

[Figure 2.11: Change in loss. Panels: model0_label_loss and model0_loc_loss.]

Both losses appear to be falling more or less steadily. The coordinate loss (model0_loc_loss) in particular is smaller than the confidence loss (model0_label_loss) and keeps falling as the number of steps grows.

Next, we use this trained model to detect face regions in actual illustration images.

2.5 Validation

Listing 2.14 is a program that feeds images to the trained model and outputs the results.

Listing 2.14: eval_ssd.py

    import json
    import os

    import numpy as np
    import tensorflow as tf
    from PIL import Image, ImageDraw

    import box_util
    from tfbook_model import model0 as base_model
    # Model to use
    from tfbook_model import model0 as model

    FLAGS = tf.app.flags.FLAGS
    tf.app.flags.DEFINE_string('image_dir', None, "Directory to process")
    tf.app.flags.DEFINE_string('output_dir', None, "Output directory")
    tf.app.flags.DEFINE_string('train_path', None,
                               "Path to the training result file")

48 Chapter 2: Grid-Based Object Detection