numb3r3: Recommender Systems Engineer
http://numb3r3.com
Tricks for Distributed Tensorflow<p><strong>From</strong>: <a href="https://www.weibo.com/ttarticle/p/show?id=2309404163888068345147">TensorFlow在微博的大规模应用与实践</a></p>
<h1 id="distribute-tensorflow">Distribute Tensorflow</h1>
<p>The distributed TensorFlow setup is very simple: it consists of several <strong>parameter servers</strong> and some <strong>workers</strong>. In each iteration, the updated parameters (i.e., the <strong>local parameters</strong>) produced by each worker are fed into the parameter servers. The parameter servers then merge the local parameters from the different workers into <strong>global parameters</strong>, which are sent back to the workers for computation in the next iteration.</p>
<ul>
<li>The bottleneck of distributed TensorFlow is the parameter server, since it requires a large amount of bandwidth to pass all of the parameters</li>
<li>Don’t use synchronous updates</li>
<li>Use pandas to load CSV files (as feature sources)</li>
</ul>
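The update cycle described above can be sketched as a toy simulation in plain numpy (this is only an illustration of the parameter-server idea, not the actual `tf.train` API):

```python
import numpy as np

def apply_worker_updates(global_params, worker_grads, lr=0.1):
    # A toy parameter server: apply each worker's gradient as it arrives
    # (asynchronous updates -- no barrier waiting for the slowest worker),
    # then ship the merged global parameters back to the workers.
    for grad in worker_grads:
        global_params = global_params - lr * grad
    return global_params

params = np.zeros(3)
grads_from_workers = [np.array([1.0, 0.0, 0.0]),
                      np.array([0.0, 2.0, 0.0])]
params = apply_worker_updates(params, grads_from_workers, lr=0.1)
# params is now [-0.1, -0.2, 0.0]
```

With synchronous updates, the server would instead wait for all workers before merging, which is exactly the barrier the post advises avoiding.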
Thu, 02 Nov 2017 00:00:00 +0000
http://numb3r3.com/2017/11/02/tf-in-weibo.html
The Magic behind Deep Neural Network - Information Bottleneck<p>For decades, the dream of making a machine that can think and learn like a human seemed perpetually out of reach, until deep learning arrived. Like a brain, a <strong>deep neural network</strong> has layers of neurons. When a neuron fires, it sends signals to connected neurons in the layer above. Deep neural networks learn by adjusting the strengths of their connections to better convey input signals through multiple layers to the neurons associated with the right general concepts. Machines powered by <strong>deep neural networks</strong> have learned to converse, drive cars, beat video games, paint pictures, and so on. They have also confounded their human creators, who never expected so-called “deep learning” algorithms to work so well. Experts wonder: what is it about deep learning that makes it work so well?</p>
<p><img src="http://785j7b.com1.z0.glb.clouddn.com/deep_learning.jpg" alt="deep neural network for classifying dogs" /></p>
<p>Recently, <a href="http://www.cs.huji.ac.il/~tishby/">Naftali Tishby</a>, from the Hebrew University of Jerusalem, presented evidence in support of a new theory <strong>explaining how deep learning works</strong>. Tishby argues that deep neural networks learn according to a procedure called the “<a href="https://arxiv.org/pdf/physics/0004057.pdf">information bottleneck</a>”, which he and two collaborators first described in purely theoretical terms in 1999. The idea is that a network rids noisy input data of extraneous details, as if squeezing the information through a bottleneck, retaining only the features most relevant to general concepts. A <a href="https://arxiv.org/abs/1703.00810">new experiment</a> conducted by Tishby’s team reveals how this squeezing procedure happens during deep learning, at least in the cases they studied. <a href="https://research.google.com/pubs/104980.html">Alex Alemi</a>, a scientist at Google, said “the bottleneck could serve not only as a theoretical tool for understanding why neural networks work, but also as a tool for constructing new objectives and architectures of networks”.</p>
<p><strong>From</strong>: <a href="https://www.quantamagazine.org/new-theory-cracks-open-the-black-box-of-deep-learning-20170921">New Theory Cracks Open the Black Box of Deep Learning</a></p>
Thu, 02 Nov 2017 00:00:00 +0000
http://numb3r3.com/2017/11/02/intro-dnn.html
The Applications of Deep Learning in Recommender Systems<ul>
<li><strong>AutoRec: Autoencoders Meet Collaborative Filtering (WWW ‘15)</strong></li>
</ul>
<p>An Autoencoder (AE) is an unsupervised model that uses backpropagation to make its output equal to its input. Reference [2] uses an AE to predict users’ missing ratings for items: the model’s input is a row (user-based) or a column (item-based) of the rating matrix R, and its objective is optimized via the reconstruction loss between input and output. The missing entries of R are then predicted from the model’s output, which drives the recommendations.</p>
<p>A Denoising Autoencoder (DAE) extends the AE by adding noise to the training inputs, so the DAE must learn to remove the noise and recover the original, uncorrupted input. This forces the encoder to learn a more robust representation of the input, and a DAE usually generalizes better than a plain AE. A Stacked Denoising Autoencoder (SDAE) is a neural network built from multiple AEs, where the output of each autoencoder layer is the input of the next.</p>
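The AutoRec-style objective can be sketched in numpy with hypothetical toy values (the real model learns an encoder/decoder by backpropagation): the reconstruction loss is computed only over observed ratings, while the reconstructed values at missing entries become the predictions.

```python
import numpy as np

def masked_reconstruction_loss(r, r_hat):
    # AutoRec-style objective (sketch): compare input and output only where
    # ratings are observed; missing entries (0 here) are excluded from the
    # loss, but their reconstructed values serve as predictions.
    mask = (r > 0).astype(float)
    return np.sum(mask * (r - r_hat) ** 2)

r = np.array([5.0, 0.0, 3.0])       # 0 marks a missing rating
r_hat = np.array([4.5, 4.0, 3.0])   # hypothetical autoencoder output
loss = masked_reconstruction_loss(r, r_hat)  # only entries 0 and 2 count
# the prediction for the missing item is r_hat[1] = 4.0
```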
<ul>
<li><strong>Collaborative Deep Learning for Recommender Systems (KDD ‘15)</strong></li>
</ul>
<p>Building on the SDAE, this paper proposes a Bayesian SDAE model and uses it to learn the item latent vectors, taking the items’ side information as input. The model assumes the SDAE parameters follow Gaussian distributions, as does the user latent vector, and then fits the original rating matrix with probabilistic matrix factorization. The objective function is derived by maximum a posteriori (MAP) estimation, and the model parameters are learned by gradient descent, yielding the latent factor matrices for users and items.</p>
<ul>
<li><strong>A Hybrid Collaborative Filtering Model with Deep Structure for Recommender Systems (AAAI ‘17)</strong></li>
</ul>
<p>The Ctrip BI algorithm team improves on existing deep models and proposes a new hybrid collaborative filtering model. It uses the user-item rating matrix R together with the corresponding side information to learn the latent factor matrices U and V for users and items, then predicts the missing values in R to make item recommendations. A deep model called the Additional Stacked Denoising Autoencoder (aSDAE) learns the user and item latent vectors: its input is a user’s or item’s list of rating values, and every hidden layer also receives the corresponding side information as input (the model is inspired by the seq-2-seq models in NLP, where each layer receives an input; here every layer receives the same input, so the final output should match the input as closely as possible).</p>
<p>Combining aSDAE with matrix factorization, the model learns the user and item latent vectors with two aSDAEs and fits the observed entries of the rating matrix R with the inner product of the two learned vectors. Its objective function consists of the matrix factorization loss plus the two aSDAE losses, and U and V can be learned with stochastic gradient descent (SGD).</p>
<ul>
<li><strong>Deep Neural Networks for YouTube Recommendations (Recsys ‘16)</strong></li>
</ul>
<p>Google uses a DNN for YouTube video recommendation. The videos a user has watched and the keywords they have searched are embedded, then concatenated with the user’s side information as the DNN input. A multi-layer DNN learns the user’s latent vector, and a softmax layer on top learns the item latent vectors, from which top-N recommendations can be made for the user.</p>
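The serving step can be sketched in numpy (a toy illustration, not Google’s implementation): score every item by the inner product of the learned user vector with the item embeddings from the softmax layer, then take the top N.

```python
import numpy as np

def top_n_items(user_vec, item_embeddings, n=2):
    # Serving-time retrieval (sketch): score every item by the inner product
    # with the user's hidden vector (the softmax logits), then keep the top N.
    scores = item_embeddings @ user_vec
    return np.argsort(-scores)[:n]

user_vec = np.array([1.0, 0.0])      # hypothetical learned user vector
items = np.array([[0.9, 0.1],        # item 0
                  [0.2, 0.8],        # item 1
                  [1.5, 0.0]])       # item 2
recs = top_n_items(user_vec, items, n=2)  # item 2 scores highest, then item 0
```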
<ul>
<li><strong>Convolutional Matrix Factorization for Document Context-Aware Recommendation (Recsys ‘16)</strong></li>
</ul>
<p>This paper proposes convolutional matrix factorization, combining probabilistic matrix factorization (PMF) with a convolutional neural network (CNN) for document recommendation. The CNN learns the item latent vectors: each word in a document is first embedded, and all word vectors are concatenated into an embedding matrix, so a document is represented as a 2-D matrix whose rows correspond to the words in the document and whose columns correspond to the dimensions of the word embedding. Convolution, pooling, and projection over this matrix yield the item latent vector. As in PMF, the user latent vector is assumed to follow a Gaussian distribution, and the objective function consists of the matrix factorization loss plus the CNN loss.</p>
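The convolution-and-pooling step over the document’s embedding matrix can be sketched in numpy (toy shapes and a single hypothetical kernel; the real model uses many kernels followed by a projection layer):

```python
import numpy as np

def conv_max_pool(doc_matrix, kernel):
    # doc_matrix: (num_words, embed_dim); kernel: (window, embed_dim).
    # Slide the kernel over word positions, then max-pool over positions,
    # giving one feature per kernel -- a building block of the item vector.
    window = kernel.shape[0]
    feats = [np.sum(doc_matrix[i:i + window] * kernel)
             for i in range(doc_matrix.shape[0] - window + 1)]
    return max(feats)

doc = np.array([[1.0, 0.0],    # word 0 embedding
                [0.0, 1.0],    # word 1 embedding
                [1.0, 1.0]])   # word 2 embedding
kernel = np.ones((2, 2))       # hypothetical kernel over a 2-word window
feature = conv_max_pool(doc, kernel)
```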
Thu, 02 Nov 2017 00:00:00 +0000
http://numb3r3.com/2017/11/02/deep-learning-recsys.html
App Discovery with Google Play [2/2]<h1 id="part-2-personalized-recommendations-with-related-apps">Part 2: Personalized Recommendations with Related Apps</h1>
<p>In order to create a better overall experience, one must also take into account the tastes of the user and provide personalized recommendations. This post describes a deep learning framework that provides personalized recommendations to users based on the apps they have previously downloaded and the context in which they have used them.</p>
<p>One particularly strong contextual signal is app relatedness, based on <strong>previous installs</strong> and <strong>search query clicks</strong>. As an example, a user who has searched for and plays a lot of graphics-heavy games likely has a preference for apps that are also graphically intense rather than apps with simpler graphics. So, when this user installs a car racing game, the “You might also like” suggestions rank apps related to the “seed” app (because they are graphically intense racing games) higher than racing apps with simpler graphics. This allows for a finer level of personalization, where the characteristics of the apps are matched with the preferences of the user.</p>
<p>To incorporate this app relatedness into the recommendation procedure, two approaches are proposed: (a) offline candidate generation, i.e., generating the potential related apps that other users have downloaded in addition to the app in question, and (b) online personalized re-ranking, where these candidates are re-ranked using a personalized ML model.</p>
<p><strong>Offline Candidate Generation</strong></p>
<p>The problem of finding related apps can be formulated as a <strong>nearest neighbor search</strong> problem: given an app X, we want to find its <em>k</em> nearest apps. To approach it, a deep neural network is trained to predict the next app installed by the user given their previous installs. The output embeddings at the final layer of this network represent the types of apps a given user has installed. We then apply a nearest neighbor algorithm to find the related apps for a given seed app in the trained embedding space. Representing apps with embeddings thus performs dimensionality reduction, which helps prune the space of potential candidates.</p>
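The nearest-neighbor step can be sketched in numpy (toy embeddings; a production system would presumably use approximate search at this scale):

```python
import numpy as np

def k_nearest_apps(seed_idx, embeddings, k=2):
    # Nearest-neighbor search (sketch): rank all apps by Euclidean distance
    # to the seed app in the learned embedding space, excluding the seed.
    dists = np.linalg.norm(embeddings - embeddings[seed_idx], axis=1)
    order = np.argsort(dists)
    return [i for i in order if i != seed_idx][:k]

embeddings = np.array([[0.0, 0.0],   # seed app
                       [0.1, 0.0],   # very similar app
                       [5.0, 5.0],   # unrelated app
                       [0.0, 0.2]])  # similar app
neighbors = k_nearest_apps(0, embeddings, k=2)  # apps 1 and 3 are closest
```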
<p><strong>Online Personalized Re-ranking</strong></p>
<p>The objective is to assign scores to the candidates so they can be re-ranked in a personalized way. To this end, a deep neural network is trained to predict the likelihood that a related app is specifically relevant to the user. The input of this network consists of 1) the characteristics of the candidate app, 2) user-specific context features (e.g., region, language), and 3) the user’s install history.</p>
<p><strong>From</strong>: <a href="https://research.googleblog.com/2016/12/app-discovery-with-google-play-part-2.html">https://research.googleblog.com/2016/12/app-discovery-with-google-play-part-2.html</a></p>
Thu, 02 Nov 2017 00:00:00 +0000
http://numb3r3.com/2017/11/02/app-discover-2.html
App Discovery with Google Play [1/2]<h1 id="part-1-understanding-of-the-topics-associated-with-an-app">Part 1: Understanding of the Topics associated with an App</h1>
<p>Most of the time, people don’t really know specifically what they want; they only have a broad notion of interest, like “horror games” or “selfie apps”. Such broad searches by topic account for nearly half of the queries in the Play Store.</p>
<p>Searches by topic require more than simply indexing apps by query terms; they require <strong>an understanding of the topics</strong> associated with an app. Some machine learning approaches have been proposed to address this problem, but their success depends heavily on the number of training examples. While for some popular topics such as “social network” we had many labeled apps to learn from, the majority of topics had only a handful of examples. The challenge was to learn from a very limited number of training examples and scale to millions of apps across thousands of topics.</p>
<p>The initial attempt was to build a deep neural network (DNN) trained to predict topics for an app based on words and phrases from the app title and description. However, given the learning capacity of DNNs, it completely “memorized” the topics for the apps in our small training data and failed to generalize to new apps it hadn’t seen before.</p>
<p>In contrast to DNNs, human beings need much less training data. Just by knowing the language used to describe apps, people can correctly infer topics from even a few examples. To emulate this, we tried a very rough approximation of this language-centric learning. We trained a neural network to learn how language is used to describe apps. We built a <strong><a href="https://www.tensorflow.org/tutorials/word2vec#the-skip-gram-model">Skip-gram model</a></strong>, in which the neural network attempts to predict the words around a given word, for example “share” given “photo”. The neural network encodes its knowledge as vectors, referred to as <em>embeddings</em>. These embeddings were used to train another model, called a <em>classifier</em>, capable of distinguishing which topics apply to an app. This approach needs much less training data to learn about app topics, thanks to the large amount of learning already done by the skip-gram model.</p>
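Generating skip-gram training pairs can be sketched as follows (a minimal illustration of the “predict the words around a given word” setup, not the actual Play Store code):

```python
def skipgram_pairs(tokens, window=1):
    # Skip-gram training pairs (sketch): for each target word, emit a
    # (target, context) pair for every word within the window around it,
    # e.g. ("photo", "share").
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

pairs = skipgram_pairs(["share", "photo", "app"], window=1)
# [("share", "photo"), ("photo", "share"), ("photo", "app"), ("app", "photo")]
```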
<p>While this approach generalized well for popular topics like “<em>social networking</em>”, we ran into a new problem for more niche topics like “<em>selfie</em>”. The single classifier built to predict all topics together focused most of its learning on the popular topics, ignoring the errors it made on the less common ones. To solve this problem, we built a separate classifier for each topic and tuned each in isolation.</p>
<p><strong>From</strong>: <a href="https://research.googleblog.com/2016/11/app-discovery-with-google-play-part-1.html">https://research.googleblog.com/2016/11/app-discovery-with-google-play-part-1.html</a></p>
Thu, 02 Nov 2017 00:00:00 +0000
http://numb3r3.com/2017/11/02/app-discover-1.html
The Technical Evolution and Practice of Big Data Platforms<ul>
<li><strong><a href="https://mp.weixin.qq.com/s?__biz=MzA5NzkxMzg1Nw==&mid=2653162187&idx=1&sn=d10f63b9489e18733838cf6c4489282d&chksm=8b4937a5bc3ebeb3811ec69bd3aad214ef730b08721ff70223e36f1efb53e14f7478a485df04#rd">58大数据平台</a></strong></li>
</ul>
<p><img src="http://785j7b.com1.z0.glb.clouddn.com/58tongcheng.webp" alt="" /></p>
<p>At a high level, the 58.com big data platform architecture has three layers: the data infrastructure layer, the data application platform layer, and the data application layer, plus two vertical columns for monitoring/alerting and platform management.</p>
<p>The data infrastructure layer is further divided into four sub-layers:</p>
<ul>
<li>the ingestion layer, including Canal/Sqoop (mainly for ingesting database data), with most other data ingested via Flume;</li>
<li>the storage layer, with the typical systems HDFS (file storage), HBase (KV storage), and Kafka (message buffering);</li>
<li>above that, the scheduling layer, where we use Yarn for unified scheduling and Kubernetes for container-based management and scheduling;</li>
<li>and on top, the computing layer, which contains engines for all of the typical computation models, including MR, HIVE, Storm, Spark, Kylin, and deep learning platforms such as Caffe and Tensorflow.</li>
</ul>
<p>The data application platform mainly provides the following functions:</p>
<ul>
<li>metadata management, plus job management for every computing engine and its jobs, and then interactive analysis, multidimensional analysis, and data visualization;</li>
<li>above that, support for 58 Group’s data businesses, such as traffic statistics, user behavior analysis, user profiling, search, and advertising;</li>
<li>a complete monitoring and alerting system covering business, data, services, and hardware;</li>
<li>and for platform management, a comprehensive platform for managing processes, permissions, quotas, upgrades, versions, and machines.</li>
</ul>
<p><img src="http://785j7b.com1.z0.glb.clouddn.com/58tongcheng-1.webp" /></p>
<p>This diagram shows how data flows through the systems in the architecture. There are two paths:</p>
<p>First is the real-time stream, the path marked by the yellow arrows. Data collected in real time first enters the Kafka platform for buffering. Real-time computing engines such as Spark Streaming or Storm then pull the data they want to compute on from Kafka. After real-time processing, the results may be written back to Kafka, or stored as final data in MySQL or HBase and served to the business systems. That is the real-time path.</p>
<p>For the offline path, data collected through the ingestion layer eventually lands on HDFS and is then processed by batch computing engines such as Spark and MR, or even by machine learning engines. Most of this data goes into the data warehouse, where it is extracted, cleaned, filtered, mapped, merged, aggregated, and finally modeled to form the warehouse data. The data is then served to the business systems or our internal data products through interfaces such as HIVE, Kylin, and SparkSQL, and part of it also flows into MySQL. That is how data moves through the big data platform.</p>
Fri, 13 Oct 2017 00:00:00 +0000
http://numb3r3.com/2017/10/13/industrial-bigdata.html
The tensorflow tutorials - Neural Network [2/4]<h1 id="2-neural-networks-in-tensorflow">2. Neural Networks in Tensorflow</h1>
<h2 id="21-introduction">2.1 Introduction</h2>
<p><img src="http://785j7b.com1.z0.glb.clouddn.com/tensorflow_model-1.png" alt="neural network in tensorflow" /></p>
<p>The above image demonstrates how neural network models work in TensorFlow, through the following steps:</p>
<ol>
<li>
<p>The <strong>input data</strong>: the training, validation, and test datasets. The test and validation datasets can be placed in a <code class="highlighter-rouge">tf.constant()</code>, while the training dataset is placed in a <code class="highlighter-rouge">tf.placeholder()</code> so that it can be fed in batches while optimizing the parameters contained in the model.</p>
</li>
<li>
<p>The <strong>Neural Network Model</strong> with all of its layers. This can be a simple fully connected neural network with only a single layer, or a more complicated network with 5, 9, 16, etc. layers.</p>
</li>
<li>
<p>The <strong>weights</strong> matrices and <strong>biases</strong> vectors defined in the proper shape and initialized to their initial values (<em>One weight matrix and bias vector per layer</em>).</p>
</li>
<li>
<p>The <strong>loss</strong> value: the model produces an output, and by comparing that output with the ground truth we can calculate the loss value (e.g., with softmax cross-entropy in the classification case). The loss value indicates how close the estimated training labels are to the actual training labels and will be used to update the weight values.</p>
</li>
<li>
<p>An <strong>optimizer</strong>, which will use the calculated loss value to update the weights and biases with backpropagation.</p>
</li>
</ol>
<h2 id="22-loading-the-data">2.2 Loading the data</h2>
<p>First, we will load the datasets which we are going to use to train and test neural networks: <a href="http://yann.lecun.com/exdb/mnist/">MNIST</a> (classification of handwritten digits) and <a href="https://www.cs.toronto.edu/~kriz/cifar.html">CIFAR-10</a> (classification of small images across 10 distinct classes). The MNIST dataset contains 60,000 images, each of size 28x28x1 (grayscale). The CIFAR-10 dataset contains 60,000 colour images (3 channels) of size 32x32x3.</p>
<table style="max-width: 600px;">
<thead>
<tr class="header">
<th align="center" width="50%">MNIST</th>
<th align="center" width="50%">CIFAR-10</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="center" width="50%"><img src="http://785j7b.com1.z0.glb.clouddn.com/mnist.png" /></td>
<td align="center" width="50%"><img src="http://785j7b.com1.z0.glb.clouddn.com/cifar_10.png" /></td>
</tr>
</tbody>
</table>
<p>First, let’s define some methods to load and reshape the downloaded data into the necessary format.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
import tensorflow as tf

def randomize(dataset, labels):
    permutation = np.random.permutation(labels.shape[0])
    shuffled_dataset = dataset[permutation, :, :]
    shuffled_labels = labels[permutation]
    return (shuffled_dataset, shuffled_labels)

def one_hot_encode(np_array):
    return (np.arange(10) == np_array[:, None]).astype(np.float32)

def reformat_data(dataset, labels, image_width, image_height, image_depth):
    _dataset = np.array([np.array(image_data).reshape(image_width, image_height, image_depth) for image_data in dataset])
    _labels = one_hot_encode(np.array(labels, dtype=np.float32))
    return randomize(_dataset, _labels)

def flatten_tf_array(array):
    # flattening an array (a fully connected network needs a flat array as its input)
    shape = array.get_shape().as_list()
    return tf.reshape(array, [shape[0], shape[1] * shape[2] * shape[3]])

def accuracy(preds, labels):
    return (100.0 * np.sum(np.argmax(preds, 1) == np.argmax(labels, 1)) / preds.shape[0])
</code></pre></div></div>
<p>Now, we can load the MNIST and CIFAR-10 datasets with:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pickle
from mnist import MNIST  # pip install python-mnist

# load MNIST dataset
mnist_folder = './data/mnist'
mnist_image_width = 28
mnist_image_height = 28
mnist_image_depth = 1
mnist_num_labels = 10

mndata = MNIST(mnist_folder)
_mnist_train_dataset, _mnist_train_labels = mndata.load_training()
_mnist_test_dataset, _mnist_test_labels = mndata.load_testing()

mnist_train_dataset, mnist_train_labels = reformat_data(_mnist_train_dataset, _mnist_train_labels, mnist_image_width, mnist_image_height, mnist_image_depth)
mnist_test_dataset, mnist_test_labels = reformat_data(_mnist_test_dataset, _mnist_test_labels, mnist_image_width, mnist_image_height, mnist_image_depth)

# load CIFAR-10 dataset
c10_folder = './data/cifar10/'
train_datasets = ['data_batch_%d' % n for n in range(1, 6)]
test_dataset = ['test_batch']
c10_image_width = 32
c10_image_height = 32
c10_image_depth = 3
c10_num_labels = 10

# the pickle files must be opened in binary mode
with open(c10_folder + test_dataset[0], 'rb') as f:
    c10_test_dict = pickle.load(f, encoding='bytes')
_c10_test_dataset, _c10_test_labels = c10_test_dict[b'data'], c10_test_dict[b'labels']
c10_test_dataset, c10_test_labels = reformat_data(_c10_test_dataset, _c10_test_labels, c10_image_width, c10_image_height, c10_image_depth)

_c10_train_dataset, _c10_train_labels = [], []
for train_dataset in train_datasets:
    with open(c10_folder + train_dataset, 'rb') as f:
        c10_train_dict = pickle.load(f, encoding='bytes')
    _c10_train_dataset_, _c10_train_labels_ = c10_train_dict[b'data'], c10_train_dict[b'labels']
    _c10_train_dataset.append(_c10_train_dataset_)
    _c10_train_labels += _c10_train_labels_

_c10_train_dataset = np.concatenate(_c10_train_dataset, axis=0)
c10_train_dataset, c10_train_labels = reformat_data(_c10_train_dataset, _c10_train_labels, c10_image_width, c10_image_height, c10_image_depth)
del _c10_train_dataset
del _c10_train_labels
</code></pre></div></div>
<h2 id="23-creating-a-1-layer-neural-network">2.3 Creating a 1-layer Neural Network</h2>
Sun, 24 Sep 2017 00:00:00 +0000
http://numb3r3.com/2017/09/24/tf-basic-tutorial-2.html
The tensorflow tutorials - Background [1/4]<p><strong>From</strong>: <a href="http://ataspinar.com/2017/08/15/building-convolutional-neural-networks-with-tensorflow/">Building Convolutional Neural Network with Tensorflow</a></p>
<h1 id="1-tensorflow-basics">1. Tensorflow basics</h1>
<p>Here, I will give a basic introduction to TensorFlow for newbies (you can skip ahead if you are already familiar with TensorFlow). This series is designed to get you quickly up to speed with <em>deep learning</em>.</p>
<h2 id="11-constants-and-variables">1.1 Constants and Variables</h2>
<p>The basic units in TensorFlow are <strong>constants</strong>, <strong>variables</strong> and <strong>placeholders</strong>. A <code class="highlighter-rouge">tf.constant</code> has a constant value which cannot be changed, while a <code class="highlighter-rouge">tf.Variable</code> can be changed after it has been set.</p>
<p>For example, we can create weight matrices and biases vectors which can be used in a neural network.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>weights = tf.Variable(tf.truncated_normal([256 * 256, 10]))
biases = tf.Variable(tf.zeros([10]))
print(weights.get_shape().as_list())
print(biases.get_shape().as_list())
</code></pre></div></div>
<h2 id="12-tensorflow-graphs-and-sessions">1.2 Tensorflow Graphs and Sessions</h2>
<p>In TensorFlow, all of the different <strong>variables</strong> and the <strong>operations</strong> performed on these variables are saved in a graph. After you have built a graph which contains all of the computational steps for your model, you can run this graph within a <strong>session</strong>. The session then distributes all of the computations across the available CPU and GPU resources.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>graph = tf.Graph()
with graph.as_default():
    a = tf.Variable(8, tf.float32)
    b = tf.Variable(tf.zeros([2, 2], tf.float32))

with tf.Session(graph=graph) as sess:
    tf.global_variables_initializer().run()
</code></pre></div></div>
<h2 id="13-placeholders-and-feed_dict">1.3 Placeholders and feed_dict</h2>
<p>The <strong>placeholders</strong> in Tensorflow do not require an initial value and only serve to allocate the necessary amount of memory. During a session, the placeholder can be filled in with (external) data with a <strong>feed_dict</strong>.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>list_of_points1 = [[1, 2], [3, 4], [5, 6], [7, 8]]
list_of_points2 = [[15, 16], [13, 14], [11, 12], [9, 10]]
list_of_points1 = np.array([np.array(item).reshape(1, 2) for item in list_of_points1])
list_of_points2 = np.array([np.array(item).reshape(1, 2) for item in list_of_points2])

graph = tf.Graph()
with graph.as_default():
    point1 = tf.placeholder(tf.float32, shape=(1, 2))
    point2 = tf.placeholder(tf.float32, shape=(1, 2))

    def calc_euclidean_dist(p, q):
        diff = tf.subtract(p, q)
        power2 = tf.pow(diff, tf.constant(2.0, shape=(1, 2)))
        add = tf.reduce_sum(power2)
        euclidean_dist = tf.sqrt(add)
        return euclidean_dist

    dist = calc_euclidean_dist(point1, point2)

with tf.Session(graph=graph) as sess:
    tf.global_variables_initializer().run()
    for i in range(len(list_of_points1)):
        p = list_of_points1[i]
        q = list_of_points2[i]
        feed_dict = {point1: p, point2: q}
        distance = sess.run([dist], feed_dict=feed_dict)
        print("the distance between {} and {} -> {}".format(p, q, distance))
</code></pre></div></div>
Sun, 24 Sep 2017 00:00:00 +0000
http://numb3r3.com/2017/09/24/tf-basic-tutorial-1.html
The review of Recurrent Neural Network - NLP application [3/3]<h2 id="nlp-application">NLP application</h2>
<p><strong>From</strong>: <a href="https://arxiv.org/pdf/1506.00019.pdf">A Critical Review of Recurrent Neural Networks for Sequence Learning</a></p>
<p><strong>Representations of natural language inputs and output</strong></p>
<p>When words are output at each time step, generally the output consists of a softmax vector <script type="math/tex">\mathbf{y}^{(t)} \in \mathbb{R}^K</script> where <script type="math/tex">K</script> is the size of the vocabulary. A softmax layer is an element-wise logistic function that is normalized so that all of its components sum to one. Intuitively, these outputs correspond to the probabilities that each word is the correct output at that time step.</p>
<p>For applications where the input consists of a sequence of words, typically the words are fed to the network one at a time in consecutive time steps. In these cases, the simplest way to represent words is a <em>one-hot</em> encoding, using binary vectors with a length equal to the size of the vocabulary, so “1000” and “0100” would represent the first and second words in the vocabulary respectively. However, this encoding is inefficient, requiring as many bits as the vocabulary is large. Further, it offers no direct way to capture different aspects of similarity between words in the encoding itself. Thus it is now common to model words with a distributed representation using a <em>meaning vector</em>. In some cases, these meanings for words are learned from a large corpus of supervised data, but it is more usual to initialize the <em>meaning vectors</em> using an embedding based on word co-occurrence statistics. Freely available code to produce word vectors from these statistics includes <em>GloVe</em> and <em>word2vec</em>.</p>
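The one-hot scheme described above can be sketched in a few lines (a toy vocabulary for illustration):

```python
def one_hot(word, vocab):
    # One-hot encoding (sketch): a binary vector as long as the vocabulary,
    # with a single 1 at the word's index -- e.g. "1000" for the first word.
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

vocab = ["the", "cat", "sat", "mat"]
encoding = one_hot("cat", vocab)  # [0, 1, 0, 0]
```

The inefficiency is visible directly: every vector is as long as the vocabulary, and any two distinct words are equally far apart, which is exactly why distributed meaning vectors are preferred.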
Wed, 20 Sep 2017 00:00:00 +0000
http://numb3r3.com/2017/09/20/rnn-review-3.html
The review of Recurrent Neural Network - LSTM & BRNN [2/3]<h2 id="modern-rnn-architectures-lstm--brnns">Modern RNN architectures (LSTM & BRNNs)</h2>
<p>The most successful RNN architectures for sequence learning stem from two works published in 1997. The first, <em>Long Short-Term Memory</em>, introduces the memory cell, a unit of computation that replaces traditional nodes in the hidden layer of a network. With these memory cells, networks are able to overcome the training difficulties encountered by earlier recurrent networks. The second, the <em>Bidirectional Recurrent Neural Network</em>, introduces an architecture in which information from both the future and the past is used to determine the output at any point in the sequence.</p>
<p><strong>Long short-term memory (LSTM)</strong></p>
<p>The LSTM model was introduced primarily to overcome the problem of vanishing gradients. This model resembles a standard recurrent neural network with a hidden layer, but each ordinary node in the hidden layer is replaced by a <em>memory cell</em>. Each memory cell contains a node with a self-connected recurrent edge of fixed weight one, ensuring that the gradient can pass across many time steps without vanishing or exploding. To distinguish references to a memory cell from an ordinary node, we use the subscript <script type="math/tex">c</script>.</p>
<p>The term “long short-term memory” comes from the following intuition. Simple recurrent neural networks have <em>long-term memory</em> in the form of weights. The weights change slowly during training, encoding general knowledge about the data. They also have <em>short-term memory</em> in the form of ephemeral activations, which pass from each node to successive nodes. The LSTM model introduces an intermediate type of storage via the memory cell. A memory cell is a composite unit, built from simpler nodes in a specific connectivity pattern, with the novel inclusion of multiplicative nodes. All elements of the LSTM cell are enumerated and described below.</p>
<ul>
<li>
<p><em>Input node</em>: This unit, labeled <script type="math/tex">g_c</script>, is a node that takes activation in the standard way from the input layer <script type="math/tex">\mathbf{x}^{(t)}</script> at the current time step and (along a recurrent edge) from the hidden layer at the previous time step <script type="math/tex">\mathbf{h}^{(t-1)}</script>. Typically, the summed weighted input is run through a <em>tanh</em> activation function, although in the original LSTM paper the activation function was a <em>sigmoid</em>.</p>
</li>
<li>
<p><em>Input gate</em>: Gates are a distinctive feature of the LSTM approach. A gate is a sigmoidal unit that, like the input node, takes activation from the current data point <script type="math/tex">\mathbf{x}^{(t)}</script> as well as from the hidden layer at the previous time step. A gate is so-called because its value is used to multiply the value of another node. It is a <em>gate</em> in the sense that if its value is zero, then flow from the other node is cut off. If the value of the gate is one, all flow is passed through. The value of the <em>input gate</em> <script type="math/tex">i_c</script> multiplies the value of the <em>input node</em>.</p>
</li>
<li>
<p><em>Internal state</em>: At the heart of each memory cell is a node <script type="math/tex">s_c</script> with linear activation, which is referred to in the original paper as the “internal state” of the cell. The internal state <script type="math/tex">s_c</script> has a self-connected recurrent edge with fixed unit weight. Because this edge spans adjacent time steps with constant weight, error can flow across time steps without vanishing or exploding. This edge is often called the <em>constant error carousel</em>. In vector notation, the update for the internal state is <script type="math/tex">\mathbf{s}^{(t)} = \mathbf{g}^{(t)} \odot \mathbf{i}^{(t)} + \mathbf{s}^{(t-1)}</script> where <script type="math/tex">\odot</script> is pointwise multiplication.</p>
</li>
<li>
<p><em>Forget gate</em>: These gates <script type="math/tex">f_c</script> were introduced by Gers et al. They provide a method by which the network can learn to flush the contents of the internal state. This is especially useful in continuously running networks. With forget gates, the equation to calculate the internal state on the forward pass is</p>
</li>
</ul>
<p>\begin{equation}
\mathbf{s}^{(t)} = \mathbf{g}^{(t)} \odot \mathbf{i}^{(t)} + \mathbf{f}^{(t)} \odot \mathbf{s}^{(t-1)}
\end{equation}</p>
<ul>
<li><em>Output gate</em>: the value <script type="math/tex">v_c</script> ultimately produced by a memory cell is the value of the internal state <script type="math/tex">s_c</script> multiplied by the value of the <em>output gate</em> <script type="math/tex">o_c</script>. It is customary that the internal state first be run through a <em>tanh</em> activation function, as this gives the output of each cell the same dynamic range as an ordinary <em>tanh</em> hidden unit. However, in other neural network research, rectified linear units, which have a greater dynamic range, are easier to train. Thus it seems plausible that the nonlinear function on the internal state might be omitted.</li>
</ul>
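The elements above combine into one forward step per time step. A minimal numpy sketch with hypothetical scalar weights (biases omitted for brevity), following the internal-state update s(t) = g(t) ⊙ i(t) + f(t) ⊙ s(t-1) and the output tanh(s) · o:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, s_prev, W):
    # One memory-cell update following the elements above. W is a dict of
    # (input weight, recurrent weight) pairs for the input node and the
    # three gates (hypothetical names; biases omitted for brevity).
    g = np.tanh(W['g'][0] * x + W['g'][1] * h_prev)   # input node
    i = sigmoid(W['i'][0] * x + W['i'][1] * h_prev)   # input gate
    f = sigmoid(W['f'][0] * x + W['f'][1] * h_prev)   # forget gate
    o = sigmoid(W['o'][0] * x + W['o'][1] * h_prev)   # output gate
    s = g * i + f * s_prev                            # internal state (CEC)
    h = np.tanh(s) * o                                # cell output
    return h, s

W = {k: (1.0, 1.0) for k in 'gifo'}  # toy weights
h, s = lstm_cell_step(x=1.0, h_prev=0.0, s_prev=0.0, W=W)
```

Because `s_prev` enters the state update with fixed unit weight (scaled only by the forget gate), the gradient along the state path neither vanishes nor explodes across time steps.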
<p><strong>Bidirectional recurrent neural networks (BRNNs)</strong></p>
<p>Along with the LSTM, one of the most widely used RNN architectures is the bidirectional recurrent neural network (BRNN). In this architecture, there are two layers of hidden nodes, both connected to the input and output. The two hidden layers are differentiated in that the first has recurrent connections from past time steps, while in the second the direction of the recurrent connections is flipped, passing activation backwards along the sequence. Given an input sequence and a target sequence, the BRNN can be trained by ordinary backpropagation after unfolding across time. The following three equations describe a BRNN:</p>
<p>\begin{equation}
\mathbf{h}^{(t)} = \sigma(W^{hx} \mathbf{x}^{(t)} + W^{hh} \mathbf{h}^{(t-1)} + \mathbf{b}_h)
\end{equation}</p>
<p>\begin{equation}
\mathbf{z}^{(t)} = \sigma(W^{zx} \mathbf{x}^{(t)} + W^{zz} \mathbf{z}^{(t+1)} + \mathbf{b}_z)
\end{equation}</p>
<p>\begin{equation}
\hat{\mathbf{y}}^{(t)} = softmax(W^{yh} \mathbf{h}^{(t)} + W^{yz} \mathbf{z}^{(t)} + \mathbf{b}_y)
\end{equation}</p>
<p>where <script type="math/tex">\mathbf{h}^{(t)}</script> and <script type="math/tex">\mathbf{z}^{(t)}</script> are the values of the hidden layers in the forwards and backwards directions respectively.</p>
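The three equations can be sketched directly in numpy (toy dimensions, with tanh standing in for σ and zero boundary states at both ends of the sequence):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def brnn_forward(xs, Whx, Whh, bh, Wzx, Wzz, bz, Wyh, Wyz, by):
    # The h-layer runs left-to-right over the sequence, the z-layer runs
    # right-to-left, and the output at each step combines both hidden layers.
    T = len(xs)
    h = [None] * T
    z = [None] * T
    for t in range(T):                        # forward hidden layer
        prev = h[t - 1] if t > 0 else np.zeros_like(bh)
        h[t] = np.tanh(Whx @ xs[t] + Whh @ prev + bh)
    for t in reversed(range(T)):              # backward hidden layer
        nxt = z[t + 1] if t < T - 1 else np.zeros_like(bz)
        z[t] = np.tanh(Wzx @ xs[t] + Wzz @ nxt + bz)
    return [softmax(Wyh @ h[t] + Wyz @ z[t] + by) for t in range(T)]

d = 2
I = np.eye(d)  # toy weight matrices
xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
ys = brnn_forward(xs, I, 0.5 * I, np.zeros(d), I, 0.5 * I, np.zeros(d), I, I, np.zeros(d))
```

Each output is a probability vector over the output classes, one per time step, combining past context (via h) and future context (via z).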
<p>One limitation of the BRNN is that it cannot run continuously, as it requires a fixed endpoint in both the future and the past. Further, it is not an appropriate machine learning algorithm for the online setting, as it is implausible to receive information from the future, i.e., to know sequence elements that have not yet been observed. But for prediction over a sequence of fixed length, it is often sensible to take both past and future sequence elements into account. Consider the natural language task of part-of-speech tagging: given any word in a sequence, information about both the words that precede it and those that follow it is useful for predicting that word’s part-of-speech.</p>
Wed, 20 Sep 2017 00:00:00 +0000
http://numb3r3.com/2017/09/20/rnn-review-2.html