{"id":315,"date":"2024-04-18T08:35:02","date_gmt":"2024-04-18T08:35:02","guid":{"rendered":"https:\/\/rajarshi-ray.com\/?p=315"},"modified":"2024-04-18T13:40:31","modified_gmt":"2024-04-18T13:40:31","slug":"315","status":"publish","type":"post","link":"https:\/\/rajarshi-ray.com\/index.php\/2024\/04\/18\/315\/","title":{"rendered":"A Few Words on Transfer Learning"},"content":{"rendered":"\n<p>Transfer learning in deep learning involves using a pre-trained model on a source task and adapting it to a related target task. A common approach to transfer learning is fine-tuning, where the pre-trained model is further trained on the target task with a smaller learning rate to avoid catastrophic forgetting of the source task.<\/p>\n\n\n\n<p>Let&#8217;s denote:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>Ds<\/em>\u200b: Source dataset<\/li>\n\n\n\n<li><em>Dt<\/em>\u200b: Target dataset<\/li>\n\n\n\n<li><em>Mt<\/em>\u200b: Target model (initialized from <em>Ms<\/em>\u200b and fine-tuned on <em>Dt<\/em>\u200b)<\/li>\n\n\n\n<li><em>\u03b8<\/em><em>s<\/em>\u200b: Parameters of <em>M<\/em><em>s<\/em><\/li>\n\n\n\n<li><em>\u03b8<\/em><em>t<\/em>\u200b: Parameters of <em>M<\/em><em>t<\/em>\u200b<\/li>\n\n\n\n<li><em>L<\/em><em>s<\/em>\u200b: Loss function on the source task<\/li>\n\n\n\n<li><em>L<\/em><em>t<\/em>\u200b: Loss function on the target task<\/li>\n<\/ul>\n\n\n\n<p>The transfer learning process typically involves the following steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Initialization<\/strong>: Initialize <em>M<\/em><em>s<\/em>\u200b with weights <em>\u03b8<\/em><em>s<\/em>\u200b pre-trained on <em>D<\/em><em>s<\/em>\u200b.<\/li>\n\n\n\n<li><strong>Fine-tuning<\/strong>: Train <em>M<\/em><em>t<\/em>\u200b on <em>D<\/em><em>t<\/em>\u200b by updating its parameters <em>\u03b8<\/em><em>t<\/em>\u200b to minimize the loss <em>L<\/em><em>t<\/em>\u200b.<\/li>\n\n\n\n<li><strong>Evaluation<\/strong>: Evaluate the performance of <em>M<\/em><em>t<\/em>\u200b on the target task using a separate validation or test set from <em>D<\/em><em>t<\/em>\u200b.<\/li>\n<\/ol>\n\n\n\n<p>The mathematical model for fine-tuning can be represented as an optimization problem:<\/p>\n\n\n\n<p><em>\u03b8t<\/em>\u200bmin\u200b<em>Lt<\/em>\u200b(<em>\u03b8t<\/em>\u200b)=<em>\u03b8t<\/em>\u200bmin\u200bL(<em>Dt<\/em>\u200b,<em>Mt<\/em>\u200b(<em>\u03b8t<\/em>\u200b))<\/p>\n\n\n\n<p>where \ufffdL is the loss function, typically a cross-entropy loss for classification tasks or a mean squared error for regression tasks. The optimization can be performed using stochastic gradient descent (SGD) or its variants, with a smaller learning rate compared to training from scratch.<\/p>\n\n\n\n<p>The learning rate during fine-tuning is often chosen to be smaller because we want to preserve the knowledge gained from the source task while allowing the model to adapt to the nuances of the target task without overfitting.<\/p>\n\n\n\n<p>Additionally, one might also introduce regularization terms to prevent overfitting during fine-tuning. 
Additionally, one might also introduce regularization terms to prevent overfitting during fine-tuning. Regularization terms such as L1 or L2 regularization can be added to the loss function:

min_{θt} Lt(θt) + λ R(θt)

where R(θt) is the regularization term penalizing large parameter values and λ is the regularization strength.

This mathematical model captures the essence of transfer learning in deep learning, where knowledge from a source task is utilized to improve learning on a related target task.
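As a sketch of how the regularized objective above is typically realized in PyTorch: an L2 penalty can be applied either through the optimizer's weight_decay argument or as an explicit term added to the loss. The model and λ value below are stand-ins for illustration:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 10)  # stand-in for the target model Mt
lam = 1e-4                 # regularization strength λ (assumed value)

# Option 1: built-in weight decay; for plain SGD this is equivalent to an
# L2 penalty on the parameters (up to a constant factor).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=lam)

# Option 2: add R(θt) = sum of squared parameters to the task loss explicitly.
def regularized_loss(task_loss, model, lam):
    r = sum(p.pow(2).sum() for p in model.parameters())  # R(θt)
    return task_loss + lam * r
```

Note that for adaptive optimizers such as Adam, weight decay and an explicit L2 penalty are no longer equivalent, which is the motivation for decoupled weight decay (AdamW).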