
learning rate decay: staircase vs linear

์ž‘์„ฑ์ผ
2021/02/09
ํƒœ๊ทธ
์—ฐ๊ตฌ
๋จธ์‹ ๋Ÿฌ๋‹
Statistics
Deep Learning
์ž‘์„ฑ์ž
Empty
1 more property

Learning rate decay

•
A strategy that gradually decreases the learning rate, an optimization hyperparameter, as training progresses (epochs, training steps), so the optimizer can settle into optima that a consistently high learning rate would overshoot

staircasing

•
A technique that drops the learning rate abruptly, like a staircase, at specific epochs or steps
•
Without staircasing, the learning rate decays by a fixed factor every step, descending along a smooth curve
•
With staircase=True, the optimizer theoretically covers more distance along the gradient than with a smoothly decaying learning rate
◦
This is because the learning rate at each decay point (e.g., every 1 epoch) is identical, but in between decay points the staircase schedule holds the learning rate constant, while the smooth schedule keeps lowering it
•
With staircase=True, the objective function oscillates (fluctuates up and down) more noticeably
◦
The common interpretation is that this oscillation helps training by letting the optimizer escape local optima. However, if the amplitude is too large, it can hinder convergence.
◦
The claim is that 'if it does affect convergence, fine-tuning of hyperparameters such as the learning rate may be needed'

Pose Estimation, staircase decay vs linear decay

•
No significant difference in loss / accuracy is observed between staircase decay and linear decay
•
Contrary to expectation, fluctuation during validation appears not only with staircase learning rate decay but also with linear decay