
Simple Deployment of Recommendation Engines

Recommendation engines are generally built so that a single kind of user interaction with a single kind of item is used to suggest the same kind of interaction with the same kind of item. In practice, however, this approach is flawed for several reasons. First, multiple kinds of interactions with multiple kinds of items are typically available and should be used. Second, recommendation is better viewed as a ranking problem than as a regression problem. Finally, practical recommendation systems should retrain themselves continuously, because today's recommendations and the selections users make from them can be used to train tomorrow's recommender.

This session describes a practical recommendation architecture and implementation style that addresses all of these issues and is considerably easier to implement and deploy than conventional approaches. Several of the techniques I will describe have never (to my knowledge) appeared in the research literature.


  • What Really Matters in Recommenders
    ©MapR Technologies 2013 - Confidential
  • Topic For Today
    ▪ What is recommendation?
    ▪ What makes it different?
    ▪ What is multi-model recommendation?
    ▪ How can I build it using common household items?
  • Oh … Also This
    ▪ Detailed breakdown of a recommendation system running with Mahout on MapR
    ▪ With code examples
  • I may have to summarize… just a bit
  • Part 1: 5 minutes of background
  • Part 2: 5 minutes: I want a pony
  • Part 1: 5 minutes of background
  • What Does Machine Learning Look Like?
    $$A = \begin{bmatrix} A_1 & A_2 \end{bmatrix}, \qquad A^T A = \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}$$
    $$\begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}, \qquad r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
    $$O(\kappa k d + k^3 d) = O(k^2 d \log n + k^3 d) \text{ for small } k \text{, high quality}$$
    $$O(\kappa d \log k) \text{ or } O(d \log \kappa \log k) \text{ for larger } k \text{, looser quality}$$
    But tonight we're going to show you how to keep it simple yet powerful…
  • Recommendations as Machine Learning
    ▪ Recommendation:
      – Involves observing interactions between people taking action (users) and items, as input data to the recommender model
      – The goal is to suggest additional appropriate or desirable interactions
      – Applications include movie, music or map-based restaurant choices, and suggesting sale items for e-stores or via cash-register receipts
  • Part 2: How recommenders work (I still want a pony)
  • Recommendations
    Recap: Behavior of a crowd helps us understand what individuals will do
  • Recommendations
    Alice got an apple and a puppy. Bob got an apple. Charles got a bicycle.
    What else would Bob like?
    A puppy, of course!
  • You get the idea of how recommenders work… (By the way, like me, Bob also wants a pony)
  • Recommendations
    What if everybody (Alice, Bob, Amelia and Charles) gets a pony?
    What else would you recommend for Amelia?
    If everybody gets a pony, it's not a very good indicator of what else to predict...
  • Problems with Raw Co-occurrence
    ▪ Very popular items co-occur with everything (or why it's not very helpful to know that everybody wants a pony…)
      – Examples: Welcome document; Elevator music
    ▪ Very widespread occurrence is not interesting as a way to generate indicators
      – Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)
    ▪ What we want is anomalous co-occurrence
      – This is the source of interesting indicators of preference on which to base recommendation
  • Get Useful Indicators from Behaviors
    1. Use log files to build a history matrix of users x items
      – Remember: this history of interactions will be sparse compared to all potential combinations
    2. Transform it to a co-occurrence matrix of items x items
    3. Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
      – The Log Likelihood Ratio (LLR) can help judge which co-occurrences can be used with confidence as indicators of preference
      – RowSimilarityJob in Apache Mahout uses LLR
  • Log Files
    [figure: one log entry per interaction, by user: Alice, Charles, Charles, Alice, Alice, Bob, Bob]
  • Log Files and Dimensions
        u1 t1
        u2 t4
        u2 t3
        u1 t2
        u1 t3
        u3 t3
        u3 t1
    Users: u1 = Alice, u2 = Charles, u3 = Bob
    Things: t1, t2, t3, t4
  • History Matrix: Users by Items
                 t1   t2   t3   t4
        Alice    ✔    ✔    ✔
        Bob      ✔         ✔
        Charles            ✔    ✔
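Step 1 can be sketched in a few lines of Python, assuming the log is available as (user, thing) pairs like the u1/t1 entries above. The function and variable names are illustrative only, not Mahout APIs. The sketch stores only the interactions that were observed, which is what keeps the history matrix sparse.

```python
from collections import defaultdict

# The seven log entries from the slides, as (user id, thing id) pairs.
LOG = [
    ("u1", "t1"), ("u2", "t4"), ("u2", "t3"), ("u1", "t2"),
    ("u1", "t3"), ("u3", "t3"), ("u3", "t1"),
]

# The user dimension from the slides.
USERS = {"u1": "Alice", "u2": "Charles", "u3": "Bob"}


def history_matrix(log):
    """Build a sparse users x items history as {user: set of items}.

    Only observed interactions are stored; repeated interactions with the
    same item collapse into a single check mark.
    """
    history = defaultdict(set)
    for user, item in log:
        history[user].add(item)
    return dict(history)


if __name__ == "__main__":
    for user_id, items in sorted(history_matrix(LOG).items()):
        print(USERS[user_id], sorted(items))
```

A set per user is the simplest sparse representation for binary (did / did not) interactions; counts or weights would need a dict per user instead.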
  • Co-occurrence Matrix: Items by Items
                 t1   t2   t3   t4
        t1       -    1    2    0
        t2       1    -    1    0
        t3       2    1    -    1
        t4       0    0    1    -
    How do you tell which co-occurrences are useful?
    Use the LLR test to turn co-occurrence into indicators…
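Step 2, turning the users x items history into an items x items co-occurrence matrix (the off-diagonal part of $A^TA$), can be sketched the same way. The history below is the toy data from the log slides, rewritten by user name; the helper names are hypothetical. Each user contributes one count to every pair of items in their history.

```python
from collections import Counter
from itertools import combinations

# Toy history from the log slides: user -> set of items that user interacted with.
HISTORY = {
    "Alice": {"t1", "t2", "t3"},
    "Bob": {"t1", "t3"},
    "Charles": {"t3", "t4"},
}


def cooccurrence(history):
    """Off-diagonal entries of A^T A as a Counter keyed by (item, item)."""
    counts = Counter()
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the matrix symmetric
    return counts


def as_rows(counts, items):
    """Render the counts as a dense matrix, with '-' on the unused diagonal."""
    return [["-" if a == b else counts[(a, b)] for b in items] for a in items]


if __name__ == "__main__":
    items = ["t1", "t2", "t3", "t4"]
    for item, row in zip(items, as_rows(cooccurrence(HISTORY), items)):
        print(item, row)
```

Raw counts like these are exactly what the "everybody gets a pony" problem affects: t3 co-occurs with every other item, which is why the next step scores each pair with LLR instead of using the counts directly.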
  • Co-occurrence Binary Matrix
    [figure: binary breakdown of a co-occurrence, with "not" rows and columns]
  • Spot the Anomaly
    What conclusion do you draw from each situation?

    Situation 1 (root LLR 0.90)
                 A        not A
        B        13       1,000
        not B    1,000    100,000

    Situation 2 (root LLR 4.52)
                 A        not A
        B        1        0
        not B    0        10,000

    Situation 3 (root LLR 1.95)
                 A        not A
        B        1        0
        not B    0        2

    Situation 4 (root LLR 14.3)
                 A        not A
        B        10       0
        not B    0        100,000

    ▪ Root LLR behaves roughly like a number of standard deviations
    ▪ In Apache Mahout, RowSimilarityJob uses LLR
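The scores in these four tables can be reproduced with a short LLR (G²) function for a 2×2 contingency table. This sketch follows the entropy formulation that, to my understanding, Mahout's LogLikelihood utility also uses, but it is written from scratch here and the function names are illustrative. Counts are ordered as in the tables: k11 = A and B, k12 = B without A, k21 = A without B, k22 = neither.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """N ln N - sum(k ln k), i.e. N times the Shannon entropy of the counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 table of co-occurrence counts."""
    row = entropy(k11 + k12, k21 + k22)   # B vs not B
    col = entropy(k11 + k21, k12 + k22)   # A vs not A
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative round-off


def root_llr(k11, k12, k21, k22):
    """Signed square root of LLR; negative if A is rarer with B than without it."""
    score = math.sqrt(llr(k11, k12, k21, k22))
    p_a_given_b = k11 / (k11 + k12) if k11 + k12 else 0.0
    p_a_given_not_b = k21 / (k21 + k22) if k21 + k22 else 0.0
    return -score if p_a_given_b < p_a_given_not_b else score


if __name__ == "__main__":
    # The four "Spot the Anomaly" tables, as (k11, k12, k21, k22).
    for table in [(13, 1000, 1000, 100_000), (1, 0, 0, 10_000),
                  (1, 0, 0, 2), (10, 0, 0, 100_000)]:
        print(table, f"{root_llr(*table):.2f}")  # 0.90, 4.52, 1.95, 14.29
```

Note how situation 1 has the most co-occurrences (13) but the weakest score: with 1,000 occurrences of each item on its own, 13 joint occurrences is close to what chance predicts, whereas 10 joint occurrences with no solo occurrences is a strong anomaly.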
  • Co-occurrence Matrix
    Recap: Use the LLR test to turn co-occurrence into indicators
  • Indicator Matrix: Anomalous Co-Occurrence
    Result: The marked row will be added to the indicator field in the item document…
  • Indicator Matrix
    That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
        id: t4
        title: puppy
        desc: The sweetest little puppy ever.
        keywords: puppy, dog, pet
        indicators: (t1)
    Note: data for the indicator field is added directly to the meta-data for a document in the Solr index. You don't need to create a separate index for the indicators.
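As a minimal sketch of how an indicator row could become a field on the existing item document, assume the root-LLR scores for item t4 have already been computed. The score values below are invented for illustration, and the `threshold` and `max_indicators` parameters are assumed tuning knobs, not values from the talk. The output is a plain JSON document that a search engine such as Solr could index alongside the other meta-data.

```python
import json

# Hypothetical root-LLR scores for items co-occurring with t4 (illustrative only).
SCORES_FOR_T4 = {"t1": 3.2, "t2": 0.4, "t3": -0.7}

# Item meta-data as in the sample document on the slide.
PUPPY = {
    "id": "t4",
    "title": "puppy",
    "desc": "The sweetest little puppy ever.",
    "keywords": ["puppy", "dog", "pet"],
}


def with_indicators(doc, scores, threshold=2.0, max_indicators=50):
    """Return a copy of doc with an 'indicators' field of anomalous co-occurrences.

    Items are kept if their score exceeds threshold, best first, up to
    max_indicators. The original document is not modified.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    kept = [item for item, score in ranked if score > threshold][:max_indicators]
    return {**doc, "indicators": kept}


if __name__ == "__main__":
    print(json.dumps(with_indicators(PUPPY, SCORES_FOR_T4), indent=2))
```

The design point from the slide is that the indicators travel with the item's own meta-data, so the same index serves both ordinary search and recommendation.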
  • Internals of the Recommender Engine
  • Looking Inside LucidWorks
    Real-time recommendation query and results: Evaluation
    What to recommend if a new user listened to 2122: Fats Domino & 303: Beatles?
  • Search-based Recommendations
    ▪ Sample document
      – Original data and meta-data: Merchant Id; Field for text description; Phone; Address; Location
      – Derived from co-occurrence and cross-occurrence analysis: Indicator merchant IDs; Indicator industry (SIC) IDs; Indicator offers; Indicator text; Local top40
    ▪ Sample query (the recommendation query)
      – Current location; Recent merchant descriptions; Recent merchant IDs; Recent SIC codes; Recent accepted offers; Local top40
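To show why an ordinary search engine is enough to serve these recommendations, here is a toy stand-in for the search step, written in plain Python rather than against Solr. The documents, merchant IDs and SIC codes are hypothetical, and the smoothed IDF weighting is only an approximation of what a real engine does (Solr's actual scoring, field boosts and query syntax differ). Each document carries indicator fields computed offline; the query is the user's recent behavior, field by field.

```python
import math

# Hypothetical item documents with indicator fields produced offline.
DOCS = {
    "m1": {"indicator_merchants": {"m7", "m9"}, "indicator_sic": {"5812"}},
    "m2": {"indicator_merchants": {"m7"}, "indicator_sic": {"5411"}},
    "m3": {"indicator_merchants": {"m5"}, "indicator_sic": {"5812"}},
}


def idf(term, field, docs):
    """Smoothed inverse document frequency: rare indicators count for more."""
    df = sum(1 for doc in docs.values() if term in doc.get(field, ()))
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def recommend(query, docs, k=10):
    """Rank docs by IDF-weighted overlap between their indicator fields and the query.

    query maps a field name to the set of recent user actions for that field,
    e.g. {"indicator_merchants": recent_merchant_ids}.
    """
    scores = {}
    for doc_id, doc in docs.items():
        score = 0.0
        for field, terms in query.items():
            values = doc.get(field, ())
            score += sum(idf(t, field, docs) for t in terms if t in values)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


if __name__ == "__main__":
    print(recommend({"indicator_merchants": {"m7", "m9"}}, DOCS))
    print(recommend({"indicator_merchants": {"m5"}, "indicator_sic": {"5812"}}, DOCS))
```

Combining several fields in one query is how multiple kinds of behavior (merchants visited, industries, offers) contribute to a single ranking.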
  • For example
    ▪ Users enter queries (A)
      – (actor = user, item = query)
    ▪ Users view videos (B)
      – (actor = user, item = video)
    ▪ $A^TA$ gives query recommendation
      – "did you mean to ask for"
    ▪ $B^TB$ gives video recommendation
      – "you might like these videos"
  • The punch-line
    ▪ $B^TA$ recommends videos in response to a query
      – (isn't that a search engine?)
      – (not quite, it doesn't look at content or meta-data)
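A minimal sketch of cross-occurrence $B^TA$, using invented toy logs: A records which queries each user entered and B which videos each user watched. Entry (video, query) counts the users who did both. As the slide says, nothing here looks at video content or meta-data; only shared users connect a query to a video. For brevity the sketch ranks by raw counts, whereas in practice the LLR test would be applied first to keep only anomalous cross-occurrences.

```python
from collections import Counter

# Toy logs invented for illustration.
QUERIES = {  # A: user -> queries entered
    "u1": {"paco de lucia"},
    "u2": {"paco de lucia", "flamenco"},
    "u3": {"van halen"},
}
VIDEOS = {  # B: user -> videos watched
    "u1": {"v_flamenco_guitar"},
    "u2": {"v_flamenco_guitar", "v_spanish_guitar"},
    "u3": {"v_eruption"},
}


def cross_occurrence(b_hist, a_hist):
    """Entries of B^T A: (video, query) -> number of users who did both."""
    counts = Counter()
    for user, videos in b_hist.items():
        for video in videos:
            for query in a_hist.get(user, ()):
                counts[(video, query)] += 1
    return counts


def videos_for_query(query, counts, k=5):
    """Rank videos by raw cross-occurrence with the query (no LLR filtering)."""
    scored = [(n, video) for (video, q), n in counts.items() if q == query]
    return [video for n, video in sorted(scored, key=lambda t: (-t[0], t[1]))[:k]]


if __name__ == "__main__":
    counts = cross_occurrence(VIDEOS, QUERIES)
    print(videos_for_query("paco de lucia", counts))
```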
  • Real-life example
    ▪ Query: "Paco de Lucia"
    ▪ Conventional meta-data search results:
      – "hombres de paco" times 400
      – not much else
    ▪ Recommendation-based search:
      – Flamenco guitar and dancers
      – Spanish and classical guitar
      – Van Halen doing a classical/flamenco riff
  • Hypothetical Example
    ▪ Want a navigational ontology?
    ▪ Just put labels on a web page with traffic
      – This gives A = users x label clicks
    ▪ Remember viewing history
      – This gives B = users x items
    ▪ Cross recommend
      – $B^TA$ = label-to-item mapping
    ▪ After several users click, results are whatever users think they should be
  • Nice. But we can do better?
  • A Quick Simplification
    ▪ Users who do $h$ (a vector of things a user has done): $Ah$
      – $A$ translates things into users
    ▪ Also do $r$:
      – $r = A^T(Ah)$: user-centric recommendations (the transpose translates back to things)
      – $r = (A^TA)h$: item-centric recommendations (change the order of operations)
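The "change the order of operations" step is just associativity of matrix products, which a few lines of pure Python can confirm. The matrix A below is the toy history from the earlier slides (rows Alice, Bob, Charles; columns t1 to t4), and h is a hypothetical new user who has only touched t1. The practical payoff is that $A^TA$ can be computed once, off-line, so serving a recommendation needs only one sparse product with h.

```python
def transpose(m):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(x, y):
    """Matrix-matrix product."""
    cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


# Toy history: rows are users, columns are things t1..t4.
A = [
    [1, 1, 1, 0],  # Alice: t1, t2, t3
    [1, 0, 1, 0],  # Bob: t1, t3
    [0, 0, 1, 1],  # Charles: t3, t4
]
h = [1, 0, 0, 0]  # hypothetical new user who has done only t1

user_centric = matvec(transpose(A), matvec(A, h))  # A^T (A h): things -> users -> things
cooc = matmul(transpose(A), A)                     # A^T A: can be precomputed off-line
item_centric = matvec(cooc, h)                     # (A^T A) h: one product at request time

if __name__ == "__main__":
    print(user_centric, item_centric)
```

In a real deployment $A^TA$ would also be sparsified with the LLR test and its diagonal ignored, which is what turns it into the indicator matrix.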
  • Symmetry Gives Cross Recommendations
    ▪ $(A^TA)h$: conventional recommendations with off-line learning
    ▪ $(B^TA)h$: cross recommendations
  • [figure: matrix $A$, with rows = users and columns = things]
  • [figure: users x things split by thing type, $A = \begin{bmatrix} A_1 & A_2 \end{bmatrix}$ for thing type 1 and thing type 2]
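  • $$\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}$$
    $$\begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
    $$r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$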
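A numeric check of the block identity: recommendations for type-1 things, $r_1$, need only the top block row, so they combine the user's type-1 history $h_1$ (through $A_1^TA_1$) with their type-2 history $h_2$ (through the cross-occurrence $A_1^TA_2$). The small matrices are invented for illustration. In search terms this suggests two indicator fields on each type-1 document, queried with $h_1$ and $h_2$ respectively; that reading is my interpretation of the slides, not a quoted recipe.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(x, y):
    cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def hstack(x, y):
    """Place two matrices with the same rows side by side: [X Y]."""
    return [rx + ry for rx, ry in zip(x, y)]


# Hypothetical histories for 3 users over two kinds of things.
A1 = [[1, 0], [1, 1], [0, 1]]             # users x thing type 1
A2 = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]    # users x thing type 2
h1 = [1, 0]        # one user's type-1 history
h2 = [0, 1, 0]     # the same user's type-2 history

A = hstack(A1, A2)
full = matvec(matmul(transpose(A), A), h1 + h2)  # [r1; r2] = (A^T A) [h1; h2]

# r1 from the top block row only: A1^T A1 h1 + A1^T A2 h2
r1 = [x + y for x, y in zip(matvec(matmul(transpose(A1), A1), h1),
                            matvec(matmul(transpose(A1), A2), h2))]

if __name__ == "__main__":
    print(full, r1)
```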
  • Bonus Round: When worse is better
  • The Winner – None of the Above
    ▪ What are the most important algorithmic advances in recommendations over the last 10 years?
      1. Result dithering
      2. Anti-flood
  • The Real Issues After First Production
    ▪ Exploration
    ▪ Diversity
    ▪ Speed
    ▪ Not the last fraction of a percent
  • Result Dithering
    ▪ Dithering is used to re-order recommendation results
      – Re-ordering is done randomly
    ▪ Dithering is guaranteed to make off-line performance worse
    ▪ Dithering also has a near-perfect record of making actual performance much better
    "Made more difference than any other change"
  • Why Dithering Works
    [diagram: a loop of Real-time recommender, Log files, and Overnight training]
  • Exploring the Second Page
  • Simple Dithering Algorithm
    ▪ Generate a synthetic score from log rank plus Gaussian noise:
      $$s = \log r + N(0, \log \varepsilon)$$
    ▪ Pick the noise scale to provide the desired level of mixing:
      $$\Delta r \propto \varepsilon r$$
    ▪ Typically $\varepsilon \in [1.5, 3]$
    ▪ Oh… use $\lfloor t/T \rfloor$ as the seed
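A minimal sketch of the dithering recipe above. It assumes natural logarithms, reads the second argument of $N(0, \log\varepsilon)$ as a standard deviation, and uses a hypothetical one-hour period for $T$; ranks start at 1. Seeding the generator with $\lfloor t/T \rfloor$ means a user who reloads within the same period sees the same order, while the order changes from one period to the next.

```python
import math
import random
import time


def dither(results, epsilon=2.0, period_seconds=3600, now=None):
    """Re-order a ranked list by s = log(rank) + N(0, log(epsilon)), lowest s first.

    Assumptions: natural log, log(epsilon) used as the standard deviation
    (epsilon >= 1), and period_seconds as a hypothetical choice of T.
    """
    if now is None:
        now = time.time()
    # Seed with floor(t / T) so the order is stable within one time period.
    rng = random.Random(math.floor(now / period_seconds))
    sigma = math.log(epsilon)
    scored = [(math.log(rank) + rng.gauss(0.0, sigma), item)
              for rank, item in enumerate(results, start=1)]
    return [item for _, item in sorted(scored, key=lambda t: t[0])]


if __name__ == "__main__":
    ranked = [f"item{i}" for i in range(1, 21)]
    print(dither(ranked, epsilon=2.0))
```

Because the noise is added to the log of the rank, items near the top move only a position or two, while items deeper in the list can jump much further, consistent with $\Delta r \propto \varepsilon r$. That is what lets second-page items occasionally surface and generate the training data that exploration needs.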
  • Example … ε = 2
    [table of sample dithered result orderings]
  • Lesson: Exploration is good
  • Part 3: What about that worked example?
  • http://bit.ly/18vbbaT
  • Analyze with Map-Reduce
    [diagram components: Complete history, Co-occurrence (Mahout), Item meta-data, Solr indexers, Index shards]
  • Deploy with Conventional Search System
    [diagram components: User history, Web tier, Solr search, Item meta-data, Index shards]
  • Me, Us
    ▪ Ted Dunning, Chief Application Architect, MapR
      – Committer, PMC member: Mahout, ZooKeeper, Drill
      – Bought the beer at the first HUG
    ▪ MapR
      – Distributes more open source components for Hadoop
      – Adds major technology for performance, HA, industry-standard APIs
    ▪ Info
      – Hash tag: #mapr
      – See also: @ApacheMahout, @ApacheDrill, @ted_dunning and @mapR