How the Google News Recommendation Engine Works!! [Video]

I just finished watching a great tech talk on how Google News recommends news items a reader might be interested in, and I thought it would be nice to share the video on my blog.

Please note that the talk was given in 2008, but it is quite informative nonetheless.

Some Takeaways:

They mainly use the collaborative filtering family of algorithms to come up with stories to recommend.
This not only lets them turn Google's large user base to their advantage, but also makes the algorithms domain-independent, so the same algorithms can be applied to other domains such as Google Video.

They assign weights to the various applicable methods and combine their outputs into a net weighted score for each story.
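The talk summary does not give the actual weights or scoring functions, so here is only a minimal sketch of the idea, assuming each method assigns a score to each candidate story and the final score is a plain weighted sum. The method names (`minhash`, `covisitation`), weights, and scores below are hypothetical.

```python
def combine_scores(method_scores, weights):
    """Weighted sum of per-method scores for each candidate story.

    method_scores: {method_name: {story_id: score}}
    weights:       {method_name: weight}
    """
    combined = {}
    for method, scores in method_scores.items():
        weight = weights.get(method, 0.0)  # methods without a weight contribute nothing
        for story, score in scores.items():
            combined[story] = combined.get(story, 0.0) + weight * score
    return combined


def rank_stories(combined):
    """Story ids ordered from highest to lowest combined score."""
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical method names, weights, and scores, for illustration only.
scores = combine_scores(
    {"minhash": {"story_a": 1.0, "story_b": 0.5}, "covisitation": {"story_b": 1.0}},
    {"minhash": 0.5, "covisitation": 1.0},
)
print(rank_stories(scores))  # ['story_b', 'story_a']
```

Here `story_b` wins (0.5 × 0.5 + 1.0 × 1.0 = 1.25 versus 0.5 × 1.0 = 0.5) because it scores under both methods.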

Some of the factors influencing news recommendations:

  1. Stories clicked by users whose interests are similar to yours [MinHash, PLSI]
  2. Stories that are co-visited with the stories you have clicked.
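The two factors above can be sketched roughly as follows. This is a simplified illustration under assumptions, not Google's implementation. MinHash estimates the Jaccard similarity of two users' click sets from short signatures. The co-visitation count here just counts how many users clicked both stories in a pair and ignores any time window. The hash family, seed, and story keys are all hypothetical choices.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

PRIME = 2**61 - 1  # Mersenne prime, larger than any 32-bit story id


def make_hash_funcs(n, seed=42):
    """n random hash functions of the form h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]
    return [lambda x, a=a, b=b: (a * x + b) % PRIME for a, b in params]


def story_id(story):
    """Stable integer id for a story key (CRC32 of its UTF-8 bytes)."""
    return zlib.crc32(story.encode("utf-8"))


def minhash_signature(clicks, hash_funcs):
    """For each hash function, the minimum hash over the user's clicked stories."""
    ids = [story_id(s) for s in set(clicks)]
    return [min(h(x) for x in ids) for h in hash_funcs]


def estimated_jaccard(sig1, sig2):
    """Fraction of agreeing signature positions estimates |A & B| / |A | B|."""
    matches = sum(1 for a, b in zip(sig1, sig2) if a == b)
    return matches / len(sig1)


def covisitation_counts(histories):
    """Count how many users clicked each unordered pair of stories (no time window)."""
    counts = defaultdict(int)
    for clicks in histories.values():
        for pair in combinations(sorted(set(clicks)), 2):
            counts[pair] += 1
    return counts


hashes = make_hash_funcs(200)
alice = minhash_signature(["s1", "s2", "s3", "s4"], hashes)
bob = minhash_signature(["s2", "s3", "s4", "s5"], hashes)
print(estimated_jaccard(alice, bob))  # close to the true Jaccard of 3/5 = 0.6
```

More hash functions give a tighter estimate at the cost of longer signatures.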

Recommendations are category-specific, so your clicks in the Entertainment section won't affect the items recommended in the Technology section.

The speaker also talks about how to make these algorithms scale well enough to run effectively, which is mostly achieved with MapReduce, a well-known distributed computing model.
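As a rough, in-process illustration of how a clustering step can be phrased as MapReduce, here is a sketch that groups users by a cluster key built from concatenated MinHash values. It assumes this is one reasonable way to express MinHash clustering as map/reduce, not a description of Google's job. The hash parameters are arbitrary, and `shuffle` stands in for the grouping a real framework performs between the map and reduce phases across machines.

```python
import zlib
from collections import defaultdict

PRIME = 2**61 - 1
# Hypothetical parameters for p = 2 hash functions h(x) = (a*x + b) mod PRIME.
HASH_PARAMS = [(1_000_003, 17), (998_244_353, 91)]


def minhash(clicks, a, b):
    """Minimum of h(x) over the CRC32 ids of the clicked stories."""
    return min((a * zlib.crc32(s.encode("utf-8")) + b) % PRIME for s in clicks)


def map_phase(user, clicks):
    """Map: emit (cluster_key, user); the key concatenates p MinHash values."""
    key = tuple(minhash(clicks, a, b) for a, b in HASH_PARAMS)
    yield key, user


def shuffle(pairs):
    """Shuffle: group emitted values by key (the framework does this in real MapReduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, users):
    """Reduce: all users sharing a key form one cluster."""
    return key, sorted(users)


def cluster_users(histories):
    pairs = (kv for user, clicks in histories.items() for kv in map_phase(user, clicks))
    return dict(reduce_phase(key, users) for key, users in shuffle(pairs).items())


clusters = cluster_users({
    "u1": ["tech/1", "tech/2"],
    "u2": ["tech/2", "tech/1"],
    "u3": ["sports/7"],
})
print(sorted(clusters.values()))  # [['u1', 'u2'], ['u3']]
```

Because each map call only looks at one user's history, the map phase parallelizes trivially, which is what makes this shape attractive for a large user base.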

For anyone genuinely interested in this topic, I would suggest reading a great book on the subject, Programming Collective Intelligence: Building Smart Web 2.0 Applications.
It is a great read for researchers and newcomers alike.

Posted in Technology

PostgreSQL Tips and Tricks!! (Part 1)

For a long time I have wanted an online catalog of the nifty SQL queries and hacks that I have found helpful and still use to this day.
Many times I got help from blogs written on this topic.
Now it's my turn to give something back to the community I learned so much from :).

This post is most useful for people with intermediate to advanced knowledge of PostgreSQL/SQL.

Note: Please use these queries at your own risk; hopefully they will help you get your work done without any hiccups 🙂

1) Using aliased subqueries (query aliases) in table joins for better performance:
When joining two (or more) tables, joining against an aliased subquery can, in certain cases, perform better than an ordinary join. For example:
Suppose there are two tables holding bank information (bankinfo) and transaction information (trinfo), and they share a common column named ifsc (the bank's IFSC code).
I need the bank information and the total number of transactions for each bank. I can write the query in two ways.

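-- bankinfo.name is a placeholder for whichever bankinfo columns you need
select bankinfo.name, bankinfo.ifsc, count(trinfo.ifsc) from bankinfo left outer join trinfo on bankinfo.ifsc = trinfo.ifsc group by bankinfo.name, bankinfo.ifsc;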

Or using query aliases

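-- bankinfo.name is a placeholder for whichever bankinfo columns you need
select bankinfo.name, coalesce(q.count, 0) as count from bankinfo left outer join (select ifsc, count(*) from trinfo group by ifsc) q on bankinfo.ifsc = q.ifsc;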

You can see that we reduce the number of rows that need to be joined, which often improves performance. This mostly helps when one of the joined tables is relatively large, or when the query uses GROUP BY.
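To check that the two query shapes return the same rows, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. Note that this is SQLite, not PostgreSQL, and it checks correctness only; it does not measure performance. The `name` and `amount` columns and all rows are hypothetical sample data. Counting `t.ifsc` (rather than `*`) in the plain join makes a bank with no transactions report 0 instead of 1.

```python
import sqlite3


def run_both(conn):
    """Run the plain join and the pre-aggregated subquery join; return both results."""
    plain = conn.execute("""
        SELECT b.name, COUNT(t.ifsc)
        FROM bankinfo b LEFT OUTER JOIN trinfo t ON b.ifsc = t.ifsc
        GROUP BY b.ifsc, b.name
        ORDER BY b.name""").fetchall()
    pre_agg = conn.execute("""
        SELECT b.name, COALESCE(q.cnt, 0)
        FROM bankinfo b LEFT OUTER JOIN
             (SELECT ifsc, COUNT(*) AS cnt FROM trinfo GROUP BY ifsc) q
             ON b.ifsc = q.ifsc
        ORDER BY b.name""").fetchall()
    return plain, pre_agg


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data: Gamma Bank has no transactions at all.
    conn.executescript("""
        CREATE TABLE bankinfo (ifsc TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE trinfo (id INTEGER PRIMARY KEY, ifsc TEXT, amount REAL);
        INSERT INTO bankinfo VALUES ('B001', 'Alpha Bank'), ('B002', 'Beta Bank'),
                                    ('B003', 'Gamma Bank');
        INSERT INTO trinfo (ifsc, amount) VALUES ('B001', 10), ('B001', 20), ('B002', 5);
    """)
    return run_both(conn)


plain, pre_agg = demo()
print(plain == pre_agg)  # True
```

On real data, running EXPLAIN ANALYZE on both forms in PostgreSQL is the way to see whether the pre-aggregation actually pays off.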


Posted in Hacks, Technology

Python Wrapper for Google Translate AJAX API.

A few days ago, I needed some non-English documents translated into English, which the Google AJAX Language API can do.

I wrote a small script that got the job done, and why let it go to waste when others might find it useful?

So without further ado, here is the code :). Feel free to use it in any commercial or open-source project.

# -*- coding: utf-8 -*-
# By Ashish Yadav, released under WTFPL
# Python 2 code.

import urllib
import simplejson

baseUrl = ""  # API endpoint URL (elided in this copy of the post)

def getSplits(text, splitLength=4500):
    '''
    The Translate API has a limit on the length of text (4500 characters)
    that can be translated at once, so hand the text out in chunks.
    '''
    return (text[index:index+splitLength] for index in xrange(0, len(text), splitLength))

def translate(text, src='', to='en'):
    '''
    A Python wrapper for the Google AJAX Language API:
    * Uses Google language detection when the source language is not
      provided with the source text (src left empty)
    * Splits up text longer than 4500 characters, a limit imposed by the API
    '''
    params = {'langpair': '%s|%s' % (src, to),
              'v': '1.0'}
    retText = ''
    for chunk in getSplits(text):
        params['q'] = chunk
        resp = simplejson.load(urllib.urlopen(baseUrl, data=urllib.urlencode(params)))
        retText += resp['responseData']['translatedText']
    return retText

def test():
    msg = "      Write something you want to be translated to English,\n"\
        "      Enter ctrl+c to exit"
    print msg
    while True:
        text = raw_input('#>  ')
        retText = translate(text)
        print retText

if __name__ == '__main__':
    try:
        test()
    except KeyboardInterrupt:
        print "\n"

Python code formatting was done with the awesome Python2Html utility.

Also available as a recipe on ActiveState.
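For readers on Python 3, here is a sketch of just the chunking and request-body construction from the wrapper above, so that logic can be checked without any network access. No HTTP request is sent. The names `get_splits` and `build_request_bodies` are hypothetical, and the parameter names (`v`, `langpair`, `q`) simply mirror the ones the Python 2 script posts.

```python
from urllib.parse import parse_qs, urlencode

MAX_CHARS = 4500  # per-request limit quoted in the post


def get_splits(text, split_length=MAX_CHARS):
    """Cut text into consecutive pieces of at most split_length characters."""
    return [text[i:i + split_length] for i in range(0, len(text), split_length)]


def build_request_bodies(text, src="", to="en"):
    """One urlencoded form body per chunk; src="" leaves the source language blank."""
    return [urlencode({"v": "1.0", "langpair": f"{src}|{to}", "q": chunk})
            for chunk in get_splits(text)]


chunks = get_splits("x" * 9001)
print([len(c) for c in chunks])  # [4500, 4500, 1]
print(parse_qs(build_request_bodies("hola", src="es")[0]))
# {'v': ['1.0'], 'langpair': ['es|en'], 'q': ['hola']}
```

Splitting at a fixed character count can cut a word or sentence in half; splitting on sentence boundaries below the limit would likely give better translations.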

Posted in Uncategorized

Install VLC 1.0.0 on Ubuntu Intrepid 8.10

Hi Linux Enthusiasts,

VLC 1.0.0 is out now, and here is how to install it on Ubuntu Intrepid in four short steps.

1) Add the following lines to /etc/apt/sources.list:

deb intrepid main
deb-src intrepid main

2) sudo apt-key adv --recv-keys --keyserver 1501599E

3) sudo apt-get update

4) sudo apt-get install vlc vlc-plugin-esd mozilla-plugin-vlc

Posted in Uncategorized

Up – Official Trailer [HD]

An awesome movie from Pixar.

Posted in Uncategorized

Tim Berners-Lee on the next Web

A great talk by Sir Tim Berners-Lee on Linked Data.

Posted in Uncategorized

Time-Lapse Photography Captures Galactic Core of the Milky Way

Posted in Uncategorized