
Archive for the ‘Hadoop’ Category


Wiki PageRank with Hadoop
Posted by abij just before lunchtime: September 27th, 2011

In this tutorial we are going to compute a PageRank for Wikipedia using Hadoop. This was a good hands-on exercise for getting started with Hadoop. Page ranking is nothing new, but it is a suitable use case and way cooler than a word counter! The English Wikipedia has 3.7M articles at the moment and is still growing. Each article has many links to other articles. From those incoming and outgoing links we can determine which pages are more important than others, which is basically what PageRank does.
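To make the idea of ranking by links concrete, here is a minimal in-memory sketch in plain Java. It is not the Hadoop job from the full post: the class name `PageRankSketch`, the method `rank`, the three-page toy graph, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration. Each page repeatedly hands an equal share of its current rank to every page it links to.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PageRankSketch {

    // Assumed damping factor; 0.85 is the value commonly quoted for PageRank.
    static final double DAMPING = 0.85;

    /**
     * Runs a fixed number of PageRank iterations over an in-memory link graph.
     * Keys are page titles, values are the titles each page links to.
     */
    public static Map<String, Double> rank(Map<String, List<String>> outLinks, int iterations) {
        // Every page starts with the same rank.
        Map<String, Double> ranks = new HashMap<>();
        for (String page : outLinks.keySet()) {
            ranks.put(page, 1.0);
        }

        for (int i = 0; i < iterations; i++) {
            // Each page gets a base rank of (1 - damping) before link shares are added.
            Map<String, Double> next = new HashMap<>();
            for (String page : outLinks.keySet()) {
                next.put(page, 1.0 - DAMPING);
            }

            for (Map.Entry<String, List<String>> entry : outLinks.entrySet()) {
                List<String> targets = entry.getValue();
                if (targets.isEmpty()) {
                    continue; // a page without outgoing links passes nothing on in this sketch
                }
                // Split the page's current rank evenly over its outgoing links.
                double share = ranks.get(entry.getKey()) / targets.size();
                for (String target : targets) {
                    if (next.containsKey(target)) { // ignore links to pages outside the graph
                        next.merge(target, DAMPING * share, Double::sum);
                    }
                }
            }
            ranks = next;
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Hypothetical toy graph: A links to B and C, B links to C, C links to A.
        Map<String, List<String>> links = new HashMap<>();
        links.put("A", Arrays.asList("B", "C"));
        links.put("B", Arrays.asList("C"));
        links.put("C", Arrays.asList("A"));

        System.out.println(rank(links, 20));
    }
}
```

In this toy graph, page C receives links from both A and B, while B is linked only from A, so C ends up ranked above B. A MapReduce version would spread the same per-page work over many machines, which is what makes the approach feasible for millions of articles.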


Filed under Hadoop, Java, NoSQL | 7 Comments »

