
Author Archive


Wiki PageRank with Hadoop
Posted by abij just before lunchtime: September 27th, 2011

In this tutorial we are going to compute a PageRank for Wikipedia using Hadoop. This was a good hands-on exercise to get started with Hadoop. Page ranking is not a new thing, but it is a suitable use case and way cooler than a word counter! The English Wikipedia has 3.7M articles at the moment and is still growing. Each article has many links to other articles. With those incoming and outgoing links we can determine which pages are more important than others, which is basically what PageRank does.
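To make the idea of ranking by incoming and outgoing links concrete, here is a minimal single-machine sketch in plain Java, without Hadoop. It is not the code from the tutorial itself. It assumes the commonly used normalized PageRank update with a damping factor, where each page splits its current rank evenly over its outgoing links. The class name `PageRankSketch`, the method `rank`, the three-page toy graph and the parameter values are all made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PageRankSketch {

    /**
     * Iteratively computes a rank per page.
     * links maps each page to the pages it links to. Assumption: every page,
     * including every link target, appears as a key in the map.
     */
    static Map<String, Double> rank(Map<String, List<String>> links,
                                    double damping, int iterations) {
        int n = links.size();

        // Start with the rank spread evenly over all pages.
        Map<String, Double> ranks = new HashMap<>();
        for (String page : links.keySet()) {
            ranks.put(page, 1.0 / n);
        }

        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String page : links.keySet()) {
                next.put(page, 0.0);
            }

            // Rank held by pages without outgoing links is shared by everyone.
            double dangling = 0.0;
            for (Map.Entry<String, List<String>> entry : links.entrySet()) {
                double current = ranks.get(entry.getKey());
                List<String> outgoing = entry.getValue();
                if (outgoing.isEmpty()) {
                    dangling += current;
                    continue;
                }
                // Each outgoing link passes on an equal share of the rank.
                double share = current / outgoing.size();
                for (String target : outgoing) {
                    next.merge(target, share, Double::sum);
                }
            }

            // Apply the damping factor to the collected incoming rank.
            for (String page : links.keySet()) {
                double incoming = next.get(page) + dangling / n;
                next.put(page, (1.0 - damping) / n + damping * incoming);
            }
            ranks = next;
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Hypothetical toy graph: A and B both link to C, and C links back to A.
        Map<String, List<String>> links = new HashMap<>();
        links.put("A", List.of("C"));
        links.put("B", List.of("C"));
        links.put("C", List.of("A"));

        Map<String, Double> ranks = rank(links, 0.85, 50);
        ranks.forEach((page, value) -> System.out.printf("%s %.4f%n", page, value));
    }
}
```

In this toy graph page C has two incoming links and ends up with the highest rank, while page B, which nobody links to, ends up with the lowest. A Hadoop version would have to spread the same per-page work over map and reduce steps, because the link graph of millions of articles is too large to process comfortably in a single in-memory map like this.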


Filed under Hadoop, Java, NoSQL

