WikinMongo

A simple Node.js script to import a Wikipedia XML dump into a MongoDB database.

WikinMongo 0.1.0

Overview

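This is a simple Node.js script that imports a Wikipedia XML dump into a MongoDB database. Each page in the dump is stored as one document in the pages collection, using the structure described under Page Document Structure.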

Environment

Data Source

Wikipedia XML dump file (uncompressed)
http://dumps.wikimedia.org
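Uncompressed dumps can be many gigabytes, so they are best read as a stream rather than loaded into memory at once. The following is a minimal sketch of one way to do that, not WikinMongo's actual implementation: `createPageSplitter` and `splitDumpFile` are hypothetical helpers that cut the text stream into complete `<page>...</page>` elements, which can then be handed to an XML parser of your choice.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of WikinMongo): accumulates text chunks and
// calls onPage once for every complete <page>...</page> element.
function createPageSplitter(onPage) {
  const OPEN = "<page>";
  const CLOSE = "</page>";
  let buffer = "";
  return function push(chunk) {
    buffer += chunk;
    let start;
    while ((start = buffer.indexOf(OPEN)) !== -1) {
      const end = buffer.indexOf(CLOSE, start);
      if (end === -1) {
        // Page not finished yet: keep it for the next chunk.
        buffer = buffer.slice(start);
        return;
      }
      const stop = end + CLOSE.length;
      onPage(buffer.slice(start, stop));
      buffer = buffer.slice(stop);
    }
    // No page start left; keep a short tail in case "<page>" is split
    // across two chunks.
    buffer = buffer.slice(-(OPEN.length - 1));
  };
}

// Hypothetical wrapper: streams an uncompressed dump file through the splitter.
// Setting an encoding makes the stream decode UTF-8 safely across chunk edges.
function splitDumpFile(path, onPage) {
  const push = createPageSplitter(onPage);
  const stream = fs.createReadStream(path, { encoding: "utf8" });
  stream.on("data", push);
  return new Promise((resolve, reject) => {
    stream.on("end", resolve);
    stream.on("error", reject);
  });
}
```

A plain string search is enough here because a literal `<page>` inside article text is escaped as `&lt;page&gt;` in the XML.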

Page Document Structure

{
	title: string,
	ns: string,
	id: number,
	revision: {
	    id: number,
	    parentid: number,
	    timestamp: date,
	    contributor: {
	        username: string,
	        id: number,
	        ip: string
	    },
	    comment: string,
	    text: string,
	    sha1: string,
	    model: string,
	    format: string
	}
}
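XML element values are plain strings, so numeric ids and the timestamp need converting before they match the types listed above. The sketch below assumes a hypothetical `raw` object holding the string value of each element (as an XML parser might produce it) and shows one way to build a page document from it; it is an illustration, not the script's own code. Anonymous edits carry an `ip` instead of a `username` and `id`, so absent fields are left `undefined`.

```javascript
// Hypothetical helper: convert an optional string to a number.
function toNumber(value) {
  return value === undefined ? undefined : Number(value);
}

// Hypothetical: `raw` mirrors the XML elements of one <page> as strings.
function toPageDocument(raw) {
  const rev = raw.revision || {};
  const contributor = rev.contributor || {};
  return {
    title: raw.title,
    ns: raw.ns, // kept as a string, as in the structure above
    id: toNumber(raw.id),
    revision: {
      id: toNumber(rev.id),
      parentid: toNumber(rev.parentid),
      timestamp: rev.timestamp === undefined ? undefined : new Date(rev.timestamp),
      contributor: {
        username: contributor.username, // registered users only
        id: toNumber(contributor.id), // registered users only
        ip: contributor.ip, // anonymous edits only
      },
      comment: rev.comment,
      text: rev.text,
      sha1: rev.sha1,
      model: rev.model,
      format: rev.format,
    },
  };
}
```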

Usage

node app.js db dump drop

Arguments:

db:   MongoDB connection URI, including the database name
dump: Wikipedia dump XML file (uncompressed)
drop: Drop the pages collection (if it exists) before inserting new documents

Example: node app.js 'mongodb://localhost:27017/wiki' '/media/Data/enwiki.xml' drop
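For readers adapting the script, here is a minimal sketch of how the three positional arguments could be read from `process.argv`. It assumes, which the usage line above does not state, that the literal word `drop` is optional and that any other value (or no value) means the collection is kept; `parseArgs` is a hypothetical name, and the real app.js may handle its arguments differently.

```javascript
// Hypothetical sketch of reading "node app.js db dump drop".
// Assumption: the third argument is an optional literal "drop" flag.
function parseArgs(argv) {
  const [db, dump, flag] = argv.slice(2); // skip "node" and "app.js"
  if (!db || !dump) {
    throw new Error("Usage: node app.js db dump [drop]");
  }
  return { db, dump, drop: flag === "drop" };
}
```

Called as `parseArgs(process.argv)` with the example command above, it would return the URI, the dump path and `drop: true`.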

Notes

License

This project is BSD (2 clause) licensed.