Some uncommon ‘z’ words (each seen exactly once so far).

sqlite> select count,word,pos from words where count = 1 and word like 'z%' order by word limit 100;
1|Zamora|NNP
1|Zandvoort|NNP
1|Zanzibar|JJ
1|Zapatero|NNP
1|Zarko|NNP
1|Zavier|NNP
1|Zawahri|NNP
1|Zealand’|NNP
1|Zealand-Pacific|NNP
1|Zealand-Samoan|NNP
1|Zealand-registered|NNP
1|Zebon|NNP
1|Zeev|NNP
1|Zehi|NNP
1|Zeppelin|NNP
1|Zero’|NNP
1|Zhanjiang|NNP
1|Zhao|NNP
1|Zhejiang|NNP
1|Zhengrong|NNP
1|Zhengzhou|NNP
1|Zhenli|NNP
1|Zhenxing|NNP
1|Zhisheng|NNP
1|Zhou|NNP
1|Zhu|NNP
1|Zigzag’|NNP
1|Zille|NNP
1|Zimbabwean-controlled|NNP
1|Zion|NNP
1|Zip|NNP
1|Zlin|NNP
1|Zobaie|NNP
1|Zodiac|NNP
1|Zookeepers|NNP
1|Zubiarre|NNP
1|Zugna|NNP
1|Zumino|NNP
1|zapped|VBD
1|zeal|NN
1|zealot|NN
1|zero-carbon|JJ
1|zero-tolerance|JJ
1|zeroed|VBD
1|zip|NN
1|zodiac|NN
1|zombies|NNS
1|zones’|NN
1|zookeepers|NN
1|zorse|NN
sqlite>
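The same lookup can be run from Python's built-in sqlite3 module. This is a minimal sketch that assumes the `words` table from the schema listed later in this post. The helper name `rare_words` and the sample rows are hypothetical, and the in-memory table leaves out the q1 to q8 columns. Binding the pattern as a parameter also avoids the quoting problems that curly quotes cause when SQL is pasted into the shell.

```python
import sqlite3


def rare_words(conn, prefix, count=1, limit=100):
    """Return (count, word, pos) rows for words starting with `prefix`
    that have been seen exactly `count` times."""
    # SQLite's LIKE is case-insensitive for ASCII letters by default,
    # so 'z%' matches both "Zion" and "zeal".
    sql = (
        "SELECT count, word, pos FROM words "
        "WHERE count = ? AND word LIKE ? "
        "ORDER BY word LIMIT ?"
    )
    return conn.execute(sql, (count, prefix + "%", limit)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Cut-down copy of the words table (q1..q8 omitted); rows are made up.
    conn.execute(
        "CREATE TABLE words (id INTEGER PRIMARY KEY, wordtagged TEXT UNIQUE, "
        "word TEXT, pos VARCHAR(5), count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO words (wordtagged, word, pos, count) VALUES (?, ?, ?, ?)",
        [("Zion/NNP", "Zion", "NNP", 1), ("zeal/NN", "zeal", "NN", 1),
         ("zone/NN", "zone", "NN", 7), ("Apple/NNP", "Apple", "NNP", 1)],
    )
    for c, word, pos in rare_words(conn, "z"):
        print(f"{c}|{word}|{pos}")
```

`ORDER BY word` uses SQLite's default binary collation, where uppercase letters sort before lowercase ones. That is why every capitalised word in the listing above comes before `zapped`.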

Current top twenty proper-noun phrases.

sqlite> select count,phrase from phrases where phrase like '%NNP%' order by count desc limit 20;
1477|Police/NNP
1218|Australia/NNP
1096|Sydney/NNP
1037|Iraq/NNP
720|New/NNP South/NNP Wales/NNP
683|Federal/NNP Government/NNP
541|United/NNP States/NNPS
509|Government/NNP
494|Prime/NNP Minister/NNP John/NNP Howard/NNP
439|New/NNP South/NNP Wales/NNP Government/NNP
350|New/NNP Zealand/NNP
346|Iran/NNP
339|LONDON/NNP
292|Baghdad/NNP
289|Afghanistan/NNP
286|Melbourne/NNP
278|China/NNP
267|Britain/NNP
262|Israel/NNP
262|Auckland/NNP
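Phrases are stored as space-separated `word/TAG` tokens. `like '%NNP%'` matches any phrase with at least one NNP or NNPS tag, so a mixed phrase such as `Police/NNP said/VBD` would also qualify. A stricter filter keeps only phrases in which every token is a proper noun. The sketch below is minimal and assumes the `word/TAG` format shown above, with the tag following the last slash in each token. The function names and the sample rows (including the `said/VBD` phrase) are hypothetical.

```python
import sqlite3


def split_tagged(phrase):
    """Split 'New/NNP South/NNP Wales/NNP' into [('New', 'NNP'), ...].

    rpartition keeps any '/' inside the word itself; the tag is whatever
    follows the last slash.
    """
    pairs = []
    for token in phrase.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def is_proper_noun_phrase(phrase):
    """True if the phrase is non-empty and every token is tagged NNP or NNPS."""
    pairs = split_tagged(phrase)
    return bool(pairs) and all(tag in ("NNP", "NNPS") for _, tag in pairs)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE phrases (id INTEGER PRIMARY KEY, phrase TEXT UNIQUE, count INTEGER)")
    conn.executemany(
        "INSERT INTO phrases (phrase, count) VALUES (?, ?)",
        [("United/NNP States/NNPS", 541), ("Police/NNP said/VBD", 12), ("Iraq/NNP", 1037)],
    )
    # Registering the predicate lets SQLite call it from inside the query.
    conn.create_function("is_proper", 1, lambda p: int(is_proper_noun_phrase(p)))
    rows = conn.execute(
        "SELECT count, phrase FROM phrases WHERE is_proper(phrase) "
        "ORDER BY count DESC LIMIT 20"
    ).fetchall()
    print(rows)
```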

Current source feeds.

cursor.executemany(
    "INSERT INTO sources (id, title, url, weight) VALUES (?, ?, ?, ?)",
    [
        (2, "ABC News: Breaking Stories", "http://abc.net.au/news/syndicate/breakingrss.xml", 1),
        (3, "ABC News: World", "http://abc.net.au/news/syndicate/worldrss.xml", 1),
        (4, "ABC News: New South Wales", "http://www.abc.net.au/xmlcontent/indexes/nsw/NSW_rss_index.xml", 1),
        (5, "ABC News: Top Stories", "http://abc.net.au/news/syndicate/topstoriesrss.xml", 1),
        (6, "Crikey RSS", "http://www.crikey.com.au/rss.xml", 1),
        (7, "the Mail online | World news", "http://feeds.feedburner.com/dailymail/worldnews", 1),
        (8, "NYT > Middle East", "http://www.nytimes.com/services/xml/rss/nyt/MiddleEast.xml", 1),
        (9, "NYT > International", "http://www.nytimes.com/services/xml/rss/nyt/International.xml", 1),
        (10, "New Zealand Herald – World", "http://syndication.apn.co.nz/rss/nzhrsscid_000000002.xml", 1),
        (11, "New Zealand Herald – National", "http://syndication.apn.co.nz/rss/nzhrsscid_000000001.xml", 1),
        (12, "The Sydney Morning Herald World Headlines", "http://feeds.smh.com.au/rssheadlines/world.xml", 1),
        (13, "The Sydney Morning Herald National Headlines", "http://feeds.smh.com.au/rssheadlines/national.xml", 1),
    ],
)
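Each source URL points at an RSS feed. The sketch below shows one way a feed document could be turned into rows for the `articles` table. It assumes RSS 2.0 (`item` elements with `title`, `description` and `pubDate` children) and uses only the standard library. HTTP fetching is left out so the example runs offline, and the function names are hypothetical. Because `articles.title` is `UNIQUE`, `INSERT OR IGNORE` skips headlines that are already stored.

```python
import sqlite3
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Yield (title, description, pubDate) for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        # findtext returns None when a child element is missing.
        yield tuple(
            (item.findtext(tag) or "").strip()
            for tag in ("title", "description", "pubDate")
        )


def store_items(conn, source_id, xml_text):
    """Insert feed items into articles and return how many rows were added.

    Items without a title are dropped; pubDate is stored as the raw feed string.
    """
    rows = [
        (title, description, pub_date, source_id)
        for title, description, pub_date in parse_rss_items(xml_text)
        if title
    ]
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO articles (title, content, date, source_id) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before


if __name__ == "__main__":
    sample = (
        '<rss version="2.0"><channel><title>Example feed</title>'
        "<item><title>Headline one</title><description>Body text</description>"
        "<pubDate>Mon, 01 Jan 2007 00:00:00 GMT</pubDate></item>"
        "<item><title>Headline one</title><description>Repeat</description></item>"
        "</channel></rss>"
    )
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title VARCHAR(50) UNIQUE, "
        "content VARCHAR(1000), date DATE, source_id INTEGER, phrase_id VARCHAR(100))"
    )
    print(store_items(conn, 2, sample))  # prints 1: the repeated headline is ignored
    print(store_items(conn, 2, sample))  # prints 0: nothing new on a second pass
```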

Current database structure

# SQLite accepts VARCHAR(n) but does not enforce the length limit.
cursor.execute('CREATE TABLE articles (id INTEGER PRIMARY KEY, title VARCHAR(50) UNIQUE, content VARCHAR(1000), date DATE, source_id INTEGER, phrase_id VARCHAR(100))')
cursor.execute('CREATE TABLE sources (id INTEGER PRIMARY KEY, title VARCHAR(50), url VARCHAR(100), weight INTEGER)')
cursor.execute('CREATE TABLE phrases (id INTEGER PRIMARY KEY, phrase TEXT UNIQUE, pos VARCHAR(5), rating VARCHAR(100), count INTEGER, words VARCHAR(50))')
cursor.execute('CREATE TABLE score (date DATE PRIMARY KEY, score INTEGER)')
cursor.execute('CREATE TABLE words (id INTEGER PRIMARY KEY, wordtagged TEXT UNIQUE, word TEXT, pos VARCHAR(5), count INTEGER, q1 TEXT, q2 TEXT, q3 TEXT, q4 TEXT, q5 TEXT, q6 TEXT, q7 TEXT, q8 TEXT)')
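The `words` table keys each entry on its tagged form (`wordtagged`, e.g. `zeal/NN`) and keeps a running `count`, which is what the `count = 1` query at the top relies on. Here is a minimal sketch of how those counts could be maintained as tagged text arrives. It assumes tokens arrive in `word/TAG` form. It uses `INSERT OR IGNORE` followed by `UPDATE`, so it also works on SQLite versions without `ON CONFLICT ... DO UPDATE` support. The helper name is hypothetical, and only the columns the sketch needs are created.

```python
import sqlite3


def add_tagged_words(conn, tagged_tokens):
    """Increment words.count for each 'word/TAG' token, creating rows as needed."""
    for token in tagged_tokens:
        word, _, pos = token.rpartition("/")
        # A new token is inserted with count 0 so the UPDATE below brings it to 1;
        # an existing token is left alone by INSERT OR IGNORE.
        conn.execute(
            "INSERT OR IGNORE INTO words (wordtagged, word, pos, count) "
            "VALUES (?, ?, ?, 0)",
            (token, word, pos),
        )
        conn.execute(
            "UPDATE words SET count = count + 1 WHERE wordtagged = ?", (token,)
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words (id INTEGER PRIMARY KEY, wordtagged TEXT UNIQUE, "
        "word TEXT, pos VARCHAR(5), count INTEGER)"
    )
    add_tagged_words(conn, "zeal/NN Zion/NNP zeal/NN".split())
    print(conn.execute("SELECT wordtagged, count FROM words ORDER BY wordtagged").fetchall())
```

Because the key is the tagged form, the same spelling under two tags (for example `Zodiac/NNP` and `zodiac/NN`) is counted separately.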

Data structure

Table of words
Word/Phrase
Index (smallint)
Usage/definition
Examples

Table of results
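Word/Phrase index (smallint)
Number of results (mediumint)
Aggregate score per affect index (DOUBLE(9,2) UNSIGNED): 9 digits in total, 2 of them after the decimal point, so at most 9,999,999.99; reaching 1 billion to 2 decimal places would need DOUBLE(12,2)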

Table of affects
Affect (varchar)
Affect id (tinyint)

Table of voters
IP address of each voter
Array of Word/Phrase indexes voted for (smallint)
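Instead of storing an array of indexes per voter, a relational design could use a join table with one row per vote. A composite primary key then stops the same IP from voting twice for the same word and affect. The per-word result counts and aggregate scores come from `GROUP BY` rather than being kept in a separate table. The sketch below shows this in SQLite under assumed names: `votes`, `cast_vote`, `results` and the sample affects are hypothetical, and it assumes each vote carries a numeric score for one affect. SQLite has no fixed-precision `DOUBLE(M,D)`, so the score is stored as `REAL`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE affects (id INTEGER PRIMARY KEY, affect TEXT UNIQUE);
CREATE TABLE votes (
    voter_ip  TEXT    NOT NULL,
    word_id   INTEGER NOT NULL,
    affect_id INTEGER NOT NULL REFERENCES affects(id),
    score     REAL    NOT NULL,
    PRIMARY KEY (voter_ip, word_id, affect_id)
);
"""


def cast_vote(conn, voter_ip, word_id, affect_id, score):
    """Record a vote; return False if this IP already voted on this word and affect."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO votes VALUES (?, ?, ?, ?)",
                (voter_ip, word_id, affect_id, score),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def results(conn, word_id):
    """Number of votes and aggregate score per affect for one word/phrase."""
    return conn.execute(
        "SELECT a.affect, COUNT(*), SUM(v.score) "
        "FROM votes v JOIN affects a ON a.id = v.affect_id "
        "WHERE v.word_id = ? GROUP BY a.affect ORDER BY a.affect",
        (word_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO affects (affect) VALUES (?)", [("anger",), ("joy",)])
    print(cast_vote(conn, "192.0.2.1", 7, 2, 1.0))  # True
    print(cast_vote(conn, "192.0.2.1", 7, 2, 1.0))  # False: duplicate vote
    print(cast_vote(conn, "192.0.2.2", 7, 2, 0.5))  # True
    print(results(conn, 7))
```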