Saturday, December 28, 2013

Structured database of everything

In May 2011 I wrote publicly about QB for the first time, in a forum of the Czech online magazine Lupa.cz, which focuses on the internet. I wrote the post in my native Czech, so I have translated it here for you to enjoy:

Hello,

I have designed a universal system for structured data management to fulfill my frequent need for quickly retrieving specific information on the go. Services like Wikipedia don't suit me due to their complex nature, lack of data availability for the Czech Republic, or reluctance to record "every little thing."

The basic implementation can be found at http://q.q3x.net/ and, besides the database itself, it includes a few useful tools (unit and currency converter, whois, distances, etc.) that are further utilized by the system itself. For example, the data is stored in standard units and converted according to the selected language (e.g., English uses dollars, miles, ounces, etc.) when displayed, and additional calculations are performed if needed (population density, for instance). Instead of a traditional API, data can be obtained by selecting a rendering module for a specific format (e.g., TSV).
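
The store-in-standard-units, convert-at-display-time approach described above can be sketched roughly like this. This is a minimal illustration, not the actual QB code; the locale table, unit names, and conversion factors are my own assumptions:

```python
# Minimal sketch of locale-aware unit rendering: values are stored
# in SI units and converted only when displayed. Hypothetical data,
# not the actual QB implementation.

# For each locale: stored quantity -> (display unit, factor from SI).
UNIT_PREFS = {
    "en-US": {"length_m": ("mi", 1 / 1609.344),
              "mass_kg": ("oz", 1000 / 28.349523125)},
    "cs-CZ": {"length_m": ("km", 0.001),
              "mass_kg": ("kg", 1.0)},
}

def render(quantity: str, value: float, locale: str) -> str:
    """Convert a stored SI value into the unit preferred by the locale."""
    unit, factor = UNIT_PREFS[locale][quantity]
    return f"{value * factor:.2f} {unit}"

print(render("length_m", 5000.0, "en-US"))  # miles for English
print(render("length_m", 5000.0, "cs-CZ"))  # kilometres for Czech
```

Derived values such as population density would be computed the same way at render time, from the stored SI figures.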

Personally, I utilize information about aircraft based on their registration (typically age), information about people (typically age), and places (typically population count). The system can also record the content of a library or books read with ratings, the same for movies, TV series, or music. Products can be found using the EAN, and if data is available, it shows where the product is cheapest nearby, how far it is, and how quickly I need to go to catch the bus heading there. The system can even handle a complete CMDB (Configuration Management Database); that's why it was designed in a decentralized manner with the option for interconnection.

The ultimate goal is to synthesize Wikipedia, Freebase, Wikia, WolframAlpha, along with local databases and something extra. There is no underlying business plan; it's not meant to be a means to get rich but primarily for entertainment, relaxation, and an opportunity to avoid stagnation. However, I don't want to be the only one benefiting from the system, so I welcome questions, opinions, advice, feedback, and criticism... Thank you.

I'm aware that data is the alpha and omega, and I gather it in every possible way. Ideally, crowdsourcing would be the solution, but without a "crowd," it functions poorly. So please take the data with a grain of salt. Essentially, it's there mainly for testing purposes, to evaluate performance and gradually refine the concept. And finally, I apologize in advance for any parse errors; it's a work in progress.

> Quote: Petr Hejl 25th May 2011, 18:57:27

> The plan is quite ambitious. The problem will be with the data because each database or website has almost a different format. I don't want to discourage you, but WolframAlpha has tried this and failed miserably. Unless you invent a universal parser...

I don't assume I would crawl the web with a crawler and extract structured data. That never generates a reasonable level of data cleanliness. In the beginning, I tried to write a DOM parser for Wikipedia, and even on a single project, there are so many exceptions that I eventually gave up. So the assumption is that data is populated with pre-prepared batches (there are tons of them online, typically XLS or tables in PDF) or manually (using suitable tools, which is not such a hassle).

Honestly, this whole endeavor was initially provoked by disappointment with the level of data on the internet, aiming to create a purely data-oriented platform with solid boundaries. I built a fully customizable application on top of the data model, which formed the basis of this entire concept. So from the very beginning, I assume that the primary source of data will be a keyboard or defined formats (VCF, GPX, XML, XLS, etc.) for which I already have import algorithms.
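
The batch-import path described above can be sketched like this. The column names, the TSV sample, and the normalization step are illustrative assumptions, not the actual import algorithms mentioned in the post:

```python
import csv
import io

# Hypothetical pre-prepared batch: a TSV export with a header row,
# the kind of file the post describes. On import, values are
# normalized into standard (SI) units before storage.
BATCH = (
    "name\tpopulation\tarea_km2\n"
    "Prague\t1280000\t496\n"
    "Brno\t379000\t230\n"
)

def import_batch(tsv_text: str) -> list[dict]:
    """Parse a TSV batch and normalize units for storage."""
    rows = []
    for rec in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        rows.append({
            "name": rec["name"],
            "population": int(rec["population"]),
            "area_m2": float(rec["area_km2"]) * 1_000_000,  # store in SI
        })
    return rows

cities = import_batch(BATCH)
print(cities[0]["name"], cities[0]["area_m2"])
```

The same shape works for the other defined formats (VCF, GPX, XML, XLS): a per-format reader, followed by one shared normalization step.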

> Quote: Ondra 25th May 2011, 20:18:37

> 1) Not everyone is a geek. (Is it for "normal people"?) ;-)

> 2) http://www.uoou.cz

No, it's not :) Currently, I'm working on adapting it for "normal people," but I'm already quite warped myself...

How does it differ from similar systems? I have given a lot of thought to personal data, but the information I worked with regarding this topic didn't contradict the current content. If you have more specific information, I would appreciate it and take the necessary steps. Otherwise, the data comes from publicly available sources.

Monday, December 9, 2013

Sliky Quiky

Well, I've had many brilliant ideas for Quiky over the past few days. Most of them stay as TODOs so I can release the first public version as soon as possible, but among them are a few killer features I can't wait to implement.

I tried Quiky in a real-life use case yesterday. I attended yet another great event, DevFest Praha 2013 by GUG CZ, and I took some of my notes using Quiky. Right there I figured out one of the killer features, which I don't want to write about yet because I've never seen such a feature anywhere else.

I also found that I quite miss spell checking, since its absence tends to push me toward other text editors. Therefore I integrated NHunspell, but shortly afterwards I realized it would be nice to underline misspelled words. Quiky used a plain multiline TextBox, which can't underline anything. I already had a RichTextBox, but only for future WYSIWYG use. After about a second of thinking about it, I got rid of the plain TextBox and kept solely the RichTextBox.
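
Platform specifics aside, the underlying step is the same everywhere: find the character spans of misspelled words so the editor can turn them into underline ranges. A language-agnostic sketch, where a tiny word set stands in for a real Hunspell dictionary (the actual app uses NHunspell's check against .aff/.dic files):

```python
import re

# Stand-in for a real Hunspell dictionary lookup.
DICTIONARY = {"the", "quick", "brown", "fox", "jumps"}

def misspelled_spans(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, word) for every word not in the dictionary,
    ready to be mapped onto underline ranges in a rich-text control."""
    spans = []
    for m in re.finditer(r"[A-Za-z]+", text):
        if m.group().lower() not in DICTIONARY:
            spans.append((m.start(), m.end(), m.group()))
    return spans

print(misspelled_spans("the quik brown fox jmups"))
```

In the RichTextBox this boils down to applying an underline format to each returned (start, end) range, which is exactly what the plain TextBox could not do.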

I was quite excited about the “switcheroo”, because I didn't like the fact that I had two different components for the same purpose.

The web side also got a lot of attention over the last few days. I focused on security and built a few independent layers, especially for access from outside through the Quiky API (e.g., from the Quiky app).

My goal is to unify the GUI across all platforms. The web will be responsive, and it's not just for mobile usage. I still like Firefox's concept of a Sidebar, so I may use Quiky there as well.