Take a stroll through today’s blogosphere. Follow a few Twitter feeds, check out some Facebook and MySpace pages, and see how much data you are exposed to, on top of all the “regular” websites you pop in on. Everybody has become an author, editor and publisher, AND is still somehow convinced that there will be time to be a reader and to do some thinking too.
The time investment required to get to the nuggets hidden in the long tail will only grow, so much so that most of the tail will remain hidden to all but a small circle of followers. Sure, you can Google keywords for your topic of interest. Add some persistence and some “Internet research skills” and you can find a lot of relevant material. But this just points to the fact that the present version of “search” is still a very crude tool, and that we are on a collision course with a mass of data that will result in frustration, indifference, or both. That would be a loss for everyone. What’s the way forward?
What we need is a much more intelligent filtering mechanism: something like the software version of a personal librarian or a dedicated research staff.
We have lift-off
We see the first glimmer of this in RSS. You can populate your personal homepage with data sources that you select and that are constantly refreshed. But that’s just data aggregation with a dynamic twist. There are no brains in it beyond your selection of sources. If a source you do not follow comes up with an article on a topic you are trying to keep up with, you will only hear about it if you do your own research, or if a friend sends you the link. As the number of data sources grows with ever more contributors, there is a good chance that great ideas will be lost in the noise.
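To make the “no brains” point concrete, here is a minimal sketch of what plain RSS aggregation amounts to. The two feeds are tiny hand-written samples and the `aggregate` function is a hypothetical name, not part of any real feed reader; the sketch assumes simple RSS 2.0 documents.

```python
import xml.etree.ElementTree as ET

# Two hand-written RSS 2.0 samples standing in for feeds the user subscribed to.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Gardening tips</title><link>http://example.com/a1</link></item>
<item><title>Search is broken</title><link>http://example.com/a2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Weekend recipes</title><link>http://example.com/b1</link></item>
</channel></rss>"""


def aggregate(feeds):
    """Return (source, title, link) for every item in every selected feed."""
    entries = []
    for xml_text in feeds:
        channel = ET.fromstring(xml_text).find("channel")
        source = channel.findtext("title")
        for item in channel.findall("item"):
            entries.append((source, item.findtext("title"), item.findtext("link")))
    return entries


# Everything from the chosen sources comes through, relevant or not,
# and nothing from sources outside the list can ever appear.
entries = aggregate([FEED_A, FEED_B])
for source, title, link in entries:
    print(f"{source}: {title} ({link})")
```

The only decision in this loop is the list of feeds passed in, which is exactly the limitation described above.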
The Intelligent Agent
What we need is a software agent that intelligently, dynamically and semi-autonomously generates an information dashboard on any topic we choose to “search”. The user will roughly configure the agent’s search parameters. The agent will then produce the data dashboard according to the preferences set by the user for:
- Type of information source (website, blog, news site, university site, company site, etc.);
- Amount of information – we need to be able to get a “casual” data collection for our hobbies, for example, or a much more formal data collection for academic research;
- Prioritization of information – the user should be able to choose the structure of the data pyramid that the agent produces; the pyramid will result in data that goes from the general to the more specialized, just like any good bibliography. We should be able to define the height and the base of the pyramid.
Of course, the agent should be graphically configurable through the manipulation of objects – WordPress, for example, already does that quite well in its dashboard. We should also be able to browse the resulting dashboard efficiently on a number of devices.
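The preferences listed above could be captured in a small configuration object. The sketch below is only one possible reading: `AgentConfig`, `Item`, `build_pyramid`, and the rule that each level is one item narrower than the level below it are all assumptions made for illustration. The amount of information is expressed here through the pyramid’s height and base.

```python
from dataclasses import dataclass


@dataclass
class AgentConfig:
    topic: str
    source_types: tuple = ("blog", "news site")  # accepted kinds of source
    pyramid_height: int = 3                      # number of levels, general to specialized
    pyramid_base: int = 4                        # items allowed at the most general level


@dataclass
class Item:
    title: str
    source_type: str
    level: int        # 0 = general, higher = more specialized
    relevance: float  # assumed to be supplied by some earlier scoring step


def build_pyramid(items, config):
    """Group items by level, keeping the most relevant ones at each level."""
    pyramid = []
    for level in range(config.pyramid_height):
        # Assumption: each level is one item narrower than the one below it.
        width = max(1, config.pyramid_base - level)
        candidates = [
            it for it in items
            if it.level == level and it.source_type in config.source_types
        ]
        candidates.sort(key=lambda it: it.relevance, reverse=True)
        pyramid.append(candidates[:width])
    return pyramid


config = AgentConfig(
    topic="beekeeping",
    source_types=("blog", "university site"),
    pyramid_height=2,
    pyramid_base=2,
)
items = [
    Item("Intro to bees", "blog", 0, 0.9),
    Item("Bees overview", "news site", 0, 0.95),
    Item("Beekeeping basics", "blog", 0, 0.7),
    Item("Bee FAQ", "blog", 0, 0.5),
    Item("Varroa mite study", "university site", 1, 0.8),
    Item("Hive thermodynamics", "university site", 1, 0.6),
]
pyramid = build_pyramid(items, config)
for level, row in enumerate(pyramid):
    print(level, [it.title for it in row])
```

A “casual” hobby search would use a short, narrow pyramid like this one, while a formal research search would raise both the height and the base.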
Training the Agent
The agent needs to apply intelligence to each piece of data it considers referencing in your “dashboard”. (We won’t call it AI because, somehow, AI got a bad name.) Each user will supply that intelligence by “training” his/her agent with archetypes of the types of data that are acceptable. The user could also define flexible or “fuzzy” parameters, which could be the key to making sure that the agent includes some “long tail” items that would not usually make the cut in a more strictly defined search. We can also imagine an online venue where these customized filters for various topics could be made available for download or traded. Some of them may become important IP and a strategic advantage for companies.
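As a rough illustration of archetype training with a fuzzy margin, the sketch below scores a candidate text by its word overlap (cosine similarity on word counts) with the user’s archetypes. The `Agent` class, the `threshold` and `fuzziness` parameters, and the choice of cosine similarity are all assumptions made for this example; a real agent would need a far richer notion of similarity.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class Agent:
    def __init__(self, threshold=0.5, fuzziness=0.2):
        self.archetypes = []        # examples of acceptable data, supplied by the user
        self.threshold = threshold  # strict cut-off for acceptance
        self.fuzziness = fuzziness  # margin below the cut-off kept as "long tail"

    def train(self, example_text):
        self.archetypes.append(vectorize(example_text))

    def score(self, text):
        vec = vectorize(text)
        return max((cosine(vec, a) for a in self.archetypes), default=0.0)

    def classify(self, text):
        s = self.score(text)
        if s >= self.threshold:
            return "accept"
        if s >= self.threshold - self.fuzziness:
            return "long tail"  # near miss that a strict search would drop
        return "reject"


agent = Agent(threshold=0.5, fuzziness=0.2)
agent.train("semantic search agents filter blog posts")

for text in (
    "semantic search agents filter blog posts",
    "agents that filter news",
    "recipe for apple pie",
):
    print(text, "->", agent.classify(text))
```

Widening `fuzziness` lets more near misses through, which is one way the agent could keep surfacing long-tail items. The trained archetypes plus these two numbers are also the kind of customized filter that could be shared or traded.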
The progress of the intelligence embedded in our tools needs to match the rate of information creation, or the sum of our newly expressed collective thinking will not create more value for anyone. It may sound like science fiction now, but what are the alternatives?