|
Extracting document content through Text Visualization
In collaboration with a leading advertising company, we
are developing a Hunch Engine™-based tool to analyze the
meaning of surveys, blogs, or newsgroups, without any
prior knowledge of the content of the text.
Because it is not known ahead of time what will be
discovered in the data, the user starts with a range of
perspectives on the data, for example various networks of
concepts that appear connected in the data. Each network
differs in the word it centers on and in the nature of its
connections (see figure below).
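As a rough illustration of what one such word-centered network could
look like, the sketch below counts which words co-occur with a chosen
center word. It is not the Hunch Engine's actual method, which is not
described here. The function names, the two kinds of connection (same
document, or within a window of tokens), and the sample postings with
made-up company names are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def ego_network(documents, center, window=None, top_n=5):
    """Return up to top_n (word, count) pairs connected to `center`.

    The "nature of the connection" is an assumption of this sketch:
      window=None -> two words are connected if they share a document
      window=k    -> connected if they appear within k tokens of each other
    Each document contributes at most 1 to a given word's count.
    """
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        positions = [i for i, t in enumerate(tokens) if t == center]
        if not positions:
            continue
        if window is None:
            neighbors = set(tokens)
        else:
            neighbors = set()
            for i in positions:
                start = max(0, i - window)
                neighbors.update(tokens[start:i + window + 1])
        neighbors.discard(center)
        counts.update(neighbors)
    return counts.most_common(top_n)


# Hypothetical postings; the company names are invented.
postings = [
    "Acme and Globex announced an alliance",
    "Globex shares rose",
    "The alliance between Acme and Initech",
]

# Two perspectives on the same data: same center word, different connections.
by_document = ego_network(postings, "acme", window=None, top_n=10)
by_window = ego_network(postings, "acme", window=1, top_n=10)
```

Changing the center word or the window yields a different network over
the same corpus, which is one simple way to generate several candidate
representations for the user to compare.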
The user examines each of the six representations presented
and selects the ones that seem to provide useful
information by clicking on them. The user can also use
drag-and-drop features and other modes of interaction to
more directly combine and alter representations. Once a
meaningful representation has been found, the user can dig
into it.
In the example shown above, the user is exploring a large
corpus of newsgroup postings all related to healthcare
issues. One area of particular interest to the user in
this case is the relationships and alliances in the
pharmaceutical industry and how they are being discussed
by the newsgroup participants (right figure).
The same process can operate with most languages, provided
the sentences can be parsed into words or other "atomic"
units.
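To make the idea of "atomic" units concrete, here is a minimal sketch
of two interchangeable ways to split text: by words, for languages that
separate words with spaces, and by overlapping character n-grams, for
text without such separators. Both functions and their names are
assumptions made for illustration, not part of the tool described here.

```python
import re


def word_units(text):
    # Words as units: runs of Unicode letters, digits, or underscores.
    return re.findall(r"\w+", text.lower())


def char_ngram_units(text, n=2):
    # Overlapping character n-grams as units, ignoring whitespace.
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


french_units = word_units("Bonjour le monde")
japanese_units = char_ngram_units("東京都", n=2)
```

Any later analysis step that only consumes a list of units can then
work unchanged whichever splitter is used.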
|