A Site or Portal for Natural Language Processing Tools
While at the AAAI Fall Symposium this week, it occurred to me that there may not be a really good, reasonably complete, business-oriented open source site offering tools that businesses, especially small businesses, can use in natural language processing applications.
To be useful to small businesses, the tools would need to be well written, well documented, and built so that the various parts connect through straightforward interfaces. I started making a list of component tools that might be included in such a portal. It is only a draft, of course, but I hope some of you readers will help out and give me some additional, maybe even better, ideas.
Draft List of Possible NLP Components in an Open Source Portal for Business
- Parser
- Grammar (context-free or context-sensitive)
- Grammar editing tool
- Training text corpus
- Front end user interface
- Synonym dictionary and editor
- Ontology and editor
- Tutorials and documentation for integrating into applications
So far, I have looked at several sites, including NLUC, OpenNLP, and various wikis and university sites. Some offer very useful material, but none seemed to be the “one-stop shop” I had in mind.
If you have some ideas, please leave a comment or send me a note. If I can’t find something within a few weeks, I will start ginning one up.
-Stu
November 13th, 2007 at 10:11 am
From: Stephen Reed [mailto:stephenreed@yahoo.com]
Sent: Tuesday, November 13, 2007 9:59 AM
To: Stu
Subject: Re: Penny for your thoughts
Stu,
I would add to your list the following components:
- Named entity recognition tool, gazetteers
- Information extraction tool (extracts related entities from text)
- Machine translation tool
- Natural language dialog front-end tool
Best wishes with the portal. Please feel free to post any of my software or RDF files there, or to link to the texai download page on SourceForge.
-Steve
November 13th, 2007 at 1:47 pm
Another name for a named entity lexicon is an “onomasticon”. Sergei Nirenburg uses this term for the “Named Entity Lexicon” in his NLP system.
A tool that I would very much like to see available is a “picture-word” dictionary. We are doing some research in this area with Steve Helmreich and Jim Cowie of New Mexico State University. The idea is to generate a “pictorial representation of meaning” from linguistic input. The key challenge is figuring out how to compose pictures together to build such representations. As a very simple example: how do you compose a picture for an expression like “the red ball” without having to pre-store every possible combination of object and color?
November 16th, 2007 at 5:53 pm
I would add visualizers (e.g., for parser output).
Also, have you had a chance to look at the ACL Wiki’s ‘State of the art’ section? http://aclweb.org/aclwiki/index.php?title=State_of_the_art
I think your idea is great, and I have subscribed to your blog to follow how it goes. However, unless the portal is in a visible place, people will not find it, and it may suffer the same fate as other very interesting but barely noticed projects.
Hosting it on a central wiki (in the same section or a different one) would make it more visible and therefore more likely to succeed and stay current.
November 16th, 2007 at 10:34 pm
Alexandre,
Thanks for the comment. I had not previously seen ACLWeb’s wiki. It looks good.
I have also found that aaai.org maintains a fairly robust set of introductory information and tutorials. It is complete enough that I think another site may not be warranted.
It may be better to join one of those sites and help them improve it.
Stu