Information World Review (IWR) Blog: a blog from www.iwr.co.uk


Mind the Language Gap

One of this morning's IRFS Language Gap sessions asked why we need a cross-lingual patent retrieval system for Asian languages. Well, for a start, consider that three of the top five patent filing countries come from the region: Japan is first, China third and the Republic of Korea fourth, with the US and Europe in second and fifth places respectively.

If you operate in the patent world, then sooner or later you will probably have to work with filings of Asian origin. The question is how, given the fundamental structural differences between Western, Latin-script languages and their Asian counterparts. To take just two of many examples, Korean has six different ways of expressing the colour red, and in Chinese, which is written without spaces between words, where the word boundaries fall can considerably change a phrase's meaning and its translation into English. Apply this to the often ambiguous, legally constructed language of patent documents and the challenge is formidable.
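Because Chinese has no spaces between words, a retrieval or translation system must first decide where the word boundaries fall. The sketch below uses greedy forward maximum matching, one simple segmentation technique, on the well-known phrase 南京市长江大桥 ("Nanjing Yangtze River Bridge"). The function name and the tiny dictionaries are illustrative assumptions, not part of any system described at the session; the point is only that the same characters split differently depending on which words the dictionary contains.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    dictionary word that matches, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


SENTENCE = "南京市长江大桥"  # "Nanjing Yangtze River Bridge"

# Hypothetical toy dictionaries; 江大桥 stands in for a person's name.
WITH_CITY = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥", "江大桥"}
WITHOUT_CITY = WITH_CITY - {"南京市"}

print(forward_max_match(SENTENCE, WITH_CITY))     # ['南京市', '长江大桥']
print(forward_max_match(SENTENCE, WITHOUT_CITY))  # ['南京', '市长', '江大桥']
```

With 南京市 ("Nanjing City") in the dictionary, the phrase reads as Nanjing City / Yangtze River Bridge; without it, the same characters come out as Nanjing / mayor / Jiang Daqiao, a person's name. A cross-lingual patent system has to resolve this kind of ambiguity before it can translate or match a query reliably.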

Relying on human translation alone is not an option, not least because of the sheer volume of documents constantly being filed, on top of the existing material.

Minah Kim from the Korea Institute of Patent Information has been explaining how the institute's cross-lingual retrieval system copes with these issues, and what still needs to be done.

She called for efforts to improve retrieval quality, such as semantically based query expansion, whereby each word in the original search is expanded into related search terms, so that boat leads to ship, then to vessel, then to water. Query time also needs to improve: retrieving a single document currently takes 10 seconds on average, which can cause problems with a 200-page document, never mind the rest.
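Semantically based query expansion of the kind Kim described can be sketched roughly as follows. This is a minimal illustration, not the institute's implementation: the thesaurus `RELATED_TERMS` and the function `expand_query` are hypothetical, and a real system would draw on a curated, multilingual term dictionary rather than a hard-coded mapping. Related terms are followed breadth-first, one hop at a time, reproducing the boat, ship, vessel, water chain from her example.

```python
from collections import deque

# Hypothetical related-term thesaurus, for illustration only.
RELATED_TERMS = {
    "boat": ["ship"],
    "ship": ["vessel"],
    "vessel": ["water"],
}


def expand_query(terms, thesaurus, max_depth=3):
    """Expand each query term with related terms, following chains of
    relations breadth-first for at most max_depth hops. Returns the
    expanded terms in discovery order, without duplicates."""
    expanded = []
    seen = set()
    for term in terms:
        queue = deque([(term, 0)])
        while queue:
            current, depth = queue.popleft()
            if current in seen:
                continue
            seen.add(current)
            expanded.append(current)
            if depth < max_depth:
                for related in thesaurus.get(current, []):
                    queue.append((related, depth + 1))
    return expanded


print(expand_query(["boat", "hull"], RELATED_TERMS))
# ['boat', 'ship', 'vessel', 'water', 'hull']
print(expand_query(["boat"], RELATED_TERMS, max_depth=1))
# ['boat', 'ship']
```

The `max_depth` limit is the main design lever: each extra hop broadens recall, but by the time boat has become water the query has drifted well away from the searcher's intent. This is why expansion terms are commonly capped or given lower weight than the original query terms.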

In the short term, the plan is to keep upgrading the system's dictionary and to have query expansion in place by the end of this year. In the longer term, Kim says, a cross-lingual retrieval system must be developed for Chinese, Japanese and Korean patent documents. Western organisations, too, can only benefit from similar models that address these translation issues.


Bloggers-in-chief

Daniel Griffin, IWR Deputy Editor
Daniel joined IWR in 2006 after a career as a publisher of guides, supplements and websites for magazine and event companies. His special interest is the evolving publishing and information industry online.

Peter Williams, IWR Editor
Peter is in his second spell on IWR. Over the last few years he has developed an interest in the fields of knowledge management and e-learning, writing and editing extensively on both topics.


© Incisive Media Ltd. 2008
Incisive Media Limited, Haymarket House, 28-29 Haymarket, London SW1Y 4RX, is a company registered in the United Kingdom with company registration number 04038503