The way you read tokenization (analysis) results in Lucene 6.0 differs from early Lucene releases. In early Lucene, it was done like this:
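```java
// Legacy Token-based API from early Lucene releases; it does not compile against Lucene 6.0
public void analyzeDemo(Analyzer analyzer, String text) throws Exception {
    TokenStream tokenStream = analyzer.tokenStream("content", new StringReader(text));
    // next(Token) is handed a reusable Token and returns null once the stream is exhausted
    for (Token token = new Token(); (token = tokenStream.next(token)) != null;) {
        System.out.println(token);
    }
}
```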
In the latest Lucene (6.0), the attribute-based API is used instead:
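```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzeDemo {
    /**
     * Prints the tokenization result.
     * @param analyzer the analyzer to use
     * @param text the text to tokenize
     */
    public void analyze(Analyzer analyzer, String text) {
        // try-with-resources closes the TokenStream even if tokenization fails
        try (TokenStream tokenStream = analyzer.tokenStream("content", new StringReader(text))) {
            // addAttribute returns the instance that incrementToken() updates in place, so no cast or lookup per token
            CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset(); // must be called before the first incrementToken()
            while (tokenStream.incrementToken()) {
                System.out.println(charTermAttribute.toString());
            }
            tokenStream.end(); // end-of-stream processing, e.g. setting the final offset
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String text = "2小时前 - 谈起对中国人none的认同,侯汉廷认为,这与家庭和小时候的教育有很大关系。";
        Analyzer analyzer = new SmartChineseAnalyzer();
        AnalyzeDemo demo = new AnalyzeDemo();
        demo.analyze(analyzer, text);
    }
}
```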
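Besides the term text, the same attribute mechanism exposes other per-token data. The following is a minimal sketch that also reads each token's character offsets through `OffsetAttribute`, assuming Lucene 6.x with the lucene-core, lucene-analyzers-common and lucene-analyzers-smartcn artifacts on the classpath; the class name `AnalyzeWithOffsets` and the helper `tokensWithOffsets` are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public class AnalyzeWithOffsets {
    // Hypothetical helper: formats each token as "term[start,end)" using its offsets in the input
    static List<String> tokensWithOffsets(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("content", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            OffsetAttribute offset = ts.addAttribute(OffsetAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString() + "[" + offset.startOffset() + "," + offset.endOffset() + ")");
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String text = "2小时前 - 谈起对中国人none的认同,侯汉廷认为,这与家庭和小时候的教育有很大关系。";
        // Offsets show which span of the original text each token came from
        for (Analyzer analyzer : Arrays.asList(new StandardAnalyzer(), new SmartChineseAnalyzer())) {
            try {
                System.out.println(analyzer.getClass().getSimpleName() + ": " + tokensWithOffsets(analyzer, text));
            } finally {
                analyzer.close();
            }
        }
    }
}
```

Offsets refer to the original input, so they stay meaningful even when a filter has lowercased or otherwise rewritten the term text.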
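By swapping in different analyzers and comparing the tokens each one produces, you can choose the analyzer that works best for your text.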
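A minimal sketch of such a comparison, assuming Lucene 6.x with the lucene-core, lucene-analyzers-common and lucene-analyzers-smartcn artifacts on the classpath. `CompareAnalyzers` and `tokenize` are hypothetical names, and the three analyzers compared are only an illustrative choice. The same sample sentence is run through each analyzer and the resulting token lists are printed side by side.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CompareAnalyzers {
    // Collects the term text of every token the analyzer produces for the given text
    static List<String> tokenize(Analyzer analyzer, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("content", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        String text = "2小时前 - 谈起对中国人none的认同,侯汉廷认为,这与家庭和小时候的教育有很大关系。";
        List<Analyzer> analyzers = Arrays.asList(
                new StandardAnalyzer(), new CJKAnalyzer(), new SmartChineseAnalyzer());
        for (Analyzer analyzer : analyzers) {
            try {
                System.out.println(analyzer.getClass().getSimpleName() + ": " + tokenize(analyzer, text));
            } finally {
                analyzer.close(); // Analyzer holds per-thread resources
            }
        }
    }
}
```

For Chinese text, StandardAnalyzer typically emits each Han character as a separate token, CJKAnalyzer emits overlapping two-character bigrams, and SmartChineseAnalyzer attempts real word segmentation, so printing the lists together makes the trade-offs easy to judge.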