solr/lucene

htsjk.Com 2019-11-19 22:59


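This is my original work; please credit the source when reposting!

How to deploy Solr into a server and how Lucene's inverted index works are covered in plenty of places online, so I won't repeat them. The goal here is a simple hands-on run: use Solr to build an index from a database, then use Lucene to search that index.

Some screenshots follow.

1: The database: one table with 3 columns; did is the primary key.

2: The Solr config directory layout (you can copy it straight from the downloaded solr-src, under D:\javas\solr-4.7.2\example\solr\collection1\conf).

The schema mainly maps the database content: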

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="demoschema" version="1.5">

  <types>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="text_general" class="solr.TextField">
      <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    </fieldType>
  </types>

  <!-- Note: request handlers are normally declared in solrconfig.xml (see step B). -->
  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>

  <fields>
    <field name="text" type="text_general" stored="false" indexed="true" multiValued="true"/>
    <field name="did" type="integer" indexed="true" stored="true" required="true"/>
    <field name="dname" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="dcontent" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  </fields>

  <uniqueKey>did</uniqueKey>

  <defaultSearchField>text</defaultSearchField>

  <copyField source="dname" dest="text"/>
  <copyField source="dcontent" dest="text"/>

</schema>

A few notes:

The line <copyField source="dname" dest="text"/> is there for querying several fields at once: here dname and dcontent are both copied into the text field, so one search against text covers both.
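As a rough illustration of what copyField does at index time, here is a toy Java sketch. It is not Solr code, and CopyFieldSketch and applyCopyFields are made-up names. The values of dname and dcontent are appended to a catch-all text field, which is why one query on text can cover both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of Solr's copyField (hypothetical helper, not part of Solr). */
class CopyFieldSketch {
    /** Returns a copy of doc in which the values of every source field are appended to dest. */
    static Map<String, List<String>> applyCopyFields(
            Map<String, List<String>> doc, List<String> sources, String dest) {
        Map<String, List<String>> out = new LinkedHashMap<>(doc);
        List<String> merged =
                new ArrayList<>(out.getOrDefault(dest, Collections.<String>emptyList()));
        for (String source : sources) {
            merged.addAll(doc.getOrDefault(source, Collections.<String>emptyList()));
        }
        out.put(dest, merged);
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("dname", Arrays.asList("solr notes"));
        doc.put("dcontent", Arrays.asList("building an index from a database"));
        Map<String, List<String>> indexed =
                applyCopyFields(doc, Arrays.asList("dname", "dcontent"), "text");
        // prints [solr notes, building an index from a database]
        System.out.println(indexed.get("text"));
    }
}
```

The destination field holds several values, which matches the multiValued="true" setting on text in the schema.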

<fieldType name="text_general" class="solr.TextField">
    <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>

Fields that are searched need an analyzer (tokenizer) to split their text into terms.
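To show why, here is a small Java sketch of an inverted index, under simplifying assumptions: the class TokenizedIndexSketch is hypothetical, and its analyze method just lowercases and splits on whitespace as a stand-in for a real analyzer such as IKAnalyzer. Without tokenization the whole value is one term, so a single word inside it cannot be found.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index contrasting an untokenized field with a tokenized one. */
class TokenizedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final boolean tokenize;

    TokenizedIndexSketch(boolean tokenize) {
        this.tokenize = tokenize;
    }

    /** Stand-in analyzer: lowercases and splits on whitespace. */
    static List<String> analyze(String value) {
        String trimmed = value.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Indexes value either as analyzed terms or as one single term. */
    void add(int docId, String value) {
        List<String> terms = tokenize ? analyze(value) : Arrays.asList(value);
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the ids of documents containing exactly this term. */
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        TokenizedIndexSketch asString = new TokenizedIndexSketch(false);
        TokenizedIndexSketch asText = new TokenizedIndexSketch(true);
        asString.add(1, "Solr builds the index");
        asText.add(1, "Solr builds the index");
        System.out.println(asString.lookup("index")); // prints []
        System.out.println(asText.lookup("index"));   // prints [1]
    }
}
```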

A. Prepare the following jars

apache-solr-dataimporthandler-4.0.0.jar

apache-solr-dataimporthandler-extras-4.0.0.jar

apache-solr-dataimportscheduler-1.1.jar (used for delta import)

The JDBC driver jar for your database; Oracle's oracle10g.jar is used here. Put the jars into Tomcat6.0.36/webapps/sol/WEB-INF/lib.

B. Configure solrconfig.xml

Add the following to solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">xx-data-config.xml</str>
  </lst>
</requestHandler>

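C. Configure the data source

In the same directory as solrconfig.xml, create the xx-data-config.xml file named in the configuration above, with the following content.

The query attribute is used for a full import; the other attributes (deltaQuery, deltaImportQuery) are used for delta import.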

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@192.168.0.129:1521:orcl"
              user="username"
              password="password"/>
  <document>
    <entity name="business_info" pk="ID"
            query="select t.ID id,business_name,bussiness_type from business t"
            deltaImportQuery="select t.ID id,business_name,bussiness_type from business t where id='${dataimporter.delta.ID}'"
            deltaQuery="select t.ID id,business_name,bussiness_type from business t where to_char(updatetime,'yyyy-mm-dd hh24:mi:ss') > '${dataimporter.last_index_time}'">
      <field column="ID" name="id"/>
    </entity>
  </document>
</dataConfig>
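To make the two delta attributes more concrete, here is a toy Java model of a delta import over an in-memory table. This is only a sketch: DeltaImportSketch, Row, changedIds and fetchById are hypothetical names, and it compares timestamps directly, whereas the config above compares formatted strings in SQL. The idea is that deltaQuery finds the rows changed since the last index time, and deltaImportQuery then loads each of those rows by id.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

/** Toy model of a DIH delta import over an in-memory table; not Solr code. */
class DeltaImportSketch {
    /** One row of a hypothetical business table. */
    static final class Row {
        final int id;
        final String businessName;
        final LocalDateTime updateTime;

        Row(int id, String businessName, LocalDateTime updateTime) {
            this.id = id;
            this.businessName = businessName;
            this.updateTime = updateTime;
        }
    }

    /** Plays the role of deltaQuery: ids of rows updated after the last index time. */
    static List<Integer> changedIds(List<Row> table, LocalDateTime lastIndexTime) {
        List<Integer> ids = new ArrayList<>();
        for (Row row : table) {
            if (row.updateTime.isAfter(lastIndexTime)) {
                ids.add(row.id);
            }
        }
        return ids;
    }

    /** Plays the role of deltaImportQuery: loads the full row for one id, or null if absent. */
    static Row fetchById(List<Row> table, int id) {
        for (Row row : table) {
            if (row.id == id) {
                return row;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row(1, "old shop", LocalDateTime.of(2019, 1, 1, 0, 0)));
        table.add(new Row(2, "new shop", LocalDateTime.of(2019, 6, 1, 0, 0)));
        LocalDateTime lastIndexTime = LocalDateTime.of(2019, 3, 1, 0, 0);
        for (int id : changedIds(table, lastIndexTime)) {
            // prints "re-index 2: new shop"
            System.out.println("re-index " + id + ": " + fetchById(table, id).businessName);
        }
    }
}
```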

That completes the DIH configuration. Enter the following commands in a browser:

Full import:

http://localhost:8085/solr/core0/dataimport?command=full-import&commit=true

Delta import:

http://localhost:8085/solr/core0/dataimport?command=delta-import&clean=false&commit=true

Check the import status:

http://localhost:8085/solr/core0/dataimport?command=status
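If you would rather trigger these from code than from a browser, a minimal Java sketch could look like the following. DihCommands and its method names are my own, and the core URL is the one used above. The get method needs a running Solr instance, so main only prints the URLs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Builds the DIH command URLs for a core (hypothetical helper class). */
class DihCommands {
    static String fullImport(String coreUrl) {
        return coreUrl + "/dataimport?command=full-import&commit=true";
    }

    static String deltaImport(String coreUrl) {
        return coreUrl + "/dataimport?command=delta-import&clean=false&commit=true";
    }

    static String status(String coreUrl) {
        return coreUrl + "/dataimport?command=status";
    }

    /** Sends a GET request and returns the response body; requires a reachable server. */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String core = "http://localhost:8085/solr/core0";
        System.out.println(fullImport(core));
        System.out.println(deltaImport(core));
        System.out.println(status(core));
    }
}
```

With Solr running, something like DihCommands.get(DihCommands.status(core)) would return the status response as text.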

Success!

Lucene query:

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class demo01 {
    /** Path to the index directory (the data/index folder of the Solr core). */
    private String indexPath = "D:\\javas\\solr-home\\new_core2\\data\\index";
    // private String indexPath = "F:\\luceneIndex";

    /**
     * Analyzer: the default StandardAnalyzer is used here (there are several
     * analyzers, but none of them handle Chinese well).
     */
    private Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_4_10_2);

    /**
     * Search.
     *
     * @param queryStr the search keywords
     * @throws Exception if the query cannot be parsed or the index cannot be read
     */
    public void search(String queryStr) throws Exception {
        // 1. Parse the search text into a Query object.
        // Fields to search in.
        String[] fields = { "dname", "dcontent" };
        // QueryParser scans the user's input string and builds a Query object from it.
        QueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_4_10_2, fields, analyzer);
        // Query: Lucene supports fuzzy, phrase, and combined (boolean) queries, among others,
        // through classes such as TermQuery, BooleanQuery, TermRangeQuery and WildcardQuery.
        Query query = queryParser.parse(queryStr);

        /* // Boolean query
        BooleanQuery booleanQuery = new BooleanQuery();
        String[] words = queryStr.split(" ");
        TermQuery termquery = null;
        TermQuery termquery2 = null;

        termquery = new TermQuery(new Term("content", words[0]));
        termquery2 = new TermQuery(new Term("content", words[1]));

        booleanQuery.add(termquery, BooleanClause.Occur.MUST);
        booleanQuery.add(termquery2, BooleanClause.Occur.MUST_NOT);
        */

        /* booleanQuery.add(BooleanClause.Occur.SHOULD); */

        /* // Phrase query test ---- success
        PhraseQuery phrasequery = new PhraseQuery();
        phrasequery.setSlop(0);
        String[] words = queryStr.split(" ");
        for (int i = 0; i < words.length; i++) {
            phrasequery.add(new Term("content", words[i]));
        }
        */

        // 2. Run the query.
        File indexFile = new File(indexPath);

        // IndexSearcher is what searches the index.
        Directory directory = FSDirectory.open(indexFile);
        // DirectoryReader.open replaces the deprecated IndexReader.open(directory).
        IndexReader indexReader = DirectoryReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        // A Filter can screen the results, hiding content users should not see (unused here).
        Filter filter = null;
        // 100 is the maximum number of top hits to return.
        // TopDocs behaves like a collection of hits.
        TopDocs tp = indexSearcher.search(query, 100);

        /* TopDocs topDocs = indexSearcher.search(phrasequery, filter, 10000); */
        // Note: totalHits counts matching documents, not occurrences inside them.
        System.out.println("Total matching results: [" + tp.totalHits + "]");
        // 3. Print the results.
        for (ScoreDoc scoreDoc : tp.scoreDocs) {
            int docSn = scoreDoc.doc; // internal document number
            // Explanation test:
            // Explanation explanation = indexSearcher.explain(query, scoreDoc.doc);
            // System.out.println("----explanation:----" + explanation.toString());

            Document document = indexSearcher.doc(docSn); // fetch the document by its number
            // File2Document is a helper class not shown in this post.
            File2Document.printDocumentInfo(document); // print the document's fields
        }
        indexReader.close();
    }

    public static void main(String[] args) throws Exception {
        demo01 lucene = new demo01();
        /* lucene.createIndex(); */

        lucene.search("aa");
        System.out.println("----------------11111-----");
        /* lucene.search("iteye");
        System.out.println("----------------22222------");
        lucene.search("too");
        System.out.println("----------------33333------"); */
    }
}

The parts marked with a red line are the ones to pay attention to: fields lists the database columns to search.

query is a basic Lucene query.
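Conceptually, a multi-field query for one keyword matches a document when the keyword occurs in at least one of the listed fields. The Java sketch below imitates that over in-memory documents. It does not use Lucene, MultiFieldSearchSketch is a made-up class, and whole-word matching on whitespace is a simplifying assumption standing in for real analysis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy multi-field keyword search over in-memory documents; not Lucene code. */
class MultiFieldSearchSketch {
    /** True when term equals one whitespace-separated word of any listed field. */
    static boolean matches(Map<String, String> doc, String[] fields, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        for (String field : fields) {
            String value = doc.get(field);
            if (value == null) {
                continue;
            }
            List<String> words =
                    Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+"));
            if (words.contains(wanted)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the positions of the documents that match term in any of the fields. */
    static List<Integer> search(List<Map<String, String>> docs, String[] fields, String term) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (matches(docs.get(i), fields, term)) {
                hits.add(i);
            }
        }
        return hits;
    }

    /** Builds a document with the two columns used in this post. */
    static Map<String, String> doc(String dname, String dcontent) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("dname", dname);
        d.put("dcontent", dcontent);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("aa", "first row"));
        docs.add(doc("bb", "mentions aa here"));
        docs.add(doc("cc", "nothing relevant"));
        String[] fields = { "dname", "dcontent" };
        System.out.println(search(docs, fields, "aa")); // prints [0, 1]
    }
}
```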

This post is a set of notes and a bit messy; these are the points I was unclear on.