
Connecting solr with aws RDS Mysql through data import handler

I recently started implementing SolrCloud on AWS EC2 for search applications. I have created 2 AWS EC2 instances with the following configuration:

  1. EC2 Type - t2.medium
  2. RAM - 4GB
  3. Disk Space - 8GB
  4. OS - Ubuntu 18.04
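
For reference, two instances of this shape could be launched with the AWS CLI; a sketch, where the AMI ID, key pair name, and security group ID are placeholders, not values from the original setup:

aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t2.medium \
    --count 2 \
    --key-name pem_file \
    --security-group-ids sg-0123456789abcdef0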

For the 2 EC2 instances, I created a security group that allows all inbound traffic. The NACL has default settings that allow all inbound traffic as well.
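
An equivalent allow-all ingress rule can be sketched with the AWS CLI (the group ID is again a placeholder; IpProtocol=-1 means all protocols):

aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'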

Steps followed to install Apache Solr:

  1. ssh into ec2:
ssh -i "pem_file" ubuntu@ec2-public-ipv4-address
  2. cd to the /opt directory
  3. run --> sudo apt update
  4. run --> sudo apt-get install openjdk-11-jdk
  5. Check java -version
  6. run --> wget https://archive.apache.org/dist/lucene/solr/8.3.0/solr-8.3.0.tgz
  7. run --> tar -xvzf solr-8.3.0.tgz
  8. export SOLR_HOME=/opt/solr-8.3.0
  9. Add /opt/solr-8.3.0 to the PATH environment variable
  10. Update the /etc/hosts file (sudo vim /etc/hosts) with the host entry:
public-ip-v4-address-of-ec2 solr-node-1
  11. Started Solr using the following command --> sudo bin/solr start -c -p 8983 -h solr-node-1 -force
  12. Checked the open ports using --> sudo lsof -i -P -n | grep LISTEN
  13. Created collections, shards and replicas using ---> bin/solr create -c travasko -d sample_techproducts_configs -n travasko_configs -shards 2 -rf 2 -p 8983
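
Taken together, steps 2-13 amount to roughly the following shell session (a sketch; the IP addresses and the solr-node-1 host entry are placeholders carried over from the question):

cd /opt
sudo apt update
sudo apt-get install openjdk-11-jdk
java -version
sudo wget https://archive.apache.org/dist/lucene/solr/8.3.0/solr-8.3.0.tgz
sudo tar -xvzf solr-8.3.0.tgz
export SOLR_HOME=/opt/solr-8.3.0
export PATH="$PATH:/opt/solr-8.3.0/bin"  # the solr launcher lives in bin/
# map the instance's public IPv4 address to the hostname Solr binds to
echo "public-ip-v4-address-of-ec2 solr-node-1" | sudo tee -a /etc/hosts
cd /opt/solr-8.3.0
sudo bin/solr start -c -p 8983 -h solr-node-1 -force
sudo lsof -i -P -n | grep LISTEN
bin/solr create -c travasko -d sample_techproducts_configs -n travasko_configs -shards 2 -rf 2 -p 8983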

I repeated the same process on the other EC2 machine and ran Solr on it. Now, to use the data import handler in Solr, I edited the following files:

  1. solrconfig.xml
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
</requestHandler>
  2. data-config.xml
<dataConfig>
<dataSource type="JdbcDataSource" 
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://examplerds.cuhj86yfdpid.us-east-1.rds.amazonaws.com:3306/TRAVASKODB1" 
            user="examplerds" 
            password="examplerds#123"/>
<document>
  <entity name="MOMENTS"  
    pk="MOMENT_ID"
    query="SELECT MOMENT_ID,MOMENT_TEXT FROM MOMENTS"
    deltaImportQuery="SELECT MOMENT_ID,MOMENT_TEXT FROM MOMENTS WHERE MOMENT_ID='${dih.delta.MOMENT_ID}'"
    deltaQuery="SELECT MOMENT_ID FROM MOMENTS  WHERE LAST_MODIFIED > '${dih.last_index_time}'"
    >
     <field column="MOMENT_ID" name="MOMENT_ID"/>
     <field column="MOMENT_TEXT" name="MOMENT_TEXT"/>       
  </entity>
</document>
</dataConfig>
  3. managed-schema
<schema name="MOMENTS" version="1.5">
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="MOMENT_ID" type="integer" indexed="true" stored="true" required="true" multiValued="false" /> 
    <field name="MOMENT_TEXT" type="string" indexed="true" stored="true" multiValued="false" />
</schema>
  4. Downloaded the MySQL JDBC driver using the following command:
wget -q "http://search.maven.org/remotecontent?filepath=mysql/mysql-connector-java/5.1.32/mysql-connector-java-5.1.32.jar" -O mysql-connector-java.jar
  5. Add to solrconfig.xml:
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="mysql-connector-java.jar" />

  6. After editing the files above, I uploaded them to SolrCloud using the following ZooKeeper command (a sketch covering the other files follows this list) -->
bin/solr zk -n travasko_config -z solr-node-1:9983 cp /opt/solr-8.3.0/server/solr/configsets/_default/conf/managed-schema zk:/configs/travasko_config/managed-schema
  7. I then checked all the above files in SolrCloud and could see the changes I added.
  8. The current issue is that when I select the collection I created above and click on Dataimport, it throws the error below --->
The solrconfig.xml file for this index does not have an operational DataImportHandler defined!
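
For completeness, the other edited files would go up with the same zk cp pattern used for managed-schema in step 6; a sketch using the documented bin/solr zk cp form, assuming solrconfig.xml and data-config.xml were edited in the same _default configset directory:

bin/solr zk cp /opt/solr-8.3.0/server/solr/configsets/_default/conf/solrconfig.xml zk:/configs/travasko_config/solrconfig.xml -z solr-node-1:9983
bin/solr zk cp /opt/solr-8.3.0/server/solr/configsets/_default/conf/data-config.xml zk:/configs/travasko_config/data-config.xml -z solr-node-1:9983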

Note: The AWS RDS and EC2 instances are in the same VPC, sharing the same security group.

So why is the solrconfig.xml file throwing an error during dataimport? What am I missing here?

The solution to the above issue was basically to set the following Java system property, needed for Solr versions greater than 8.2.0:

-Denable.dih.dataConfigParam=true

This parameter can be set either in solr.in.cmd or solr.in.sh, which can be found in the directory below:

/opt/solr-8.3.0/bin 

where /opt/solr-8.3.0 is the installation directory of Solr.
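
In solr.in.sh, one way to set it is by appending to SOLR_OPTS, the variable that script already uses for extra JVM options (a sketch):

SOLR_OPTS="$SOLR_OPTS -Denable.dih.dataConfigParam=true"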

The other method was to pass this parameter on the command line while starting Solr, as below:

sudo bin/solr start -c -p 8983 -h solr-node-1 -Denable.dih.dataConfigParam=true -force

solr-node-1 is the hostname mapped in /etc/hosts to the public IPv4 address of the AWS EC2 instance on which Solr is configured.
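
With the property set and Solr restarted, the import can be exercised against the travasko collection created earlier; a sketch using DIH's standard commands:

curl "http://solr-node-1:8983/solr/travasko/dataimport?command=full-import"
curl "http://solr-node-1:8983/solr/travasko/dataimport?command=status"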
