Elasticsearch (sometimes referred to as ES) was first released in 2010 and is a modern search and analytics engine based on Apache Lucene. Elasticsearch is built with Java and is a NoSQL database. This means it stores data in an unstructured way, and you cannot query it using SQL.
This Elasticsearch tutorial can also be considered a NoSQL tutorial. However, unlike most NoSQL databases, Elasticsearch places a strong emphasis on search functionality and features – in fact, the easiest way to retrieve data from ES is by using the extensive Elasticsearch API for searching.
In terms of data analysis, Elasticsearch works together with other components in the ELK Stack, such as Logstash and Kibana, playing the role of data indexing and storage. Nowadays, Logstash is often replaced by smaller, lighter shippers such as Fluentd or Fluent Bit, which can accomplish most of Logstash's functions without the heavy computational footprint and its common challenges.
As you will see in this tutorial, getting started with Elasticsearch is not rocket science. Implementing an ELK logging pipeline is very simple, especially when you set up a small cluster.
However, once you start sending more data, ELK management requires more work. You will need to manage and scale larger clusters, implement more data parsing, install and manage data queues like Kafka to buffer your logs, perhaps upgrade your ELK components, and monitor and tune your stack to address performance issues.
For those who do not want to manage these tasks themselves and need additional features like RBAC, Logz.io has built a log management tool that provides ELK-as-a-service (now OpenSearch-as-a-service!), so you can embrace the world’s leading logging platform without running it yourself. Logz.io also supports metrics and tracing analysis – learn here how Logz.io unifies and enhances leading open-source observability technologies.
In summary, for small clusters, running Elasticsearch yourself is a good choice. Let’s see how to get started.
Elasticsearch’s requirements are simple: Java 8 (specific version recommendation: Oracle JDK version 1.8.0_131). Check this Logstash tutorial to ensure you are ready. Additionally, you need to make sure your operating system is on the Elastic Support Matrix, otherwise you may run into strange and unpredictable problems. After that, you can start installing Elasticsearch.
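To double-check which Java version is installed before proceeding, you can run the following (the exact output varies by JDK vendor and version):

java -version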
You can download Elasticsearch as a standalone distribution or install it using the apt and yum repositories. We will use apt on an Ubuntu 16.04 machine running on AWS EC2.
First, you need to add the Elastic signing key so that you can verify the downloaded packages (skip this step if you have already installed from Elastic packages):
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
For Debian, we also need to install the apt-transport-https package:
sudo apt-get install apt-transport-https
The next step is to add the repository definition to your system:
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
All that remains is to update your repository and install Elasticsearch:
sudo apt-get update && sudo apt-get install elasticsearch
Elasticsearch configuration is done using a configuration file, the location of which depends on your operating system. In this file, you can configure general settings (such as node name), as well as network settings (such as host and port), where to store data, memory, log files, etc.
For development and testing purposes, the default settings are sufficient, but it is recommended that you research which settings should be manually defined before going into production.
For example, and especially if you are installing Elasticsearch on the cloud, it is a best practice to bind Elasticsearch to either a private IP or localhost:
sudo vim /etc/elasticsearch/elasticsearch.yml

network.host: "localhost"
http.port: 9200
Elasticsearch does not run automatically after installation and needs to be started manually. How you run Elasticsearch will depend on your specific system. On most Linux and Unix-based systems, you can use this command:
sudo service elasticsearch start
That’s it! To confirm everything is working, simply point curl or your browser to http://localhost:9200, and you should see output similar to the following:
{
  "name" : "33QdmXw",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "mTkBe_AlSZGbX-vDIe_vZQ",
  "version" : {
    "number" : "6.1.2",
    "build_hash" : "5b1fea5",
    "build_date" : "2018-01-10T02:35:59.208Z",
    "build_snapshot" : false,
    "lucene_version" : "7.1.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
To debug the process of running Elasticsearch, use the Elasticsearch log file located in /var/log/elasticsearch/ (on Deb).
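For example, to follow the log in real time – a minimal sketch, assuming the default cluster name of elasticsearch, which determines the log file name:

sudo tail -f /var/log/elasticsearch/elasticsearch.log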
Indexing is the process of adding data to Elasticsearch. The name comes from the fact that when you input data into Elasticsearch, it is placed into an Apache Lucene index. This makes sense because Elasticsearch uses Lucene indexes to store and retrieve its data. Although you do not need to know much about Lucene, understanding how it works will be helpful once you start working with Elasticsearch in earnest.
Elasticsearch exposes a REST API, so you can use either the PUT or POST method to add data to it. Use PUT when you know the id of the data item, or when you want to specify an id yourself, and use POST if you want Elasticsearch to generate an id for the data item:
curl -XPOST 'localhost:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
  "timestamp": "2018-01-24 12:34:56",
  "message": "user login",
  "user_id": 4,
  "admin": false
}'

curl -X PUT 'localhost:9200/app/users/4' -H 'Content-Type: application/json' -d'
{
  "number": 4,
  "username": "John",
  "last_login": "2018-01-25 12:34:56"
}'
And the response:
{"_index":"logs","_type":"my_app","_id":"ZsWdJ2EBir6MIbMWSMyF","_version":1,"result":"created","_shards":{"total":2, “successful”:1,“failed”:0},“_seq_no”:0,“_primary_term”:1}{"_index":"app","_type":"users","_id":"4","_version":1,"result":"created","_shards":{"total":2, “successful”:1,“failed”:0},“_seq_no”:0,“_primary_term”:1}
The data of the document is sent as a JSON object. You might wonder how we can index data without defining the data structure in advance. Well, like any other NoSQL database, using Elasticsearch does not require you to define the structure of the data in advance. However, to ensure optimal performance, you can define Elasticsearch mappings based on the data type. We will go into more detail later.
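As a minimal sketch, a mapping for the logs index used above might look like the following. Note that this request creates the index, so it has to run before any documents are indexed into logs; the field types here are our assumptions about the data, and Elasticsearch 6.x mapping types are assumed:

curl -XPUT 'localhost:9200/logs' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "my_app": {
      "properties": {
        "timestamp": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" },
        "message": { "type": "text" },
        "user_id": { "type": "integer" },
        "admin": { "type": "boolean" }
      }
    }
  }
}'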
If you use any Beats shippers (such as Filebeat or Metricbeat) or Logstash, these parts of the ELK Stack will automatically create indexes.
To view the list of Elasticsearch indexes, use:
curl -XGET 'localhost:9200/_cat/indices?v&pretty'

health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2018.01.23 y_-PguqyQ02qOqKiO6mkfA   5   1      17279            0      9.9mb          9.9mb
yellow open   app                 GhzBirb-TKSUFLCZTCy-xg   5   1          1            0      5.2kb          5.2kb
yellow open   .kibana             Vne6TTWgTVeAHCSgSboa7Q   1   1          2            0      8.8kb          8.8kb
yellow open   logs                T9E6EdbMSxa8S_B7SDabTA   5   1          1            0      5.7kb          5.7kb
The list in this example includes the indexes we created above, a Kibana index, and an index created by a Logstash pipeline.
Once your data is indexed in Elasticsearch, you can begin to search and analyze it. The most basic query you can perform is to retrieve a single item. Check out our article that focuses on Elasticsearch queries for more information.
Using the Elasticsearch REST API, we use the GET method to retrieve data. Here’s an example using curl:
curl -XGET 'localhost:9200/app/users/4?pretty'
And the response:
{
  "_index" : "app",
  "_type" : "users",
  "_id" : "4",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "number" : 4,
    "username" : "John",
    "last_login" : "2018-01-25 12:34:56"
  }
}
The fields that start with an underscore are meta-fields associated with the search results. The _source object is the original document that was indexed.
We can also use the GET method to search by calling the search endpoint. Here’s an example using curl:
curl -XGET 'localhost:9200/_search?q=login'

And the response:

{"took":173,"timed_out":false,"_shards":{"total":16,"successful":16,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"logs","_type":"my_app","_id":"ZsWdJ2EBir6MIbMWSMyF","_score":0.2876821,"_source":{"timestamp":"2018-01-24 12:34:56","message":"user login","user_id":4,"admin":false}}]}}
The result contains many additional fields that describe the search and the results. Here’s a brief overview:
took – the time the search took (in milliseconds)
timed_out – whether or not the search timed out
_shards – the number of Lucene shards the search was executed on, along with their success and failure counts
hits – the actual results, along with meta-information about the results
The search we performed above is called a URI search, and is the simplest way to query Elasticsearch. By providing just one word, ES will search for that word in all fields of all documents. You can build more specific searches by using the Lucene query syntax:
username:johnb – finds documents where the username field is equal to “johnb”
john* – finds documents that contain terms that start with “john” followed by zero or more characters (e.g., “john”, “johnb”, “johnson”)
john? – finds documents that contain terms that start with “john” and are followed by exactly one character. Matches “johnb” and “johns” but does not match “john”.
There are many other search methods, including the use of boolean logic, term boosting, fuzzy and proximity searches, and the use of regular expressions.
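For example, here is a minimal sketch of a URI search that combines a field search with boolean logic, using the fields from the document indexed earlier:

curl -XGET 'localhost:9200/logs/_search?q=message:login+AND+admin:false&pretty'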
URI search is just the beginning. Elasticsearch also provides a request body search with the Query DSL for more advanced searching. There are a large number of options available in these types of searches, and you can mix and match different options to get the results you need.
The Query DSL includes two types of clauses: 1) leaf query clauses that look for a value in a specific field, and 2) compound query clauses (which may contain one or more leaf query clauses).
Query types include:
Geo queries
“More like this” queries
Script queries
Full-text queries
Shape queries
Span queries
Term-level queries
Specialized queries
Starting with Elasticsearch 6.8, the ELK Stack has merged Elasticsearch queries and Elasticsearch filters, but ES still distinguishes them by context. The DSL distinguishes between a filter context and a query context for query clauses. Clauses in a filter context test documents in a boolean fashion: does the document match the filter, “yes” or “no”? Filters are also generally faster than queries. Clauses in a query context, by contrast, also calculate a relevance score according to how well the document matches the query; filters do not use a relevance score. The score determines the ordering and inclusion of documents:
curl -XGET 'localhost:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase": {
      "message": "user login"
    }
  }
}'
Result:
{
  "took" : 28,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.8630463,
    "hits" : [
      {
        "_index" : "logs",
        "_type" : "my_app",
        "_id" : "ZsWdJ2EBir6MIbMWSMyF",
        "_score" : 0.8630463,
        "_source" : {
          "timestamp" : "2018-01-24 12:34:56",
          "message" : "user login",
          "user_id" : 4,
          "admin" : false
        }
      }
    ]
  }
}
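To see the two contexts working together, here is a minimal sketch of a bool query using the fields indexed earlier: the match clause in the query context contributes to the relevance score, while the term clause in the filter context simply includes or excludes documents without affecting the score:

curl -XGET 'localhost:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match": { "message": "login" } },
      "filter": { "term": { "admin": false } }
    }
  }
}'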
Maintaining an Elasticsearch cluster can be time-consuming, especially when you are doing DIY ELK. However, given the powerful search and analytics capabilities of Elasticsearch, such a cluster is essential. We cover this topic in more depth in our Elasticsearch cluster tutorial, so use this as a springboard for more thorough experimentation.
What exactly is an Elasticsearch cluster? An Elasticsearch cluster combines multiple Elasticsearch nodes and/or instances. Of course, you always have the option to maintain a single Elasticsearch instance or node within a given cluster. The point of this grouping is the distribution of tasks, search, and indexing across its nodes. Node options include data nodes, master nodes, client nodes, and ingest nodes.
Installing nodes may involve a lot of configuration, which is covered in the tutorials mentioned earlier. But here is the basic installation of an Elasticsearch cluster node:
First, install Java:
sudo apt-get install default-jre
Next, add the Elasticsearch signing key:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
Next, install the latest version of Elasticsearch:
sudo apt-get update && sudo apt-get install elasticsearch
You must create and/or set your own elasticsearch.yml configuration file for each Elasticsearch node (sudo vim /etc/elasticsearch/elasticsearch.yml). From there, start Elasticsearch and check your Elasticsearch cluster status. The response will look like this:
{
  "cluster_name" : "elasticsearch-cluster-demo",
  "compressed_size_in_bytes" : 255,
  "version" : 7,
  "state_uuid" : "50m3ranD0m54a531D",
  "master_node" : "IwEK2o1-Ss6mtx50MripkA",
  "blocks" : {},
  "nodes" : {
    "m4-aw350m3-n0D3" : {
      "name" : "es-node-1",
      "ephemeral_id" : "x50m33F3mr--A11DnuM83r",
      "transport_address" : "172.31.50.123:9200",
      "attributes" : {}
    }
  }
}
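For reference, here is a minimal sketch of what a node’s elasticsearch.yml might contain to produce a cluster like the one above (assuming Elasticsearch 6.x; the cluster name, node name, and host addresses are hypothetical):

cluster.name: elasticsearch-cluster-demo
node.name: es-node-1
network.host: 172.31.50.123
discovery.zen.ping.unicast.hosts: ["172.31.50.123", "172.31.50.124"]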
Elasticsearch cluster health will be the next on your list. Use the following API call to periodically check the health of the cluster:
curl -X GET "localhost:9200/_cluster/health?wait_for_status=yellow&local=false&level=shards&pretty"
This example shows the parameter local as false (which is actually the default setting). This will show you the status of the master node. To check the local node, change it to true.
By default, the level parameter shows you the health of the cluster, but levels beyond that include indices and shards (as shown in the example above).
There are also other optional parameters for timeout…
timeout
master_timeout
…or wait for certain events to occur:
wait_for_active_shards
wait_for_events
wait_for_no_initializing_shards
wait_for_no_relocating_shards
wait_for_nodes
wait_for_status
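For example, a health check that blocks until the cluster reaches at least yellow status, with a bounded wait, might look like this (a minimal sketch):

curl -XGET "localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s&pretty"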
Of course, with Logz.io, creating an Elasticsearch (now OpenSearch) cluster is as simple as starting a free trial, and scaling the cluster requires no effort on the user’s part.
Deleting an item from Elasticsearch is as simple as inputting data into Elasticsearch. The HTTP method used this time is, surprisingly, DELETE:
curl -XDELETE 'localhost:9200/app/users/4?pretty'

{
  "_index" : "app",
  "_type" : "users",
  "_id" : "4",
  "_version" : 2,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}
To remove an index, execute:
curl -XDELETE 'localhost:9200/logs?pretty'
To wipe out all indices (exercise caution when doing so), execute:
curl -XDELETE 'localhost:9200/_all?pretty'
Regardless of the method, the expected response will be:
{ "Confirmation of deletion": true}
To eliminate a single document:
curl -XDELETE 'localhost:9200/index/type/document'