Upload bulk CSV data to ElasticSearch using Python

Pranay
Jun 14, 2018  ·  11719 views

This post shows how to upload data from a CSV file to ElasticSearch using the bulk helpers of the Python ElasticSearch Client.

It is assumed that you have already set up ElasticSearch and have a Python environment ready, along with an IDE. If not, the link below might help you.

If you would like to upload a JSON file instead of a CSV file, the post below might help you.

Elastic search

Python ElasticSearch Client

This requires the Python Elasticsearch Client. Install it as described in Python Elasticsearch Client Installation, or just run the command below from your command line.

pip install elasticsearch

Uploading bulk data from .CSV file to ElasticSearch using Python code

These are the steps I followed:

  1. Read the data from the .CSV file into a pandas DataFrame.
  2. Create a JSON string from the DataFrame by iterating through all the rows and columns.
  3. Convert JSON string to JSON object.
  4. Upload the JSON object using the Python ElasticSearch Client - bulk helpers
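The four steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the exact script from this post: it swaps pandas for the standard-library csv module so that it runs without extra dependencies, and the index name my_index and the helper names csv_to_actions and upload are hypothetical. It also assumes a cluster on localhost:9200. Clusters older than 7.0 will likely also expect a _type key in each action.

```python
import csv
import io
import json


def csv_to_actions(csv_file, index_name):
    """Yield one bulk action per CSV row (steps 1 to 3)."""
    reader = csv.DictReader(csv_file)
    for row_number, row in enumerate(reader):
        # Round-trip through a JSON string, mirroring steps 2 and 3.
        json_str = json.dumps(row)
        doc = json.loads(json_str)
        yield {"_index": index_name, "_id": row_number, "_source": doc}


def upload(actions, host="http://localhost:9200"):
    """Send the actions with the bulk helper (step 4)."""
    # Imported here so the rest of the sketch works without the package.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(host)
    # helpers.bulk returns a tuple: (number of successes, errors).
    return helpers.bulk(es, actions)


# Small in-memory CSV standing in for a real file on disk.
sample = io.StringIO("name,city\nAlice,Paris\nBob,Berlin\n")
actions = list(csv_to_actions(sample, "my_index"))
print(actions[0])
# upload(actions)  # uncomment once a cluster is running
```

For a real file, pass open("your_file.csv", newline="") instead of the in-memory sample. Note that csv.DictReader reads every value as a string, so numeric columns would need converting before upload if the mapping expects numbers.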

Below is the Python script

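from elasticsearch import Elasticsearch

# Connect to the local ElasticSearch node.
es = Elasticsearch(
    ['localhost'],
    port=9200
)

# Raw string so the backslashes in the Windows path are kept literally.
with open(r"C:\ElasticSearch\shakespeare_6.0.json", 'r') as f:
    ClearData = f.read().splitlines(True)

i = 0
json_str = ""
docs = {}
for line in ClearData:
    # Trim surrounding whitespace only, so spaces inside values survive.
    line = line.strip()
    if line != "},":
        # Still inside the current document: keep accumulating.
        json_str = json_str + line
    else:
        # "}," closes a document: complete it and index it under id i.
        docs[i] = json_str + "}"
        json_str = ""
        print(docs[i])
        es.index(index='shakespeare', doc_type='Blog', id=i, body=docs[i])
        i = i + 1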

Screenshot: Output of the command running in Python

We can check the uploaded data using the below Python code.

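import json
from elasticsearch import Elasticsearch, helpers

# Placeholder values; adjust them to your own cluster, index and type.
ES_CLUSTER = ['localhost:9200']
ES_INDEX = 'shakespeare'
ES_TYPE = 'Blog'

es = Elasticsearch(ES_CLUSTER)
# Raw string so the backslashes in the Windows path are kept literally.
with open(r"C:\ElasticSearch\shakespeare_6.0.json") as json_file:
    json_docs = json.load(json_file)
# Assumes json_docs is a list of documents; the bulk helper sends them in batches.
helpers.bulk(es, json_docs, index=ES_INDEX, doc_type=ES_TYPE)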

Screenshot: Output of the command running in Python
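One simple way to check an upload from Python is to ask the index how many documents it holds. The sketch below is an illustration, not the code from the screenshot: count_documents is a hypothetical helper, and FakeClient is a stand-in so that the example runs without a cluster. With a real client, the same two calls (indices.refresh and count) are made against ElasticSearch.

```python
def count_documents(es, index_name):
    """Return how many documents the index currently holds."""
    # Refresh first so recently indexed documents are visible to the count.
    es.indices.refresh(index=index_name)
    return es.count(index=index_name)["count"]


class _FakeIndices:
    def refresh(self, index):
        return {"_shards": {"failed": 0}}


class FakeClient:
    """Stand-in that mimics the two client calls used above."""
    indices = _FakeIndices()

    def count(self, index):
        return {"count": 3}


print(count_documents(FakeClient(), "shakespeare"))
# With a real cluster (assumed to be on localhost:9200):
# from elasticsearch import Elasticsearch
# print(count_documents(Elasticsearch("http://localhost:9200"), "shakespeare"))
```

If the number returned matches the number of rows in the source file, the upload most likely went through completely.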

It can also be verified from the Kibana Dev Tools console (if Kibana is already installed).

Kibana GET command

Screenshot: With Kibana GET command and output in the right side

I hope this post has helped you. Please comment and let me know your thoughts!

AUTHOR

Pranay

A software engineer by profession, a part-time blogger, and an enthusiastic programmer.

