How to Import CSV data into Redis using Docker

CSV stands for “comma-separated values”. A CSV file is a plain-text file in which each line is a data record, and each record consists of one or more fields separated by commas or another delimiter. The format is commonly used by spreadsheet apps.
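To make the record and field terminology concrete, here is a minimal sketch that parses CSV text with Python's standard csv module. The header and row are copied from the person.csv sample shown later in this article; the function name parse_records is only illustrative.

```python
import csv
import io


def parse_records(text):
    """Return each CSV record as a dict keyed by the header row's field names."""
    return list(csv.DictReader(io.StringIO(text)))


# Header and first record taken from the person.csv sample in this article.
sample = (
    "_internalid,id,name,address,age,memberSince\n"
    "0,0,Cherlyn Corkery,146 Kuphal Isle South Jarvis MS 74838-0662,16,"
    "2010-03-18T16:25:20.551748\n"
)

records = parse_records(sample)
print(records[0]["name"], records[0]["age"])
```

Note that the csv module returns every field as a string, so numeric fields such as age need an explicit conversion if you want to compute with them.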

Sometimes Redis instances need to be loaded with a large amount of preexisting or user-generated data in a short time, so that millions of keys are created as quickly as possible. This is called a “mass insertion”.

https://stackoverflow.com/questions/32149626/how-to-insert-billion-of-data-to-redis-efficiently/32165090

For example, the command below uses the Linux utility “awk” to import data into Redis.

cat data.csv | awk -F',' '{print " SET \""$1"\" \""$2"\" \n"}' | redis-cli --pipe 

Using redis-cli and general-purpose Linux tools like awk to perform mass insertion might not be a good idea. Without the --pipe option, commands are sent one after the other and you pay the round-trip time for every single command, which results in slow performance. Even with --pipe, the awk one-liner only produces flat SET commands and splits on every comma, so it cannot handle quoted fields or express nodes and relationships. Imagine you have a ton of graph data that you want to insert into a Redis database: you need a tool that understands that structure.
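The awk pipeline above can also be sketched in Python. Besides plain-text commands, redis-cli --pipe accepts commands encoded in the Redis protocol (RESP), which is unambiguous for values that contain spaces or quotes. The sketch below is only an illustration: the function names resp_command and csv_to_resp are hypothetical, and it assumes the first two columns of each row are the key and the value.

```python
import csv


def resp_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def csv_to_resp(lines):
    """Yield one SET command per CSV row (column 1 = key, column 2 = value)."""
    for row in csv.reader(lines):
        if len(row) >= 2:  # skip blank or incomplete rows
            yield resp_command("SET", row[0], row[1])


# The csv module keeps the quoted comma inside the second field.
demo = list(csv_to_resp(['user:1,"Corkery, Cherlyn"']))
print(demo[0])
```

To feed a real server, you would write each yielded command to sys.stdout.buffer and pipe the script's output into redis-cli --pipe.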

Introducing RedisGraph Bulk Loader

If you have a set of CSV files that you want to load into a RedisGraph database, try the RedisGraph Bulk Loader. This utility is written in Python, requires a Python 3 interpreter, and builds RedisGraph databases from CSV inputs.

Follow the steps below to load CSV data into a graph database hosted on Redis Enterprise Cloud.

Step 1. Run Redis Docker container

The Redismod Docker image is a simple container image that bundles together the latest stable releases of Redis and selected Redis modules. This image is based on the official image of Redis from Docker. By default, the container starts with Redis’ default configuration and all included modules loaded.

Use the following command to run a Docker container with the RedisGraph module built in:

  $ docker run -d -p 6379:6379 redislabs/redismod

Step 2. Connect to the Redis database

You can use either redis-cli or RedisInsight to connect to the Redis database. Let’s try redis-cli as shown below:

 redis-cli

Verify that all the Redis modules are loaded.

 $ redis-cli
127.0.0.1:6379> info modules
# Modules
module:name=rg,ver=10006,api=1,filters=0,usedby=[],using=[ai],options=[]
module:name=ai,ver=10002,api=1,filters=0,usedby=[rg],using=[],options=[]
module:name=timeseries,ver=10408,api=1,filters=0,usedby=[],using=[],options=[]
module:name=bf,ver=20205,api=1,filters=0,usedby=[],using=[],options=[]
module:name=graph,ver=20402,api=1,filters=0,usedby=[],using=[],options=[]
module:name=ReJSON,ver=10007,api=1,filters=0,usedby=[],using=[],options=[]
module:name=search,ver=20006,api=1,filters=0,usedby=[],using=[],options=[]
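If you prefer to check the module list from a script rather than by eye, the following sketch parses the text returned by info modules. The function name loaded_modules is hypothetical, and the sample lines are copied from the output above.

```python
def loaded_modules(info_text):
    """Map module name -> version number from the text of `INFO modules`."""
    modules = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line.startswith("module:"):
            continue  # skip the "# Modules" header and blank lines
        pairs = line[len("module:"):].split(",")
        fields = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
        if "name" in fields and "ver" in fields:
            modules[fields["name"]] = int(fields["ver"])
    return modules


sample = """# Modules
module:name=rg,ver=10006,api=1,filters=0,usedby=[],using=[ai],options=[]
module:name=graph,ver=20402,api=1,filters=0,usedby=[],using=[],options=[]
"""

mods = loaded_modules(sample)
print("graph" in mods, mods.get("graph"))
```

With a live connection you would pass in the raw text of the info modules reply instead of the sample string.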

Step 3. Clone the Bulk Loader Utility

  $ git clone https://github.com/RedisGraph/redisgraph-bulk-loader

Step 4. Install the RedisGraph Bulk Loader tool

The bulk loader can be installed using pip:

  pip3 install redisgraph-bulk-loader

Or

  pip3 install git+https://github.com/RedisGraph/redisgraph-bulk-loader.git@master

Step 5. Create a Python virtual environment for this work

  python3 -m venv redisgraphloader

Step 6. Activate the virtual environment

  source redisgraphloader/bin/activate

Step 7. Install the dependencies for the bulk loader (run this from inside the cloned redisgraph-bulk-loader directory)

  pip3 install -r requirements.txt

If the above command doesn’t work, install the following modules individually:

  pip3 install pathos
  pip3 install redis
  pip3 install click

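Step 8. Install Groovy and generate the CSV files

With Groovy installed, run the generator script below to produce the CSV files. Ensure that the app points to your Redis Enterprise Cloud endpoint URL and password.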

  groovy generateCommerceGraphCSVForImport.groovy

Step 9. Verify the .csv files created

 head -n2 *.csv
 ==> addtocart.csv <==
 src_person,dst_product,timestamp
 0,1156,2010-07-20T16:11:20.551748

 ==> contain.csv <==
 src_person,dst_order
 2000,1215

 ==> order.csv <==
 _internalid,id,subTotal,tax,shipping,total
 2000,0,904.71,86.40,81.90,1073.01

 ==> person.csv <==
 _internalid,id,name,address,age,memberSince
 0,0,Cherlyn Corkery,146 Kuphal Isle South Jarvis MS 74838-0662,16,2010-03-18T16:25:20.551748

 ==> product.csv <==
 _internalid,id,name,manufacturer,msrp
 1000,0,Sleek Plastic Car,Thiel Hills and Leannon,385.62

 ==> transact.csv <==
 src_person,dst_order
 2,2000

 ==> view.csv <==
 src_person,dst_product,timestamp
 0,1152,2012-04-14T11:23:20.551748

Step 10. Run the Bulk Loader script

  python3 bulk_insert.py prodrec-bulk -n person.csv -n product.csv -n order.csv -r view.csv -r addtocart.csv -r transact.csv -r contain.csv
  person  [####################################]  100%
  1000 nodes created with label 'person'
  product  [####################################]  100%
  1000 nodes created with label 'product'
  order  [####################################]  100%
  811 nodes created with label 'order'
  view  [####################################]  100%
  24370 relations created for type 'view'
  addtocart  [####################################]  100%
  6458 relations created for type 'addtocart'
  transact  [####################################]  100%
  811 relations created for type 'transact'
  contain  [####################################]  100%
  1047 relations created for type 'contain'
  Construction of graph 'prodrec-bulk' complete: 2811 nodes created, 32686 relations created in 1.021761 seconds
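
You can then query the graph from redis-cli. Note that the query below targets a graph named prodrec rather than prodrec-bulk, which is the likely reason it returns an empty array; use prodrec-bulk to query the data loaded above.

  graph.query prodrec "match (p:person) where p.id=200 return p.name"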
  1) 1) "p.name"
  2) (empty array)
  3) 1) "Cached execution: 0"
     2) "Query internal execution time: 0.518300 milliseconds"
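As a sanity check on the loader's totals, you can count the data rows in each input file and compare them with the reported node and relation counts. The sketch below assumes that every row loads successfully and that each file has exactly one header row; the helper names are hypothetical.

```python
import csv


def count_data_rows(lines):
    """Count CSV records in an iterable of lines, excluding the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header; does nothing on empty input
    return sum(1 for row in reader if row)  # ignore blank lines


def count_file(path):
    """Count the data rows of a CSV file on disk."""
    with open(path, newline="") as handle:
        return count_data_rows(handle)


# In-memory demo using the first record of person.csv shown earlier.
demo = ["_internalid,id,name", "0,0,Cherlyn Corkery"]
print(count_data_rows(demo))
```

For the run above, sum(count_file(p) for p in ["person.csv", "product.csv", "order.csv"]) should equal the 2811 nodes reported if all rows were loaded.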

Step 11. Install RedisInsight

To use RedisInsight on a local Mac, download it from the RedisInsight page on the RedisLabs website. The download page provides a form that lets you select the operating system of your choice.

Alternatively, if you have Docker Engine installed on your system, the quickest way is to run the following command:

  docker run -d -v redisinsight:/db -p 8001:8001 redislabs/redisinsight:latest

Step 12. Access RedisInsight

Next, point your browser to http://localhost:8001. Once you can access the dashboard, supply your database endpoint, port, and password to connect to the remote Redis Enterprise Cloud database.

Step 13. Run the Graph Query

  GRAPH.QUERY "prodrec-bulk" "match (p:person) where p.id=199 return p"
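
The same query can be issued from Python. The sketch below sends the raw GRAPH.QUERY command through redis-py's generic execute_command method; the helper names are hypothetical, and the host, port, and password defaults are assumptions that you should replace with your own endpoint details.

```python
def build_person_query(person_id):
    """Return the Cypher text that fetches one person node by id."""
    return "MATCH (p:person) WHERE p.id = %d RETURN p" % int(person_id)


def run_graph_query(client, graph, query):
    """Send GRAPH.QUERY through any client that exposes execute_command."""
    return client.execute_command("GRAPH.QUERY", graph, query)


def connect(host="localhost", port=6379, password=None):
    """Create a redis-py client (requires `pip3 install redis`)."""
    import redis  # imported lazily so the helpers above work without redis-py

    return redis.Redis(host=host, port=port, password=password)


# Example (needs a running server with the RedisGraph module loaded):
#   client = connect()
#   print(run_graph_query(client, "prodrec-bulk", build_person_query(199)))
print(build_person_query(199))
```

Converting the id with int() before building the query text keeps arbitrary strings out of the Cypher statement.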