CSV stands for “comma-separated values”. A CSV file is a plain text file in which each line is a data record, and each record consists of one or more fields separated by commas or another delimiter. The format is commonly used by spreadsheet apps.
Sometimes Redis instances need to be loaded with a large amount of preexisting or user-generated data in a short time, so that millions of keys are created as quickly as possible. This is called “mass insertion”.
For example, the command below shows how one can use the Linux utility “awk” to import data into Redis:
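To make the record-and-field structure concrete, here is a minimal sketch that parses two lines of CSV text with Python's standard csv module. The sample rows are copied from the product.csv file used later in this walkthrough; the function name parse_csv is only illustrative.

```python
import csv
import io


def parse_csv(text, delimiter=","):
    """Return each line of CSV text as a list of field strings."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


sample = (
    "_internalid,id,name,manufacturer,msrp\n"
    "1000,0,Sleek Plastic Car,Thiel Hills and Leannon,385.62\n"
)

records = parse_csv(sample)
header, first = records[0], records[1]

# Pair every column name with the matching field of the first record.
print(dict(zip(header, first)))
```

Every field comes back as a string, so numeric columns such as msrp would still need an explicit conversion.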
awk -F',' '{print "SET \""$1"\" \""$2"\""}' data.csv | redis-cli --pipe
Using redis-cli and generic Linux tools like awk to perform mass insertion might not be a good idea for graph data. Imagine you have a large amount of graph data that you want to insert into a Redis database. If commands are sent one after the other without pipelining, you pay the round trip time for every single command, which results in slow loading.
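For plain key-value data, one way to avoid a round trip per command is to write the commands in the Redis protocol (RESP) and stream them through redis-cli --pipe. The sketch below is only an illustration, assuming a two-column key,value CSV; the helper names resp_command and csv_to_resp are hypothetical, and it does not cover graph data.

```python
import csv
import io


def resp_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode("utf-8")
        # The length prefix counts bytes, not characters.
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def csv_to_resp(text):
    """Turn key,value rows into a stream of SET commands."""
    rows = csv.reader(io.StringIO(text))
    return b"".join(
        resp_command("SET", row[0], row[1]) for row in rows if len(row) >= 2
    )


payload = csv_to_resp("key1,value1\nkey2,value2\n")
print(payload)
```

Writing payload to standard output with sys.stdout.buffer.write(payload) and piping the script into redis-cli --pipe would send all commands as one stream.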
Introducing RedisGraph Bulk Loader
If you have a set of CSV files that you want to load into a RedisGraph database, try the RedisGraph Bulk Loader. This tool is written in Python and builds RedisGraph databases from CSV inputs. It requires a Python 3 interpreter.
Follow the steps below to load CSV data into a graph database hosted on Redis Enterprise Cloud:
Step 1. Run Redis Docker container
The Redismod Docker image is a simple container image that bundles together the latest stable releases of Redis and select Redis modules. This image is based on the official image of Redis from Docker. By default, the container starts with Redis’ default configuration and all included modules loaded.
Use the following command to run a Docker container with the RedisGraph module built in:
$ docker run -d -p 6379:6379 redislabs/redismod
Step 2. Connect to the Redis database
You can use either redis-cli or RedisInsight to connect to the Redis database. Let’s try redis-cli, as shown below:
redis-cli
Verify that all the Redis modules are loaded:
$ redis-cli
127.0.0.1:6379> info modules
# Modules
module:name=rg,ver=10006,api=1,filters=0,usedby=[],using=[ai],options=[]
module:name=ai,ver=10002,api=1,filters=0,usedby=[rg],using=[],options=[]
module:name=timeseries,ver=10408,api=1,filters=0,usedby=[],using=[],options=[]
module:name=bf,ver=20205,api=1,filters=0,usedby=[],using=[],options=[]
module:name=graph,ver=20402,api=1,filters=0,usedby=[],using=[],options=[]
module:name=ReJSON,ver=10007,api=1,filters=0,usedby=[],using=[],options=[]
module:name=search,ver=20006,api=1,filters=0,usedby=[],using=[],options=[]
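The same check can be scripted. The sketch below is a minimal example that extracts module names and versions from text in the format printed above; the sample lines are copied from that output, and loaded_modules is a hypothetical helper name.

```python
import re


def loaded_modules(info_text):
    """Map module name to version from 'info modules' output."""
    modules = {}
    for line in info_text.splitlines():
        match = re.match(r"module:name=([^,]+),ver=(\d+)", line.strip())
        if match:
            modules[match.group(1)] = int(match.group(2))
    return modules


sample = """# Modules
module:name=graph,ver=20402,api=1,filters=0,usedby=[],using=[],options=[]
module:name=search,ver=20006,api=1,filters=0,usedby=[],using=[],options=[]
"""

modules = loaded_modules(sample)
# The bulk loader needs the graph module, so check for it explicitly.
print("graph" in modules)
```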
Step 3. Clone the Bulk Loader Utility
$ git clone https://github.com/RedisGraph/redisgraph-bulk-loader
Step 4. Install the RedisGraph Bulk Loader tool
The bulk loader can be installed using pip:
pip3 install redisgraph-bulk-loader
Or
pip3 install git+https://github.com/RedisGraph/redisgraph-bulk-loader.git@master
Step 5. Create a Python virtual env for this work
python3 -m venv redisgraphloader
Step 6. Activate the virtual environment
source redisgraphloader/bin/activate
Step 7. Install the dependencies for the bulk loader:
pip3 install -r requirements.txt
If the above command doesn’t work, install the following modules individually:
pip3 install pathos
pip3 install redis
pip3 install click
Step 8. Install Groovy and generate the CSV files
Ensure that the app points to your Redis Enterprise Cloud endpoint URL and password, then run the Groovy script:
groovy generateCommerceGraphCSVForImport.groovy
Step 9. Verify the .csv files created
head -n2 *.csv
==> addtocart.csv <==
src_person,dst_product,timestamp
0,1156,2010-07-20T16:11:20.551748
==> contain.csv <==
src_person,dst_order
2000,1215
==> order.csv <==
_internalid,id,subTotal,tax,shipping,total
2000,0,904.71,86.40,81.90,1073.01
==> person.csv <==
_internalid,id,name,address,age,memberSince
0,0,Cherlyn Corkery,146 Kuphal Isle South Jarvis MS 74838-0662,16,2010-03-18T16:25:20.551748
==> product.csv <==
_internalid,id,name,manufacturer,msrp
1000,0,Sleek Plastic Car,Thiel Hills and Leannon,385.62
==> transact.csv <==
src_person,dst_order
2,2000
==> view.csv <==
src_person,dst_product,timestamp
0,1152,2012-04-14T11:23:20.551748
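The headers above suggest two kinds of files: node files (person, product, order), whose first column is _internalid, and relation files (view, addtocart, transact, contain), whose first two columns name a source and a destination. The sketch below sorts files on that basis and builds the argument list for the loader, using the -n and -r flags that appear in the loader command in this walkthrough. The src_/dst_ prefix rule is an assumption drawn from this sample only, and both helper names are hypothetical.

```python
def classify_header(header_line):
    """Guess whether a CSV header describes nodes or relations."""
    cols = [c.strip() for c in header_line.split(",")]
    # Assumption from the sample files: relation headers start with src_*, dst_*.
    if len(cols) >= 2 and cols[0].startswith("src_") and cols[1].startswith("dst_"):
        return "relation"
    return "node"


def loader_args(graph_name, headers):
    """Build bulk_insert.py arguments: node files first, then relation files."""
    args = [graph_name]
    for name, header in headers.items():
        if classify_header(header) == "node":
            args += ["-n", name]
    for name, header in headers.items():
        if classify_header(header) == "relation":
            args += ["-r", name]
    return args


headers = {
    "view.csv": "src_person,dst_product,timestamp",
    "person.csv": "_internalid,id,name,address,age,memberSince",
    "product.csv": "_internalid,id,name,manufacturer,msrp",
}
print(loader_args("prodrec-bulk", headers))
```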
Step 10. Run the bulk loader script
python3 bulk_insert.py prodrec-bulk -n person.csv -n product.csv -n order.csv -r view.csv -r addtocart.csv -r transact.csv -r contain.csv
person [####################################] 100%
1000 nodes created with label 'person'
product [####################################] 100%
1000 nodes created with label 'product'
order [####################################] 100%
811 nodes created with label 'order'
view [####################################] 100%
24370 relations created for type 'view'
addtocart [####################################] 100%
6458 relations created for type 'addtocart'
transact [####################################] 100%
811 relations created for type 'transact'
contain [####################################] 100%
1047 relations created for type 'contain'
Construction of graph 'prodrec-bulk' complete: 2811 nodes created, 32686 relations created in 1.021761 seconds
graph.query prodrec "match (p:person) where p.id=200 return p.name"
1) 1) "p.name"
2) (empty array)
3) 1) "Cached execution: 0"
2) "Query internal execution time: 0.518300 milliseconds"
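The per-file counts printed by the loader can be cross-checked against the totals in its final summary line. The sketch below sums the counts from the log text shown in this step; the log lines are copied from that output, and summarize is a hypothetical helper name.

```python
import re


def summarize(log_text):
    """Sum per-label node counts and per-type relation counts from loader output."""
    nodes = sum(int(n) for n in re.findall(r"(\d+) nodes created with label", log_text))
    rels = sum(int(n) for n in re.findall(r"(\d+) relations created for type", log_text))
    return nodes, rels


log = """1000 nodes created with label 'person'
1000 nodes created with label 'product'
811 nodes created with label 'order'
24370 relations created for type 'view'
6458 relations created for type 'addtocart'
811 relations created for type 'transact'
1047 relations created for type 'contain'
Construction of graph 'prodrec-bulk' complete: 2811 nodes created, 32686 relations created in 1.021761 seconds
"""

nodes, rels = summarize(log)
print(nodes, rels)  # 2811 32686, matching the summary line
```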
Step 11. Install RedisInsight
To use RedisInsight on a local Mac, download it from the RedisInsight page on the Redis Labs website, where a form lets you select the operating system of your choice.
Alternatively, if you have Docker Engine installed on your system, the quickest way is to run the following command:
docker run -d -v redisinsight:/db -p 8001:8001 redislabs/redisinsight:latest
Step 12. Access RedisInsight
Next, point your browser to http://localhost:8001. Once you can access the dashboard, supply your database endpoint, port and password to connect to the remote Redis Enterprise Cloud database.
Step 13. Run the Graph Query
GRAPH.QUERY "prodrec-bulk" "match (p:person) where p.id=199 return p"
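The same query can also be issued from Python. The sketch below is a minimal example that works with any client object exposing an execute_command() method, which the redis-py client does. So that it runs without a server, it uses a stand-in client that only records the command; run_graph_query and EchoClient are hypothetical names.

```python
def run_graph_query(client, graph, query):
    """Send GRAPH.QUERY through any client exposing execute_command()."""
    return client.execute_command("GRAPH.QUERY", graph, query)


class EchoClient:
    """Stand-in client that records commands instead of contacting Redis."""

    def __init__(self):
        self.sent = []

    def execute_command(self, *args):
        self.sent.append(args)
        return args


# With redis-py installed, a real client would replace EchoClient, e.g.
#   client = redis.Redis(host="<endpoint>", port=<port>, password="<password>")
# where the endpoint, port and password are placeholders for your own database.
client = EchoClient()
reply = run_graph_query(
    client, "prodrec-bulk", "match (p:person) where p.id=199 return p"
)
print(reply)
```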