request-quote
Ruby on Rails

CSV Import. Just SIT (Split, Insert, Think) Down and Do It

By Alexey S. January 05th, 2016

Sometimes it happens that we have a really big .csv (~100 MB) file and we need to import it to our RoR application database. We can try to use the built-in class:

require 'csv'

def import(file)
 CSV.foreach(file.path) do |row|
   product = Product.from_csv_row(row)
   product.save!
 end
end

 

However, it will take too much time for .csv files to be imported. What is more, you will have to import the .csv file again in case some rows are broken. 

 

So how to solve this problem?

If your models are not linked you can just use activerecord-import as mass insertion is much faster in this case.

require 'csv'

def import(file)
  products = []
  CSV.foreach(file.path) do |row|
    products << Product.from_csv_row(row)
  end
  Product.import products
end

 

If the models are linked let’s try some tricks.

At first, I would advise  to split the full import process into chunks. It gives 4 advantages:

1. You don’t have to wait forever untill .csv is opened.

2. You don’t have to start from the begining if something goes wrong.

3. You can monitor process status.

4. It is easier to debug.

 

Secondly, it’s better to keep the last inserted id from .csv (or just row number) in your database.

class AddCsvRowId < ActiveRecord::Migration
 def change
   change_table :products do | t |
     t.integer :csv_row_id
   end
 end
end
 

I have used SmarterCSV gem. It allows you to read .csv files by chunks. Therefore we get

def import(file)
 total_chunks = SmarterCSV.process(
     file.path,
     {chunk_size: 10000}
) do |chunk_products|
   if Product.last.present?
     chunk_products = chunk_products.select {|product|
       product[:id].to_i > TimeBlock.last.csv_row_id
     }
   end
   chunk_products.each do |row|
     next if Product.exists?(csv_row_id: row[:id].to_i)
     product = Product.from_csv_row(row)
     next unless product.valid?
     product.images << Image.from_csv_row(row)
     product.tags << Tag.from_csv_row(row)
     product.save!
   end
 end
 puts total_chunks.to_s
end

 

And finally, it is better to move import to background process.

task import_products: :environment do
 ImportProductsJob.perform_later('/path/to/file.csv')
end

class ImportProductsJob < ActiveJob::Base
 rescue_from(StandardError) do
   retry_job wait: 5.minutes, queue: :low_priority
 end

 def perform(*args)
   import(args[0]) if File.exist?(args[0]) &&
                      File.extname(args[0]) == 'csv'
 end

 def import(file)
  # process file
 end

end

 

Conclusion

So .csv file import is not that diffucult if you have some basic tricks up your sleeve and use these links:

1. ActiveRecordImport

2. SmarterCsv

3. ActiveJob

 
 

 

Alexey S.

Alexey S.

Ruby on Rails Developer at iKantam

Browse Recent Posts