
Stop One-By-One Inserts! Bulk Insert Data In Rails With insert_all Or activerecord-import

1. Why Do Bulk Inserts Matter?

When dealing with large datasets - for example, importing a CSV containing more than a thousand tasks from your department - inserting records one by one (.create or .save) can drastically slow down performance and cause database bottlenecks.

Imagine inserting 100,000 records—doing it traditionally means 100,000 separate queries! Instead, bulk inserting reduces this to a single efficient query, saving time and resources.
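To make the difference concrete, here is an illustrative sketch in plain Ruby (no Rails) contrasting the shape of one-by-one INSERT statements with the single multi-row INSERT that a bulk insert sends. This is hand-built SQL for illustration only; real adapters quote and escape values properly:

```ruby
rows = [{ name: "Task 0" }, { name: "Task 1" }, { name: "Task 2" }]

# One-by-one: one statement (and one network round trip) per row
one_by_one = rows.map { |r| "INSERT INTO tasks (name) VALUES ('#{r[:name]}');" }

# Bulk: a single statement carrying every row at once
bulk = "INSERT INTO tasks (name) VALUES " +
       rows.map { |r| "('#{r[:name]}')" }.join(", ") + ";"

puts one_by_one.size # 3 round trips for 3 rows
puts bulk            # 1 round trip, however many rows
```

With 100,000 rows the first approach issues 100,000 statements while the second still issues one (or a handful, if you batch).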

2. Prerequisites

Before diving in, ensure:

  • You’re using Rails 6+ for insert_all
  • You have the activerecord-import gem installed if you’re on an older version (the gem supports Rails 3.x and later)
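For the gem route, installation is a one-line Gemfile entry followed by `bundle install`:

```ruby
# Gemfile
gem "activerecord-import"
```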

3. Step-by-Step Guide

3.1 The Problem With One-By-One Inserts

For example, suppose you want to create 100k task records the naive way. This is how it looks:

  # Create 100k records one at a time
  100_000.times do |i|
    Task.create!({ name: "Task #{i}", description: "This is description of task #{i}", is_finished: false })
  end

I did some benchmark here with this snippet:

  execution_time = Benchmark.measure do
    100_000.times do |i|
      Task.create!({ name: "Task #{i}", description: "This is description of task #{i}", is_finished: false })
    end
  end

  puts "Execution time: #{execution_time.real} seconds"

Benchmark One-by-One

The screenshot shows that creating the records one by one takes 20.99 seconds.

✅ Pros:

  1. Model validations and callbacks run - for example, if the name field has a NOT NULL constraint and a record is created with a nil name, the insert raises an error.
  2. Works on older Rails versions ( < 6.0 ) - useful for legacy projects without built-in bulk-insert support.

❌ Cons:

  1. Performance Issue - Each insert requires a separate database transaction, increasing overhead.
  2. Increased Network Traffic - 100k round trips are made to the database, which increases network costs.

3.2 Solution 1: Use insert_all

The insert_all method allows you to insert multiple records at once, significantly boosting performance.

Example: Using insert_all to create 100k tasks

  tasks = []

  # Initialize 100k tasks hash then add to tasks array
  100_000.times do |i|
    tasks << { name: "Task #{i}", description: "This is description of task #{i}", is_finished: false }
  end

  # Bulk insert all the initialized tasks
  Task.insert_all(tasks)

Insert All

The screenshot shows that the bulk insert takes 1.67 seconds.

✅ Pros:

  1. Faster than .create - Single SQL query
  2. Rails built-in method - Bulk operations directly in Rails without extra gems.

❌ Cons:

  1. Does not accept ActiveRecord models – Only works with raw hashes.
  2. Bypasses model validations and callbacks – Data integrity must be handled manually.
  3. Cannot handle associations automatically – Requires extra queries to fetch related IDs.
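One easy-to-miss consequence of skipping callbacks: insert_all does not populate created_at/updated_at for you (Rails 7 later added a record_timestamps: option), so you should set them in each hash yourself. A minimal sketch in plain Ruby, using Time.now where a Rails app would normally use Time.current:

```ruby
now = Time.now # Time.current in a Rails app

tasks = Array.new(3) do |i|
  {
    name: "Task #{i}",
    description: "This is description of task #{i}",
    is_finished: false,
    created_at: now, # insert_all will not fill these in for you
    updated_at: now
  }
end

# Task.insert_all(tasks) would now persist proper timestamps
puts tasks.first[:created_at]
```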

Example: Inserting Tasks that Belong to Department

  departments = [{ name: "Marketing" }] 
  department = Department.insert_all(departments) # returns an ActiveRecord::Result, not a Department
  department_id = department.id # error raised

  tasks = [{ name: "Task 0", description: "This is the task description", is_finished: false, department_id: ??? }] # department_id is missing here

No ID returned, error raised

The screenshot shows that no id can be retrieved from the insert_all result.

👉 Fix: Manually retrieve department_id before inserting tasks, adding an extra query.

  department = Department.find_by(name: "Marketing") # costs an extra SQL query
  tasks = [{ name: "Task 0", description: "This is the task description", is_finished: false, department_id: department.id }]
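Two further tips. First, on PostgreSQL insert_all accepts a returning: option (e.g. returning: [:id]) that hands back the new IDs, which removes the extra lookup above. Second, building 100k hashes in one array and sending one giant statement can strain memory and statement-size limits; slicing into fixed-size batches is a common compromise. A plain-Ruby sketch of the batching shape (the insert_all call is commented out since it needs a database):

```ruby
BATCH_SIZE = 1_000

rows = Array.new(10_000) { |i| { name: "Task #{i}", is_finished: false } }

batches = 0
rows.each_slice(BATCH_SIZE) do |batch|
  # Task.insert_all(batch) # one query per batch in a Rails app
  batches += 1
end

puts batches # 10 queries instead of 10_000
```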

3.3 Solution 2: Best of both worlds - Use activerecord-import gem

The import method allows you to insert model objects, including their associations.

Example: Using import to create 100k tasks in 1 department

  department = Department.new(name: "Marketing")
  tasks = []

  100_000.times do |i|
    tasks << Task.new({ name: "Task #{i}", description: "This is description of task #{i}", is_finished: false })
  end
  department.tasks = tasks

  execution_time = Benchmark.measure do
    Department.import [department]
  end

  puts "Execution time: #{execution_time.real} seconds"
  • Able to import 1 department with 100k tasks, using activerecord-import’s support for importing associations

  • The performance is comparable to insert_all ( about 1.44 seconds )

Benchmark ActiveRecord-Import

✅ Pros:

  • Works with raw columns and arrays of values (fastest)
  • Works with model objects (faster)
  • Performs validations (fast)
  • Performs on duplicate key updates (requires MySQL, SQLite 3.24.0+, or Postgres 9.5+)

❌ Cons:

  • Need an extra gem installed
  • ActiveRecord callbacks related to creating, updating, or destroying records (other than before_validation and after_validation) will NOT be called by import. If you need them, invoke run_callbacks separately, as the gem recommends.

4. Conclusion

One rule to remember when inserting a large number of records ( for example, importing them from a CSV file ): avoid creating records one by one. Instead, consider using insert_all or activerecord-import for far better performance.

5. Reference