sqlalchemy is widely used. For example, Pandas read_sql uses sqlalchemy to connect to a supported database.
In this test, I want to transfer all records between two identitical tables. It is useful when you need to move data between 2 non-production environments. This is not the most efficient way to move data, but I will show here that it’s very a convenient way to move small data sets under 500K records. Because the code is so simple, it will take you only 15 to 30 minutes to write and test.
Here is how it works:
- sqlalchemy autoload can retrieve table ddl
- sqlalchemy can generate our select query and insert queries
- use fetchmany/select and executemany/insert to copy rows from one to another
complete source code on my github repo.
try:
source_tbl = Table(from_table, smeta, autoload=True, schema=from_owner,autoload_with=sengine)
dest_tbl = Table(to_table, dmeta, autoload=True, schema=to_owner,autoload_with=dengine)
#
sel_query = select([source_tbl])
insert_sql = dest_tbl.insert()
#
results = source.execute(sel_query)
while True:
recs = results.fetchmany(ARRAY_SIZE)
_total = len(recs)
total += _total
if _total > 0:
dest.execute(insert_sql, recs)
dest.commit()
if _total < ARRAY_SIZE:
break
# done
print(f'{total} records copied from {from_table} to {to_table}')
except sqlalchemy.exc.NoSuchTableError as e1:
pass
With a short script, you can copy all the tables you need.
This is not a good solution to deal with large data set
Because inserting is really slow even you do all you find the optimal batch size. This is just how an RDBMS works. You have to use a bulk load tools like SQL*Loader for Oracle, MultiLoad for Teradata, COPY for PostGresql, … .
Or if you have DBA privileges, you can:
- Load data to a staging table that does not have any constraints or indexes
- Validate data manually with for example Oracle PL/SQL on an Oracle db
- Disable indexes and constraints on the targeted production table
- Copy data from staging to production table
- Re-enabl indexes and constraints on the production table