I am using the following script to copy data from a CSV file, which is updated daily, into a MySQL database.
csv_data = csv.reader(open('test.csv'))
next(csv_data, None)  # skip the header row
for row in csv_data:
    with connection.cursor() as cursor:
        cursor.execute(
            "INSERT INTO test (`1`, `2`, `3`, .......) VALUES (%s, %s, %s, .......)",
            (row[0], .......)
        )
The CSV currently has over 40,000 rows and will continue to grow, so the import takes hours.
I know I can add a unique key to the table to prevent duplicates and use INSERT IGNORE
to skip over them, but is there anything else I can do to speed up the process?
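For context, the unique-key plus insert-ignore idea I have in mind looks roughly like the sketch below, combined with batching all rows into a single `executemany` call instead of one `execute` per row. I am using `sqlite3` here only as a stand-in so the sketch is self-contained and runnable; the table layout and sample data are made up. With PyMySQL the shape is the same, except the placeholders are `%s` and the statement is `INSERT IGNORE` rather than SQLite's `INSERT OR IGNORE`.

```python
import csv
import io
import sqlite3

# sqlite3 as a stand-in for the real MySQL connection; with PyMySQL the
# calls are the same shape (cursor.executemany, %s placeholders,
# INSERT IGNORE instead of INSERT OR IGNORE).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT)")

# Stand-in for the daily CSV; note the duplicated id 1 on the last line.
csv_text = "id,name\n1,alpha\n2,beta\n1,alpha\n"
csv_data = csv.reader(io.StringIO(csv_text))
next(csv_data, None)  # skip the header row

# One executemany call sends every row as a single batch rather than one
# round trip per row; the unique key turns duplicate rows into no-ops.
connection.executemany(
    "INSERT OR IGNORE INTO test (id, name) VALUES (?, ?)",
    (tuple(row) for row in csv_data),
)
connection.commit()

rows = connection.execute("SELECT id, name FROM test ORDER BY id").fetchall()
print(rows)  # the duplicated row was skipped
```

The batching matters independently of the duplicate handling: committing once at the end instead of autocommitting each statement avoids paying the per-row round-trip and transaction overhead 40,000 times.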