Delete duplicate FAL entries and references after TYPO3 6.2 upgrade
It may happen that you've run the FAL migration wizard for images twice during the upgrade from TYPO3 4.5 to 6.2. In this case, you end up with every image existing twice on the same content element.
Here is how we cleaned that up:
# Add temporary "deleted" field to sys_file
ALTER TABLE sys_file ADD deleted tinyint(4) NOT NULL DEFAULT '0';
# Create a copy of the table
CREATE TABLE sys_file_copy LIKE sys_file;
INSERT sys_file_copy SELECT * FROM sys_file;
# Update "sys_file" (see below)
UPDATE sys_file SET deleted=1
WHERE deleted=0 AND uid NOT IN (
SELECT MAX(uid) AS uid
FROM sys_file_copy
WHERE deleted=0
GROUP BY BINARY identifier
);
# Clean up in "sys_file_reference" table after "sys_file" update
UPDATE sys_file_reference SET deleted=1
WHERE deleted=0 AND uid_local IN (
SELECT uid FROM sys_file
WHERE deleted=1
);
# Create a copy of the table
CREATE TABLE sys_file_reference_copy LIKE sys_file_reference;
INSERT sys_file_reference_copy SELECT * FROM sys_file_reference;
# Update sys_file_reference (see below)
UPDATE sys_file_reference SET deleted=1
WHERE deleted=0 AND uid NOT IN (
SELECT MAX(uid) AS uid
FROM sys_file_reference_copy
WHERE deleted=0
GROUP BY pid,uid_local,uid_foreign,tablenames,fieldname
);
# Verify the result before you delete the data!
# Delete and clean up
DELETE FROM sys_file_reference WHERE deleted=1;
DELETE FROM sys_file WHERE deleted=1;
ALTER TABLE sys_file DROP deleted;
DROP TABLE sys_file_reference_copy;
DROP TABLE sys_file_copy;
Explanation of the update queries:
Update sys_file table
- select records with the same identifier
- using binary mode so that the search is case sensitive
- keep results with highest uid
- mark the rest as deleted
Update sysfilereference table
- select records with same pid,uidlocal,uidforeign,tablenames,fieldname
- keep results with the highest uid
- mark the rest as deleted
Written by Michael Stucki
Related protips
3 Responses
My database exploded when trying the above.
So I did:
SELECT n1.uid, n1.uid_foreign FROM sys_file_reference n1, sys_file_reference n2 WHERE n1.uid > n2.uid AND n1.uid <> n2.uid AND n1.uid_local = n2.uid_local AND n1.uid_foreign = n2.uid_foreign AND n1.tablenames = n2.tablenames;
And then:
UPDATE sys_file_reference n1, sys_file_reference n2 SET n1.deleted = 1 WHERE n1.uid > n2.uid AND n1.uid <> n2.uid AND n1.uid_local = n2.uid_local AND n1.uid_foreign = n2.uid_foreign AND n1.tablenames = n2.tablenames;
Or if you're brave and have backups:
DELETE n1 FROM sys_file_reference n1, sys_file_reference n2 WHERE n1.uid > n2.uid AND n1.uid <> n2.uid AND n1.uid_local = n2.uid_local AND n1.uid_foreign = n2.uid_foreign AND n1.tablenames = n2.tablenames;
Hi Michiel!
I'm not sure what didn't work for you. It certainly takes a while to process, however it is tested and works fine.
I would be careful with the queries you mentioned as I see quite a few issues:
- You don't clean up in sys_file.
- Your query doesn't check for the fieldname, which should be unique as well. I made a short check on one of my test sites where some of the duplicates refer to the field "media" and some to the "image" field. This can happen if a content element was of type "uploads" before but is now an image element. So unless you keep both entries in the reference table, you also need to check for the CType (uploads uses "media", image and textpic use "image").
- Don't forget to check the "deleted" field.
- "... WHERE n1.uid > n2.uid AND n1.uid <> n2.uid" can be shortened.
However, I see that the original queries that I suggested could be well optimized. Instead of creating temporary tables, a JOIN should work out well and be much faster. In any case, you need to make a connection between sysfile and sysfile_reference if you're cleaning up in both tables.
We tried it in a really big database and it lasts hours, so we changed the scripts to faster ones (and it works!):
CREATE TABLE sys_file_new LIKE sys_file;
INSERT sys_file_new SELECT tx_phpunit_is_dummy_record, MAX(uid) AS uid, pid, tstamp, last_indexed, missing, storage, type, metadata, identifier, identifier_hash, folder_hash, extension, mime_type, name, sha1, size, creation_date, modification_date FROM sys_file GROUP BY BINARY identifier;
SELECT * FROM sys_file_new;
RENAME TABLE sys_file TO sys_file_bak;
RENAME TABLE sys_file_new TO sys_file;
CREATE TABLE sys_file_reference_new LIKE sys_file_reference;
INSERT sys_file_reference_new SELECT tx_phpunit_is_dummy_record, MAX(uid) AS uid, pid, tstamp, crdate, cruser_id, sorting, deleted, hidden, t3ver_oid, t3ver_id, t3ver_wsid, t3ver_label, t3ver_state, t3ver_stage, t3ver_count, t3ver_tstamp, t3ver_move_id, t3_origuid, sys_language_uid, l10n_parent, l10n_diffsource, uid_local, uid_foreign, tablenames, fieldname, sorting_foreign, table_local, title, description, alternative, link, downloadname FROM sys_file_reference WHERE deleted=0 GROUP BY pid,uid_local,uid_foreign,tablenames,fieldname;
SELECT * FROM sys_file_reference_new;
RENAME TABLE sys_file_reference TO sys_file_reference_bak;
RENAME TABLE sys_file_reference_new TO sys_file_reference;
UPDATE sys_file_reference
LEFT OUTER JOIN sys_file ON (sys_file.uid = sys_file_reference.uid_local)
SET sys_file_reference.uid_local = sys_file_reference.uid_local + 1
WHERE sys_file.uid IS NULL;