Postgresql - Correcting bad logging data
When bad logging data from different versions of an application come in, we want to force update them to a unified value. For example:
Old iOS app sends source = 'ios', partner = 'viki'
New iOS app sends app_id = '10004a'
We want to update the data in table (app_id, source, partner) to values ('10004a', 'ios', 'viki');
CREATE TABLE logging(
app_id VARCHAR(15),
source VARCHAR(50),
partner VARCHAR(50)
);
Question: Which is faster, performing 1 update or 2 updates?
1 update:
UPDATE logging M
SET source = 'ios',
partner = 'viki',
app_id = '10004a'
WHERE app_id = '10004a'
OR (source = 'ios' AND partner = 'viki');
2 updates:
UPDATE logging M
SET source = 'ios',
partner = 'viki'
WHERE app_id = '10004a'
AND (source != 'ios' OR partner != 'viki');
UPDATE logging M
SET app_id = '10004a'
WHERE (source = 'ios' AND partner = 'viki')
AND app_id != '10004a';
Answer: With indexes, both are equally about the same. Possibly because:
It requires just 1 scan for the first.
It requires less assignment operations for the second.
With 2 updates, subsequent rerun will be much faster because it does not have to perform assignments on already-correct rows.
The Test:
-- Creating 2 identical test tables
CREATE TABLE logging_1(
app_id VARCHAR(15),
source VARCHAR(50),
partner VARCHAR(50)
);
INSERT INTO logging_1 SELECT '10004a', '', '' FROM generate_series(1, 1000000);
INSERT INTO logging_1 SELECT '', 'ios', 'viki' FROM generate_series(1, 1000000);
CREATE INDEX ON logging_1(app_id);
CREATE INDEX ON logging_1(source);
CREATE INDEX ON logging_1(partner);
CREATE TABLE logging_2(
app_id VARCHAR(15),
source VARCHAR(50),
partner VARCHAR(50)
);
INSERT INTO logging_2 SELECT '10004a', '', '' FROM generate_series(1, 1000000);
INSERT INTO logging_2 SELECT '', 'ios', 'viki' FROM generate_series(1, 1000000);
CREATE INDEX ON logging_2(app_id);
CREATE INDEX ON logging_2(source);
CREATE INDEX ON logging_2(partner);
-- 1 update
UPDATE logging_1
SET source = 'ios',
partner = 'viki',
app_id = '10004a'
WHERE app_id = '10004a'
OR (source = 'ios' AND partner = 'viki');
# UPDATE 2000000
# Time: 16790.319 ms
-- 2 updates
UPDATE logging_2
SET source = 'ios',
partner = 'viki'
WHERE app_id = '10004a'
AND (source != 'ios' OR partner != 'viki');
# UPDATE 1000000
# Time: 7591.592 ms
UPDATE logging_2
SET app_id = '10004a'
WHERE (source = 'ios' AND partner = 'viki')
AND app_id != '10004a';
# UPDATE 1000000
# Time: 10357.224 ms
Written by Huy Nguyen
Related protips
Have a fresh tip? Share with Coderwall community!
Post
Post a tip
Best
#Logging
Authors
Sponsored by #native_company# — Learn More
#native_title#
#native_desc#