Double counts when left joining in PostgreSQL

Question:

I have the following tables:

users, which has the following columns:

id: INT
name: VARCHAR
boss_id: INT

bosses, which has the following columns:

id: INT
name: VARCHAR

messages, which has the following columns:

author_id: INT (reference to users)
body: VARCHAR
type: VARCHAR

messages_targets, which has the following columns:

user_id: INT (reference to users)
message_id: INT (reference to messages)

I have the following query, which correctly returns, for each boss, the percentage of their users who have received at least one message of type 'urgent':

SELECT (COUNT(DISTINCT CASE WHEN messages.type = 'urgent' THEN users.id END)::float / NULLIF(COUNT(DISTINCT users.id)::float, 0)) * 100,
bosses.id
FROM bosses
LEFT JOIN users ON users.boss_id = bosses.id
LEFT JOIN messages_targets ON messages_targets.user_id = users.id
LEFT JOIN messages ON messages.id = messages_targets.message_id
GROUP BY bosses.id

Now I want to modify that query so that it also returns the count of urgent messages that the users have authored, grouped by their boss. So I tried this:

SELECT (COUNT(DISTINCT CASE WHEN messages.type = 'urgent' THEN users.id END)::float / NULLIF(COUNT(DISTINCT users.id)::float, 0)) * 100 as percentage_received,
COUNT(CASE WHEN authored_messages.type = 'urgent' THEN 1 END) authored_messages_count,
bosses.id
FROM bosses
LEFT JOIN users ON users.boss_id = bosses.id
LEFT JOIN messages_targets ON messages_targets.user_id = users.id
LEFT JOIN messages ON messages.id = messages_targets.message_id
LEFT JOIN messages authored_messages ON authored_messages.author_id = users.id
GROUP BY bosses.id

But this is not working. It seems to be double counting some data.

Here is some sample data, followed by what I would expect:

bosses (id, name)
1, John
2, Charles

users (id, name, boss_id)
1, Mai, 1
2, Donald, 1
3, Denver, 2

messages (id, author_id, body, type)
1, 1, 'message from Mai to Donald', 'urgent'
2, 2, 'message from Donald to Denver', 'normal'
3, 3, 'message from Denver to Mai', 'urgent'
4, 1, 'message from Mai to Donald', 'urgent'

messages_targets (user_id, message_id)
2, 1
3, 2
1, 3 
2, 4

I would expect to get the following:

boss_id, percentage_received, authored_messages

1, 100, 2 # (Both Mai and Donald received urgent messages, and in total there were 2 urgent messages sent)
2, 0, 1 # (Denver did not receive any urgent messages, but he sent one message)
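
For anyone who wants to reproduce this, the sample data can be loaded with a script along these lines. It is only a sketch. The column types follow the table descriptions above, but the messages.id column is an assumption: messages_targets.message_id has to point at something, so the first value of each sample messages row is read as that id, and each author_id is inferred from the name in the message body. The primary and foreign keys are also assumed.

```sql
CREATE TABLE bosses (
    id   INT PRIMARY KEY,
    name VARCHAR
);

CREATE TABLE users (
    id      INT PRIMARY KEY,
    name    VARCHAR,
    boss_id INT REFERENCES bosses (id)
);

-- messages.id is assumed; it is the column messages_targets.message_id refers to
CREATE TABLE messages (
    id        INT PRIMARY KEY,
    author_id INT REFERENCES users (id),
    body      VARCHAR,
    type      VARCHAR
);

CREATE TABLE messages_targets (
    user_id    INT REFERENCES users (id),
    message_id INT REFERENCES messages (id)
);

INSERT INTO bosses (id, name) VALUES
    (1, 'John'),
    (2, 'Charles');

INSERT INTO users (id, name, boss_id) VALUES
    (1, 'Mai', 1),
    (2, 'Donald', 1),
    (3, 'Denver', 2);

-- author_id values are inferred from the message bodies (Mai = 1, Donald = 2, Denver = 3)
INSERT INTO messages (id, author_id, body, type) VALUES
    (1, 1, 'message from Mai to Donald', 'urgent'),
    (2, 2, 'message from Donald to Denver', 'normal'),
    (3, 3, 'message from Denver to Mai', 'urgent'),
    (4, 1, 'message from Mai to Donald', 'urgent');

INSERT INTO messages_targets (user_id, message_id) VALUES
    (2, 1),
    (3, 2),
    (1, 3),
    (2, 4);
```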

Answer 1:

Try the following query. It computes the two aggregates in separate subqueries, so their joins do not influence each other.

SELECT
    (
        SELECT
            COUNT(DISTINCT CASE WHEN messages.type = 'urgent' THEN users.id END)::float /
            NULLIF(COUNT(DISTINCT users.id)::float, 0) * 100
        FROM users
        -- LEFT JOINs keep users who received no messages in the denominator
        LEFT JOIN messages_targets ON messages_targets.user_id = users.id
        LEFT JOIN messages ON messages.id = messages_targets.message_id
        WHERE users.boss_id = bosses.id
    ) AS percentage_received,
    (
        SELECT
            COUNT(CASE WHEN messages.type = 'urgent' THEN 1 END)
        FROM users
        -- join on authorship only; also joining messages_targets here would
        -- repeat each authored message once per message the author received
        JOIN messages ON messages.author_id = users.id
        WHERE users.boss_id = bosses.id
    ) AS authored_messages_count,
    bosses.id
FROM bosses
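
The double counting in the question comes from join fan-out: once a user's received messages and authored messages are joined in the same FROM clause, every received row is paired with every authored row, so a plain COUNT sees each authored message once per received message. As an alternative to correlated subqueries, each side can be aggregated per boss first and the two one-row-per-boss results joined afterwards. The following is a sketch of that idea, assuming the schema from the question plus a messages.id column that messages_targets.message_id refers to. The CTE names received and authored are made up for illustration.

```sql
WITH received AS (
    -- one row per boss: share of that boss's users who received an urgent message
    SELECT users.boss_id,
           COUNT(DISTINCT CASE WHEN messages.type = 'urgent' THEN users.id END)::float
               / NULLIF(COUNT(DISTINCT users.id)::float, 0) * 100 AS percentage_received
    FROM users
    LEFT JOIN messages_targets ON messages_targets.user_id = users.id
    LEFT JOIN messages ON messages.id = messages_targets.message_id
    GROUP BY users.boss_id
),
authored AS (
    -- one row per boss: urgent messages written by that boss's users
    SELECT users.boss_id,
           COUNT(*) AS authored_messages_count
    FROM users
    JOIN messages ON messages.author_id = users.id
    WHERE messages.type = 'urgent'
    GROUP BY users.boss_id
)
SELECT bosses.id AS boss_id,
       received.percentage_received,
       -- bosses whose users authored no urgent messages get 0 instead of NULL
       COALESCE(authored.authored_messages_count, 0) AS authored_messages_count
FROM bosses
LEFT JOIN received ON received.boss_id = bosses.id
LEFT JOIN authored ON authored.boss_id = bosses.id
ORDER BY bosses.id;
```

Because each CTE is grouped before the final join, neither aggregate sees the other's rows. If the sample data is read so that the author of each message is the user named in its body, this should give 100 and 2 for boss 1, and 0 and 1 for boss 2, which matches the expected output in the question.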
  • Posted on 2019-01-16 00:42