Pivot table with multiple value columns

问题: I have a Postgres table with products data from different manufacturers, here the simplified table structure: CREATE TABLE test_table ( sku text, manufa...

问题:

I have a Postgres table with products data from different manufacturers, here the simplified table structure:

CREATE TABLE test_table (
  sku               text,
  manufacturer_name text,
  price             double precision,
  stock             int
);

INSERT INTO test_table
VALUES ('sku1', 'Manufacturer1', 110.00, 22),
       ('sku1', 'Manufacturer2', 120.00, 15),
       ('sku1', 'Manufacturer3', 130.00, 1),
       ('sku1', 'Manufacturer3', 30.00, 11),
       ('sku2', 'Manufacturer1', 10.00, 2),
       ('sku2', 'Manufacturer2', 9.00,  3),
       ('sku3', 'Manufacturer2', 21.00, 3),
       ('sku3', 'Manufacturer2', 1.00, 7),
       ('sku3', 'Manufacturer3', 19.00, 5);

I need to output each Manufacturer for each sku but if there are several identical manufacturers for the same sku I need to select the Manufacturer with the lowest price (note that I also need to include 'stock' column), here desired results:

| sku  | man1_price | man1_stock | man2_price | man2_stock | man3_price | man3_stock |
|------|------------|------------|------------|------------|------------|------------|
| sku1 | 110.0      | 22         | 120.0      | 15         | 30.0       | 11         |
| sku2 | 10.0       | 2          | 9.0        | 3          |            |            |
| sku3 |            |            | 1.0        | 7          | 19.0       | 5          |

I tried to use Postgres crosstab():

SELECT *
FROM crosstab('SELECT sku, manufacturer_name, price
              FROM test_table
              ORDER BY 1,2',
              $$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$
       )
       AS ct (sku text, "man1_price" double precision,
              "man2_price" double precision,
              "man3_price" double precision
    );

But this produces a table with only one price column. And I didn't find a way to include the stock column.

I also tried to use conditional aggregation:

SELECT sku,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN price END) as man1_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN stock END) as man1_stock,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN price END) as man2_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN stock END) as man2_stock,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN price END) as man3_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN stock END) as man3_stock
FROM test_table
GROUP BY sku
ORDER BY sku

And this query also doesn't work in my case - it simply selects min stock level - but if there are few the same Manufacturers for the same sku but with different prices/stocks - this query select min price from one manufacturer and min stock from another.

How can I output each manufacturer's price and corresponding stock from that table?

P.S. Thank you all for such helpful answers. My Postgres table is rather small - there no more than 15k of products, (I don't know if such numbers can be useful for proper comparing) but since Erwin Brandstetter asked to compare different queries performance I ran 3 queries with EXPLAIN ANALYZE, here is their execution time:

Erwin Brandstetter query:        400 - 450 ms 
Kjetil S query:                  250 - 300 ms
Gordon Linoff query:             200 - 250 ms
a_horse_with_no_name query:      250 - 300 ms

Again - I'm not sure if those numbers can be useful as a reference. For my case, I chose the combined version of Kjetil S and Gordon Linoff queries but Erwin Brandstetter and a_horse_with_no_name variants are also very useful and interesting. It's worth noting that if my table in the future would end up having more then few Manufacturers - adjusting query and typing their names each time would be tiresome - and hence the query from a_horse_with_no_name answer would be the most convenient to use.


回答1:

Your last select almost works. But you should add a where condition where rows with non-minimum prices per sku per manufacturer are removed. This produces your expected result:

select
  sku,
  min( case when manufacturer_name='Manufacturer1' then price end ) man1_price,
  min( case when manufacturer_name='Manufacturer1' then stock end ) man1_stock,
  min( case when manufacturer_name='Manufacturer2' then price end ) man2_price,
  min( case when manufacturer_name='Manufacturer2' then stock end ) man2_stock,
  min( case when manufacturer_name='Manufacturer3' then price end ) man3_price,
  min( case when manufacturer_name='Manufacturer3' then stock end ) man3_stock
from test_table t
where not exists (
    select 1 from test_table
    where sku=t.sku
    and manufacturer_name=t.manufacturer_name
    and price<t.price
)
group by sku
order by 1;

回答2:

I find using a JSON result much easier these days then using a complicated pivot. Producing a single aggregated JSON value doesn't break the inherent restriction of SQL that the number of columns must be known before the query is executed (and must be the same for all rows).

You could using something like this:

select sku, 
       jsonb_object_agg(manufacturer_name, 
                          jsonb_build_object('price', price, 'stock', stock, 'isMinPrice', price = min_price)) as price_info
from (
  select sku, 
         manufacturer_name,
         price, 
         min(price) over (partition by sku) as min_price,
         stock
  from test_table
) t
group by sku;

The above returns the following result using your sample data:

sku  | price_info                                                                                                                                                                                             
-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sku1 | {"Manufacturer1": {"price": 110, "stock": 22, "isMinPrice": false}, "Manufacturer2": {"price": 120, "stock": 15, "isMinPrice": false}, "Manufacturer3": {"price": 30, "stock": 11, "isMinPrice": true}}
sku2 | {"Manufacturer1": {"price": 10, "stock": 2, "isMinPrice": false}, "Manufacturer2": {"price": 9, "stock": 3, "isMinPrice": true}}                                                                       
sku3 | {"Manufacturer2": {"price": 1, "stock": 7, "isMinPrice": true}, "Manufacturer3": {"price": 19, "stock": 5, "isMinPrice": false}}                                                                       

回答3:

I would use distinct on to limit the data to one manufacturer to one price. And I like the filter functionality in Postgres. So:

select sku,
       max(price) filter (where manufacturer_name = 'Manufacturer1') as man1_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer1') as man1_stock,
       max(price) filter (where manufacturer_name = 'Manufacturer2') as man2_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer2') as man2_stock,
       max(price) filter (where manufacturer_name = 'Manufacturer3') as man3_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer3') as man3_stock
from (select distinct on (manufacturer_name, sku) t.*
      from test_table t
      order by manufacturer_name, sku, price
     ) t
group by sku
order by sku;

回答4:

crosstab() must provide a static column definition list. Your 2nd parameter:

$$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$

... provides a dynamic list of values that would require a dynamic column definition list. That's not going to work - except by incidence.

The core problem of your task is that crosstab() expects a single value column from the query in its first parameter. But you want to process two value columns per row (price and stock).

One way around this is to pack multiple values in a composite type and extract values in the outer SELECT.

Create a composite type once:

CREATE TYPE price_stock AS (price float8, stock int);

A temporary table or view also serves the purpose.
Then:

SELECT sku
     , (man1).price, (man1).stock
     , (man2).price, (man2).stock
     , (man3).price, (man3).stock
FROM   crosstab(
   'SELECT sku, manufacturer_name, (price, stock)::price_stock
    FROM   test_table
    ORDER  BY 1,2'
  , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
    )
       AS ct (sku text
            , man1 price_stock
            , man2 price_stock
            , man3 price_stock
    );

For a quick test, or if the row of your underlying table isn't too wide, you can also just use its row type, without creating a custom type:

SELECT sku
     , (man1).price, (man1).stock
     , (man2).price, (man2).stock
     , (man3).price, (man3).stock
FROM   crosstab(
   'SELECT sku, manufacturer_name, t
    FROM   test_table t
    ORDER  BY 1,2'
  , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
    )
       AS ct (sku text
            , man1 test_table
            , man2 test_table
            , man3 test_table
    );

db<>fiddle here

Related:

  • 发表于 2019-03-27 02:41
  • 阅读 ( 210 )
  • 分类:sof

条评论

请先 登录 后评论
不写代码的码农
小编

篇文章

作家榜 »

  1. 小编 文章
返回顶部
部分文章转自于网络,若有侵权请联系我们删除