问题:
I have this table:
Assume that "florio" is a city contained somewhere in the AllLocationTerms array column.
How do I remove "florio" when it exists on my list of loc...
可以将文章内容翻译成中文,广告屏蔽插件会导致该功能失效:
问题:
I have this table:
Assume that "florio" is a city contained somewhere in the AllLocationTerms array column.
How do I remove "florio" when it exists on my list of locations in AllLocationTerms array column?
Basically, I want to remove all matching items in AllLocationTerms from "Query" column.
It can happen that there are 2 or more words - "new york apartments" as Query and "new", "york" in the array. The outcome should be "apartments" in this case.
回答1:
Below is to address use case like yours when you need to check 800,000 rows against the location list array which has around 40k items
So, 40K items are definitely too much for being used to construct regular expression as it is in my previous answer.
So, to address this, I propose to rather split your query string into separate words preserving position number - then exclude those which are terms by left joining and finally assemble the survived words back to string
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'florio management apartments' query, 1 clicks, ['battle','creek','iowa','florio'] allLocationTerms UNION ALL
SELECT 'florio creek management iowa apartments' query, 1 clicks, ['battle','creek','iowa','florio'] allLocationTerms
)
SELECT *,
(
SELECT STRING_AGG(word, ' ' ORDER BY pos)
FROM (
SELECT word, MIN(pos) pos
FROM UNNEST(SPLIT(query, ' ')) word WITH OFFSET AS pos
LEFT JOIN UNNEST(allLocationTerms) term
ON word = term
GROUP BY word
HAVING COUNT(DISTINCT term) = 0
)
) modified_query
FROM `project.dataset.table`
回答2:
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
REGEXP_REPLACE(query,
(SELECT CONCAT('\b', STRING_AGG(term, '\b|\b'), '\b') FROM UNNEST(allLocationTerms) term),
'') modified_query
FROM `project.dataset.table`
you can test, play with above using dummy data as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'florio management apartments' query, 1 clicks, ['battle','creek','iowa','florio'] allLocationTerms UNION ALL
SELECT 'florio creek management iowa apartments' query, 1 clicks, ['battle','creek','iowa','florio'] allLocationTerms
)
SELECT *,
REGEXP_REPLACE(query,
(SELECT CONCAT('\b', STRING_AGG(term, '\b|\b'), '\b') FROM UNNEST(allLocationTerms) term),
'') modified_query
FROM `project.dataset.table`
result is
Row query clicks allLocationTerms modified_query
1 florio management apartments 1 battle management apartments
creek
iowa
florio
2 florio creek management iowa apartments 1 battle management apartments
creek
iowa
florio