How can I make this SQL easier?
date customer_name service_name price_paid
2021-01-01 Andrew Cable TV 5000
2021-02-02 Brad cabletv 5000
2021-03-03 Charlie Cable TV 5000
2021-02-05 Dan ISP 6000
2021-02-18 Eric ISP 6000
2021-10-09 Felix ISP 6000
2021-09-10 Gerald isp 6000
2022-03-10 Hubert Cable TV 5000
2022-04-12 Isaac i.s.p 6000
2022-04-15 Jason ISp
2022-05-23 Karen Cable T.V
2022-06-23 Leah ISP 6000
2022-05-17 Marie Cable TV 5000
2022-06-11 Norman ISP 6000
So I got this SQL output, and as you can see, my service_name column is really messy. I want to clean it up, and I found this example:
SELECT name,
CASE WHEN department == 'Math' THEN 'Math'
ELSE upper(replace(replace(department, 'Information Technology', 'I.T'), 'it', 'I.T'))
END as department
FROM messy_df
But with large data I think this gets harder, because we need to replace the values one by one. Can we make this easier?
This is my code:
sql_query = """
SELECT
strftime('%Y', date) 'year',
strftime('%m', date) 'month',
strftime('%d', date) 'date',
customer_name,
replace(replace(replace(replace(replace(service_name, 'cabletv', 'Cable TV'), 'Cable T.V', 'Cable TV'), 'i.s.p', 'ISP'), 'isp', 'ISP'),'ISp', 'ISP') service_name,
COALESCE(price_paid, (SELECT avg(price_paid) FROM df)) as 'price_paid'
FROM df
"""
sql_run(sql_query)
What you're doing is fine, given the state of the data.
I would create a view
CREATE VIEW vdf as SELECT *, CASE ... FROM df
that at least includes, for each "munged" column, the original, so you can see the transformation. The user/application will often want to see the unmodified form, too.
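As a rough illustration of such a view, here is a minimal sketch using Python's built-in sqlite3 module. The column name service_name_clean and the normalization inside the CASE (stripping dots and spaces, then upper-casing before matching) are assumptions made for this example, not something taken from the original code; the nested replace() chain from the question would work in the same place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE df (date TEXT, customer_name TEXT, service_name TEXT, price_paid INTEGER)"
)
conn.executemany(
    "INSERT INTO df VALUES (?, ?, ?, ?)",
    [
        ("2021-02-02", "Brad", "cabletv", 5000),
        ("2022-04-12", "Isaac", "i.s.p", 6000),
        ("2022-05-23", "Karen", "Cable T.V", None),
    ],
)

# The view keeps every original column and adds a cleaned copy next to it.
# service_name_clean is a hypothetical column name chosen for this sketch.
conn.execute("""
CREATE VIEW vdf AS
SELECT *,
       CASE upper(replace(replace(service_name, '.', ''), ' ', ''))
            WHEN 'CABLETV' THEN 'Cable TV'
            WHEN 'ISP'     THEN 'ISP'
            ELSE service_name            -- unknown spellings pass through unchanged
       END AS service_name_clean
FROM df
""")

rows = conn.execute(
    "SELECT service_name, service_name_clean FROM vdf ORDER BY date"
).fetchall()
for original, cleaned in rows:
    print(original, "->", cleaned)
```

Because the original service_name stays in the view, any spelling the CASE does not recognize is easy to spot by comparing the two columns.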
The best solution to this kind of problem (as with many problems) is prevention. The service_name column needs a constraint restricting it to a set of uniform values, so there's only one form of, say, "ISP". This is usually achieved via a lookup table and maybe a drop-down box in the application. Alternatively, your CASE clauses can be applied on input as part of INSERT.
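One way the lookup-table idea might look in SQLite is sketched below, again with Python's sqlite3 module. The table names services and subscriptions are hypothetical, and the foreign key is only one possible form of the constraint (a CHECK constraint listing the allowed values would be another). Note that SQLite enforces foreign keys only when PRAGMA foreign_keys is turned on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign keys unless this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Lookup table holding the only accepted spellings (hypothetical table name).
conn.execute("CREATE TABLE services (service_name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO services VALUES (?)", [("Cable TV",), ("ISP",)])

# Data table whose service_name must match a row in the lookup table.
conn.execute("""
CREATE TABLE subscriptions (
    date          TEXT,
    customer_name TEXT,
    service_name  TEXT NOT NULL REFERENCES services(service_name),
    price_paid    INTEGER
)
""")

# A canonical value is accepted.
conn.execute("INSERT INTO subscriptions VALUES ('2021-02-05', 'Dan', 'ISP', 6000)")

# A variant spelling is rejected instead of silently stored.
try:
    conn.execute("INSERT INTO subscriptions VALUES ('2021-09-10', 'Gerald', 'isp', 6000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT count(*) FROM subscriptions").fetchone()[0]
print("variant rejected:", rejected, "| rows stored:", count)
```

With a constraint like this in place, the messy variants never reach the table, so the cleanup query is no longer needed for new data.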