
How can I make this SQL easier?

date        customer_name  service_name  price_paid
2021-01-01  Andrew         Cable TV      5000
2021-02-02  Brad           cabletv       5000
2021-03-03  Charlie        Cable TV      5000
2021-02-05  Dan            ISP           6000
2021-02-18  Eric           ISP           6000
2021-10-09  Felix          ISP           6000
2021-09-10  Gerald         isp           6000
2022-03-10  Hubert         Cable TV      5000
2022-04-12  Isaac          i.s.p         6000
2022-04-15  Jason          ISp
2022-05-23  Karen          Cable T.V
2022-06-23  Leah           ISP           6000
2022-05-17  Marie          Cable TV      5000
2022-06-11  Norman         ISP           6000

So I got this SQL output. As you can see, my service_name column is really messy. I want to clean it up, and I found this example:

SELECT name,
    CASE WHEN department = 'Math' THEN 'Math'
    ELSE upper(replace(replace(department, 'Information Technology', 'I.T'), 'it', 'I.T'))
    END AS department
FROM messy_df

But with large data I think this gets harder, because we have to replace the values one by one. Can we make this easier?

This is my code:

sql_query = """
SELECT 
  strftime('%Y', date) 'year',
  strftime('%m', date) 'month',
  strftime('%d', date) 'date',
  customer_name, 
  replace(replace(replace(replace(replace(service_name, 'cabletv', 'Cable TV'), 'Cable T.V', 'Cable TV'), 'i.s.p', 'ISP'), 'isp', 'ISP'),'ISp', 'ISP') service_name,
  COALESCE(price_paid, (SELECT avg(price_paid) FROM df)) as 'price_paid'
FROM df
"""

sql_run(sql_query)

What you're doing is fine, given the state of the data.

I would create a view

CREATE VIEW vdf as SELECT *, CASE ... FROM df

that at least includes, for each "munged" column, the original, so you can see the transformation. The user/application will often want to see the unmodified form, too.
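Here is a minimal sketch of such a view, run through Python's built-in sqlite3 module rather than the sql_run helper from the question. The names df and vdf come from the posts above; the column name service_name_clean, the handful of sample rows, and the normalize-then-map rule (strip dots and spaces, uppercase, then one CASE branch per canonical value) are assumptions for illustration, not the CASE expression the answer leaves elided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE df (date TEXT, customer_name TEXT, service_name TEXT, price_paid REAL)"
)
# A few of the messy rows from the question (note the trailing space in 'ISp ').
rows = [
    ("2021-02-02", "Brad", "cabletv", 5000),
    ("2022-04-12", "Isaac", "i.s.p", 6000),
    ("2022-04-15", "Jason", "ISp ", None),
    ("2022-05-23", "Karen", "Cable T.V", None),
]
conn.executemany("INSERT INTO df VALUES (?, ?, ?, ?)", rows)

# Assumed rule: normalize first, so each canonical value needs only one
# CASE branch instead of one replace() per misspelling.
conn.execute("""
CREATE VIEW vdf AS
SELECT *,
       CASE upper(replace(replace(service_name, '.', ''), ' ', ''))
            WHEN 'CABLETV' THEN 'Cable TV'
            WHEN 'ISP'     THEN 'ISP'
            ELSE service_name          -- unknown values pass through unchanged
       END AS service_name_clean       -- hypothetical column name
FROM df
""")

# The view exposes the original and the cleaned column side by side.
cleaned = conn.execute(
    "SELECT customer_name, service_name, service_name_clean FROM vdf ORDER BY date"
).fetchall()
for row in cleaned:
    print(row)
```

Because the original service_name stays in the view, any row where the rule guesses wrong is easy to spot by comparing the two columns.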

The best solution to this kind of problem (as with many problems) is prevention. The service_name column needs a constraint restricting it to a set of uniform values, so there's only one form of, say, "ISP". This is usually achieved via a lookup table and maybe a drop-down box in the application. Alternatively, your CASE clauses can be applied on input as part of INSERT.
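The lookup-table idea might look roughly like the sketch below, again using Python's sqlite3 module. The table names services and purchases and the helper try_insert are hypothetical; the only point is that a foreign key on service_name rejects any spelling that is not in the lookup table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical lookup table holding the only accepted spellings.
conn.execute("CREATE TABLE services (service_name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO services VALUES (?)", [("Cable TV",), ("ISP",)])

# Hypothetical fact table; service_name must match a row in services.
conn.execute("""
CREATE TABLE purchases (
    date          TEXT,
    customer_name TEXT,
    service_name  TEXT NOT NULL REFERENCES services (service_name),
    price_paid    REAL
)
""")

def try_insert(service_name):
    """Return True if the row is accepted, False if the constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO purchases VALUES ('2022-06-23', 'Leah', ?, 6000)",
                (service_name,),
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("ISP")    # canonical spelling, present in services
rejected = try_insert("i.s.p")  # messy spelling, not in services
print(accepted, rejected)
```

With the constraint in place the messy variants never reach the table, so no cleanup query is needed afterwards; the application can populate its drop-down box from the same services table.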


 