MySQL and Python: Convert string value to int, to float or what?

I have a string that contains salary information in the following format:

salaryMixed = "£25,000 - £30,000"

Sometimes it will look like this:

salaryMixed = "EUR25,000 - EUR30,000"

And other times like this:

salaryMixed = "£37.50 - £50.00"

What I want to do is remove all characters except the numeric values and then split the two values, placing each into its own variable to reflect the low band and the high band. So far I have:

if salaryMixed.find('£')!=-1: # found £ char
    salaryMixed = salaryMixed.replace("£", "")
if salaryMixed.find('-')!=-1: # found hyphen
    salaryMixed = salaryMixed.replace("-", "")
if salaryMixed.find(',')!=-1: # found comma
    salaryMixed = salaryMixed.replace(",", "")
if salaryMixed.find('EUR')!=-1: # found EUR
    salaryMixed = salaryMixed.replace("EUR", "")
salaryMixed = re.sub('\s{2,}', ' ', salaryMixed) # to remove multiple space

if len(salaryList) == 1:
    salaryLow = map(int, 0) in salaryList
    salaryHigh = 00000
else:
    salaryLow = int(salaryList.index(1))
    salaryHigh = int(salaryList.index(2))

But I am stumped as to how to split the two values up, and also how to handle the decimal point when salaryMixed isn't an annual salary but an hourly rate, as in the case of salaryMixed = "£37.50 - £50.00". Isn't that a float?

I want to store this information in a MySQL DB later on in the code, and I have defined the table as:

CREATE TABLE jobs(
   job_id INT NOT NULL AUTO_INCREMENT,
   job_title VARCHAR(300) NOT NULL,
   job_salary_low INT(25),
   job_salary_high INT(25),
   PRIMARY KEY ( job_id )
);

What is the best approach here? Thanks.

"What I want to do is to remove all characters but the numeric values and then split the two values so as to place them into their own respective variables that reflect low banding and high banding."

OK, taking this one step at a time. Remove all the characters except the numeric values (better to keep spaces and periods too):

>>> testcases =  ["£25,000 - £30,000", "EUR25,000 - EUR30,000", "£37.50 - £50.00"]
>>> res = [''.join(x for x in tc if x.isdigit() or x.isspace() or x == '.') for tc in testcases]
>>> res
['25000  30000', '25000  30000', '37.50  50.00']

OK, now split them:

>>> res = [x.split() for x in res]
>>> res
[['25000', '30000'], ['25000', '30000'], ['37.50', '50.00']]

Convert to floats (Decimal might be better):

>>> res = [[float(j) for j in i] for i in res]
>>> res
[[25000.0, 30000.0], [25000.0, 30000.0], [37.5, 50.0]]
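
Since Decimal was mentioned as possibly the better choice, here is a minimal sketch of the same conversion with the standard-library decimal module. It assumes the strings have already been stripped and split as in the previous steps; the helper name to_decimals is made up for illustration. Decimal keeps a value like 37.50 exact, which tends to matter for money.

```python
from decimal import Decimal

def to_decimals(parts):
    """Convert a list of numeric strings such as ['37.50', '50.00'] to Decimals."""
    return [Decimal(p) for p in parts]

# Same intermediate data as the split step produces
res = [['25000', '30000'], ['25000', '30000'], ['37.50', '50.00']]
res = [to_decimals(pair) for pair in res]

for low, high in res:
    print(low, high)
```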

Put them in separate variables:

>>> for low, high in res:
...     print (low, high)
... 
25000.0 30000.0
25000.0 30000.0
37.5 50.0

A regex, as suggested by @Patashu, is the easy/lazy way to do it though.
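
A minimal sketch of that regex route, under a few assumptions: the function name parse_salary_range and the pattern are illustrative rather than anyone's actual code, commas are treated as thousands separators, and a string with only one figure returns None for the high band (the question used 0 there, so adjust to taste).

```python
import re
from decimal import Decimal

# Hypothetical pattern: digits with optional comma separators and an
# optional decimal part, e.g. "25,000" or "37.50"
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")

def parse_salary_range(text):
    """Return (low, high) as Decimals; high is None if only one figure is found."""
    values = [Decimal(m.replace(",", "")) for m in NUMBER_RE.findall(text)]
    if not values:
        raise ValueError("no salary figure found in %r" % text)
    low = values[0]
    high = values[1] if len(values) > 1 else None
    return low, high

for sample in ["£25,000 - £30,000", "EUR25,000 - EUR30,000", "£37.50 - £50.00"]:
    print(parse_salary_range(sample))
```

Because the pattern only matches the numbers themselves, the currency symbols, the hyphen and any extra whitespace never need to be stripped out by hand.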

This is a good case for a regular expression from the Python re module. And you'll probably want to upcast the hourly rates to annual (assuming you have a consistent average hourly

import re

def salary_band(val):
    currency = 'EUR' if 'EUR' in val else 'GBP'
    numbers = re.findall(r"[0-9.,]*", val)  # this will have a bunch of empty entries and two numbers
    numbers = [i.replace(",", "") for i in numbers if i]  # filter out empty strings, remove commas
    numbers = [float(i) for i in numbers]  # convert to floats
    # treat anything up to 2000 as an hourly rate and scale it to a year
    annual = lambda p: int(p) if p > 2000 else int(p * 1800)  # your number here...
    return currency, [annual(n) for n in numbers]

print(salary_band("gbp37.50 - gbp50.00"))
print(salary_band("EUR25,000 - EUR30,000"))
>> ('GBP', [67500, 90000])
>> ('EUR', [25000, 30000])

Here I'm returning the currency type and the low/high numbers as a tuple; you can unpack it easily into your table.

For storing the values in the DB, you can use the MySQLdb library in Python. It's easy to use and will store all your data in the database.

You can install it with apt-get install python-mysqldb
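
A rough sketch of what the insert might look like with MySQLdb, not a definitive implementation. Two assumptions are mine and not from the thread: the salary columns are declared DECIMAL(10,2) instead of INT so that hourly rates such as 37.50 are not truncated, and the helper insert_job is a made-up name. The connection details in the trailing comments are placeholders.

```python
# Assumption: DECIMAL(10,2) columns instead of INT, so 37.50 survives intact
CREATE_JOBS = """
CREATE TABLE jobs(
   job_id INT NOT NULL AUTO_INCREMENT,
   job_title VARCHAR(300) NOT NULL,
   job_salary_low DECIMAL(10,2),
   job_salary_high DECIMAL(10,2),
   PRIMARY KEY ( job_id )
)
"""

# MySQLdb uses %s placeholders; values are passed separately, never
# formatted into the SQL string by hand
INSERT_JOB = (
    "INSERT INTO jobs (job_title, job_salary_low, job_salary_high) "
    "VALUES (%s, %s, %s)"
)

def insert_job(cursor, title, low, high):
    """Insert one job row through a DB-API cursor (hypothetical helper)."""
    cursor.execute(INSERT_JOB, (title, low, high))

# Usage, assuming a reachable server and placeholder credentials:
#   import MySQLdb
#   conn = MySQLdb.connect(host="localhost", user="me", passwd="secret", db="jobsdb")
#   cur = conn.cursor()
#   insert_job(cur, "Developer", 25000, 30000)
#   conn.commit()
```

Passing the values as a separate tuple lets the driver handle quoting, which also avoids SQL injection if the job title comes from scraped text.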
