Df:
Id Product
1 Milk1256 Pack 10x3
2 Cleaner#45 Pack 13x4s
3 Milk 45 Small 1X30m
4 Cleaner #1379 Small 75s
5 Cleaner Small 4.45M
I need to create a new column based on the Product column. If the product ends with a pattern such as 10x3, the new column should hold the result of the multiplication (30); otherwise it should hold the number that carries a unit such as s or m.
Df_Output:
Id Product Vol
1 Milk1256 Pack 10x3 30
2 Cleaner#45 Pack 13x4s 52
3 Milk 45 Small 1X30m 30
4 Cleaner #1379 Small 75s 75
5 Cleaner Small 4.45M 4.45
Use str.extract with a regex to capture the last number(s), fillna with 1 when there is only one number, and take the product:
regex = r'(?:(\d+)[Xx])?(\d+)\D*$'
df['Vol'] = (df['Product'].str.extract(regex)
                          .fillna(1).astype(float)
                          .prod(axis=1)
             )
Output:
Id Product Vol
0 1 Milk1256 Pack 10x3 30
1 2 Cleaner#45 Pack 13x4s 52
2 3 Milk 45 Small 1X30m 30
3 4 Cleaner #1379 Small 75s 75
4 5 Cleaner Small 45M 45
How the regex works:
(?:(\d+)[Xx])? # optionally capture a number followed by "x" or "X"
(\d+) # capture last number
\D*$ # anything not digits at the end of the string
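To see what each group captures before the fillna and prod steps, here is a minimal sketch that uses the standard library re module instead of pandas. The helper name last_number_groups is hypothetical and only serves the illustration. Note that for "4.45M" this integer-only pattern sees just "45", because the dot is not a digit.

```python
import re

# Same pattern as in the answer: optional "<number>x" prefix, then the last number.
PATTERN = re.compile(r'(?:(\d+)[Xx])?(\d+)\D*$')

def last_number_groups(text):
    """Return the (multiplier, last_number) groups, or None if nothing matches.

    Hypothetical helper for illustration; the multiplier is None when the
    product has no "AxB" form, which is what fillna(1) later replaces.
    """
    match = PATTERN.search(text)
    return match.groups() if match else None

print(last_number_groups("Milk1256 Pack 10x3"))       # ('10', '3')
print(last_number_groups("Cleaner #1379 Small 75s"))  # (None, '75')
print(last_number_groups("Cleaner Small 4.45M"))      # (None, '45')
```

The None in the first position corresponds to the NaN that str.extract would produce, so filling it with 1 leaves the last number unchanged after multiplication.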
Matching decimal numbers: regex = r'(?:(\d+(?:\.\d+)?)[Xx])?(\d+(?:\.\d+)?)\D*$'
Matching the AxB or A{M,m,s} (explicit units) format: regex = r'(\d+)[Xx](\d+)|(\d+(?=[Mms]\b))'
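The explicit-units pattern has three capture groups, and only the groups of the alternative that matched are filled. A rough sketch of how the groups could be combined, again with plain re rather than pandas (the function name volume_from_units is an assumption made for this example):

```python
import re

# Pattern from the answer: either "AxB", or a number directly followed by M, m or s.
UNIT_PATTERN = re.compile(r'(\d+)[Xx](\d+)|(\d+(?=[Mms]\b))')

def volume_from_units(text):
    """Multiply whichever groups matched; return None when nothing matches.

    Hypothetical helper: unmatched groups are skipped, which has the same
    effect as filling them with 1 before taking the product.
    """
    match = UNIT_PATTERN.search(text)
    if match is None:
        return None
    result = 1.0
    for group in match.groups():
        if group is not None:
            result *= float(group)
    return result

print(volume_from_units("Cleaner#45 Pack 13x4s"))    # 52.0
print(volume_from_units("Cleaner #1379 Small 75s"))  # 75.0
print(volume_from_units("Cleaner 1379 Small"))       # None
```

Unlike the first pattern, a bare trailing number with no x and no unit is not picked up here, which may or may not be what you want.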
Example with the decimal numbers regex:
Id Product Vol
0 1 Milk1256 Pack 10x3.33 33.30
1 2 Cleaner#45 Pack 13x4s 52.00
2 3 Milk 45 Small 1.5X30m 45.00
3 4 Cleaner #1379 Small 75s 75.00
4 5 Cleaner Small 4.45M 4.45
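The same computation can be checked row by row outside pandas. The sketch below applies the decimal pattern with the re module; decimal_volume is a hypothetical name, and a missing multiplier is treated as 1, mirroring fillna(1).

```python
import re

# Decimal-aware pattern from the answer.
DECIMAL_PATTERN = re.compile(r'(?:(\d+(?:\.\d+)?)[Xx])?(\d+(?:\.\d+)?)\D*$')

def decimal_volume(text):
    """Return multiplier * last number, using 1 when there is no multiplier."""
    match = DECIMAL_PATTERN.search(text)
    if match is None:
        return None
    multiplier, last = match.groups()
    return float(multiplier or 1) * float(last)

print(decimal_volume("Milk 45 Small 1.5X30m"))  # 45.0
print(decimal_volume("Cleaner Small 4.45M"))    # 4.45
```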
Another solution, using a traditional for loop:
import pandas as pd
import string

data = [['1', 'Milk1256 Pack 4.4x3'], ['2', 'Cleaner#45 Pack 13x4s']]

def find_value(product):
    # all lowercase and uppercase letters except x/X
    letters = (string.ascii_lowercase.replace('x', '')
               + string.ascii_uppercase.replace('X', ''))
    # take the last whitespace-separated token of the product string
    # (the third one in this sample data) and drop every letter except x/X
    token = product.split()[-1]
    cleaned = ''.join(ch for ch in token if ch not in letters)
    a = ''
    b = ''
    seen_x = False
    # collect the two values to be multiplied
    for ch in cleaned:
        if ch in 'xX':
            seen_x = True
        elif not seen_x:
            a += ch
        else:
            b += ch
    # if there is no second number, multiply by 1
    if b == '':
        b = '1'
    return float(a) * float(b)

df = pd.DataFrame(data, columns=['id', 'product'])
df['value'] = df['product'].apply(find_value)
print(df)
Output:
id product value
0 1 Milk1256 Pack 4.4x3 13.2
1 2 Cleaner#45 Pack 13x4s 52.0