Standardizing values in a dataframe column

Question

I have a dataframe df which looks like:

id colour  response
 1   blue    curent 
 2    red   loaning
 3 yellow   current
 4  green      loan 
 5    red   currret
 6  green      loan

You can see that the values in the response column are not uniform, and I would like to get them to snap to a standardized set of responses.

I also have a validation list validate which looks like:

validate
 current
    loan
transfer

I would like to standardise the response column in the df by matching the first three characters of each entry against the validate list.

So the eventual output would look like:

id colour  response
 1   blue   current
 2    red      loan
 3 yellow   current
 4  green      loan 
 5    red   current
 6  green      loan

I have tried to use fnmatch:

pattern = 'cur*'
fnmatch.filter(df, pattern) = 'current'

but I can't change the values in the df.

If anyone could offer assistance it would be appreciated.

Thanks

Answer 1

You could use map:

In [3664]: mapping = dict(zip(s.str[:3], s))

In [3665]: df.response.str[:3].map(mapping)
Out[3665]:
0    current
1       loan
2    current
3       loan
4    current
5       loan
Name: response, dtype: object

In [3666]: df['response2'] = df.response.str[:3].map(mapping)

In [3667]: df
Out[3667]:
   id  colour response response2
0   1    blue   curent   current
1   2     red  loaning      loan
2   3  yellow  current   current
3   4   green     loan      loan
4   5     red  currret   current
5   6   green     loan      loan

Here s is a Series of the validation values:

In [3650]: s
Out[3650]:
0     current
1        loan
2    transfer
Name: validate, dtype: object

Details

In [3652]: mapping
Out[3652]: {'cur': 'current', 'loa': 'loan', 'tra': 'transfer'}

mapping can be a Series too, as long as the three-character prefixes form its index (map looks values up by index):

In [3678]: pd.Series(s.values, index=s.str[:3].values)
Out[3678]:
cur     current
loa        loan
tra    transfer
dtype: object
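
For anyone who wants to run the prefix approach end to end, here is a minimal sketch. The sample data is reconstructed from the question, and the helper name standardise, the whitespace stripping, the lower-casing, and the fall-back to the original value on a miss are all my own assumptions rather than part of the answer above.

```python
import pandas as pd

# Sample data reconstructed from the question (note the stray trailing spaces).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "colour": ["blue", "red", "yellow", "green", "red", "green"],
    "response": ["curent ", "loaning", "current", "loan ", "currret", "loan"],
})
s = pd.Series(["current", "loan", "transfer"], name="validate")

# Map each three-character prefix to its standard value.
mapping = dict(zip(s.str[:3], s))


def standardise(responses: pd.Series, mapping: dict) -> pd.Series:
    """Hypothetical helper: look values up by prefix, keep the cleaned original on a miss."""
    cleaned = responses.str.strip().str.lower()
    # map() yields NaN for unknown prefixes; fillna() restores the cleaned value.
    return cleaned.str[:3].map(mapping).fillna(cleaned)


df["response"] = standardise(df["response"], mapping)
print(df)
```

Without the fillna step, any response whose prefix is not in mapping would become NaN, which may or may not be what you want. Also note that two validation values sharing the same first three characters would collide in mapping, with the later one winning.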

Answer 2

Fuzzy match?

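from fuzzywuzzy import process

# val is the dataframe that holds the validate column.
# process.extract returns a list of (match, score, index) tuples;
# with limit=1, [0][0] is the best-matching string.
df['response2'] = [process.extract(x, val.validate, limit=1)[0][0]
                   for x in df.response]
df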
Out[867]: 
   id  colour response response2
0   1    blue   curent   current
1   2     red  loaning      loan
2   3  yellow  current   current
3   4   green     loan      loan
4   5     red  currret   current
5   6   green     loan      loan
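
If installing fuzzywuzzy is not an option, the same idea can be sketched with the standard library's difflib instead. This swaps in a different similarity measure, so the scores will not match fuzzywuzzy's. The helper name closest and the 0.6 cutoff (difflib's default) are assumptions for illustration.

```python
from difflib import get_close_matches

valid = ["current", "loan", "transfer"]


def closest(value, choices, cutoff=0.6):
    """Return the closest choice, or the cleaned value if nothing clears the cutoff."""
    cleaned = value.strip().lower()
    # n=1 asks for the single best candidate; the list is empty on no match.
    matches = get_close_matches(cleaned, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned


responses = ["curent ", "loaning", "current", "loan ", "currret", "loan"]
standardised = [closest(r, valid) for r in responses]
print(standardised)
```

On the dataframe from the question this could be applied as df['response2'] = df.response.map(lambda r: closest(r, valid)). Unlike the prefix lookup, a fuzzy match does not depend on the first three characters being typed correctly.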
