Python Pandas DataFrame: modify column values with a function that cleans strings and assign the result to a new column

Question

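I have some data to clean. Some of the keys have six leading zeros that I want to remove. If a key ends with neither "ABC" nor "DEFG", I also need to remove the currency code in its last 3 characters. If a key does not start with leading zeros, it should be returned unchanged.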

To do this, I wrote a function that processes a single key string:

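def cleanAttainKey(dirtyAttainKey):
    # Keys without leading zeros are returned unchanged.
    if not dirtyAttainKey.startswith("0"):
        return dirtyAttainKey

    # lstrip removes only the leading zeros (strip would also remove trailing ones).
    key = dirtyAttainKey.lstrip("0")

    # Drop the 3-character currency code unless the key ends with "ABC" or "DEFG".
    # "DEFG" is four characters long, so a key[-3:] comparison could never match it.
    if not key.endswith(("ABC", "DEFG")):
        key = key[:-3]
    return key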
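Two plain-Python string details matter for this rule. The snippet below is a small illustration with made-up sample strings: str.strip removes the given characters from both ends, while str.lstrip removes them only from the left; and a 3-character slice can never equal a 4-character suffix such as "DEFG", whereas str.endswith accepts a tuple of suffixes and checks each one.

```python
# str.strip removes the characters from BOTH ends; lstrip only from the left.
key = "0001230"
print(key.strip("0"))    # 123
print(key.lstrip("0"))   # 1230

# A 3-character slice can never equal the 4-character string "DEFG",
# so a test like key[-3:] != "DEFG" is always True.
stripped = "12345DEFG"
print(stripped[-3:])                        # EFG
print(stripped.endswith(("ABC", "DEFG")))   # True: endswith accepts a tuple of suffixes
```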

I then built a dummy DataFrame to test it, but I get an error:

DataFrame

import pandas as pd

df = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102]},
                  columns=["dirtyKey","amount"])

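I need a new column in df called "cleanAttainKey": each value in "dirtyKey" should be passed through the cleanAttainKey function, and the cleaned key assigned to the new "cleanAttainKey" column. However, it seems pandas doesn't support this kind of modification.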
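Applying a function to every value of one column and storing the results in a new column maps directly onto pandas' Series.apply, which returns a new Series that can be assigned as a whole column, so no element-wise writes are needed. This is a minimal sketch, assuming the keys are plain strings and the currency code is always exactly 3 characters; clean_key is a hypothetical helper that mirrors the rule described above.

```python
import pandas as pd


def clean_key(key: str) -> str:
    """Hypothetical helper implementing the cleaning rule from the question."""
    if not key.startswith("0"):
        return key                      # no leading zeros: return unchanged
    key = key.lstrip("0")               # drop only the leading zeros
    if not key.endswith(("ABC", "DEFG")):
        key = key[:-3]                  # assume a 3-letter currency code at the end
    return key


df = pd.DataFrame(
    {"dirtyKey": ["00000012345ABC", "0000012345DEFG", "0000023456DEFGUSD"],
     "amount": [100, 101, 102]}
)

# apply calls clean_key once per value and returns a new Series,
# which is then assigned to the new column in a single step.
df["cleanAttainKey"] = df["dirtyKey"].apply(clean_key)
print(df)
```

For a one-argument function like this, df["dirtyKey"].map(clean_key) gives the same result.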

# add a new column in df called cleanAttainKey
df['cleanAttainKey'] = ""
# I want to clean the keys and put them into the new column cleanAttainKey
dirtyAttainKeyList = df['dirtyKey'].tolist()
for i in range(len(df['cleanAttainKey'])):
    df['cleanAttainKey'][i] = cleanAttainKey(dirtyAttainKeyList[i])

I get the following warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

The result should match df2 below:

df2 = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102],
                  'cleanAttainKey':["12345ABC","12345DEFG","23456DEFG"]},
                  columns=["dirtyKey","cleanAttainKey","amount"])
df2

Is there a better way in pandas to clean the dirty keys and get a new column with the clean keys? Thanks.

Answer 1

This is the culprit:

df['cleanAttainKey'][i] = cleanAttainKey(dirtyAttainKeyList[i])

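When you take a selection from a DataFrame (here df['cleanAttainKey'], and then [i] on the result), pandas reserves the right to return either a copy or a view, and it does not guarantee which. That doesn't matter if you are only reading the data, but it means you should never assign through such a chained selection: the write may land on a temporary copy and never reach df.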
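The difference between writing to a selection and writing to the original DataFrame can be seen in a small sketch. It assumes the question's sample data; subset is a hypothetical name. Taking an explicit .copy() makes it clear that edits to the selection are not meant to flow back, while a single .loc call with both a row selector and a column label assigns into df itself.

```python
import pandas as pd

df = pd.DataFrame(
    {"dirtyKey": ["00000012345ABC", "0000012345DEFG", "0000023456DEFGUSD"],
     "amount": [100, 101, 102]}
)

# A boolean selection produces a new object. Calling .copy() makes that
# explicit, so edits to `subset` are clearly not meant to reach df.
subset = df[df["amount"] > 100].copy()
subset["amount"] = 0
amounts_after_subset_edit = df["amount"].tolist()  # still [100, 101, 102]

# To change df itself, select rows and column in ONE .loc call
# instead of chaining [] twice.
df.loc[df["amount"] > 100, "amount"] = 0
print(df)
```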

The idiomatic way is to use loc (or iloc, at, or iat), which selects and assigns in a single step:

df.loc[i, 'cleanAttainKey'] = cleanAttainKey(dirtyAttainKeyList[i])

(The above assumes a default RangeIndex, i.e. row labels 0, 1, 2, ...)
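Put together, the answer's fix can be run end to end as sketched below, alongside a vectorized variant built on pandas' .str accessor for comparison. Assumptions: the default RangeIndex noted above, a fixed 3-character currency code, and hypothetical names (clean_key, stripped, has_suffix, and the extra column cleanVectorized, which exists only for the comparison).

```python
import pandas as pd


def clean_key(key: str) -> str:
    """Hypothetical helper: same rule as the question's cleanAttainKey."""
    if not key.startswith("0"):
        return key
    key = key.lstrip("0")
    if not key.endswith(("ABC", "DEFG")):
        key = key[:-3]
    return key


df = pd.DataFrame(
    {"dirtyKey": ["00000012345ABC", "0000012345DEFG", "0000023456DEFGUSD"],
     "amount": [100, 101, 102]}
)

# Loop version: one .loc write per row. With the default RangeIndex,
# the loop counter i is also the row label.
df["cleanAttainKey"] = ""
for i in range(len(df)):
    df.loc[i, "cleanAttainKey"] = clean_key(df.loc[i, "dirtyKey"])

# Vectorized version with the .str accessor (no Python-level loop).
s = df["dirtyKey"]
stripped = s.str.lstrip("0")
has_suffix = stripped.str.endswith("ABC") | stripped.str.endswith("DEFG")
# Keep the stripped key where it has a known suffix, else drop the last 3 chars.
cleaned = stripped.where(has_suffix, stripped.str[:-3])
# Keys that did not start with "0" are restored unchanged.
df["cleanVectorized"] = cleaned.where(s.str.startswith("0"), s)
print(df)
```

The loop is closest to the original code; the vectorized form avoids per-row assignment and is usually preferable on larger frames.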


Solution 1
1 Accepted 2020-03-11 08:22:04
