Pandas: modify values in a DataFrame from another column

Question

While splitting data into columns, a glitch left me with some noisy data:

    site          code
    ---           ---
0   apple_123     45
1   apple_456     xy_33
2   facebook_123  24
3   google_123    NaN
4   google_123    pq_51

I need to clean the data so that I get the following result:

    site            code
    ---             ---
0   apple_123       45
1   apple_456_xy    33
2   facebook_123    24
3   google_123      NaN
4   google_123_pq   51

I have been able to select the rows that need to be modified, but I am unable to progress further:

import numpy as np
import pandas as pd

site = ['apple_123','apple_456','facebook_123','google_123','google_123']
code = [45,'xy_33',24,np.nan,'pq_51']
df = pd.DataFrame(list(zip(site,code)), columns=['site','code'])

df[(~df.code.astype(str).str.isdigit())&(~df.code.isna())]

Answer 1

Use Series.str.extract to pull the non-numeric prefix and the numeric part of each code into a helper DataFrame, then process each column separately. For the first column, remove the _ with Series.str.strip, prepend _ with Series.radd, convert missing values to an empty string, and append the result to the site column. For the second column, use Series.fillna to replace unmatched values with the original values from code:

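# Split each string code into a non-digit prefix (column 0) and its digits (column 1);
# entries that are not strings or do not match become NaN.
df1 = df.code.str.extract(r'(\D+)(\d+)')

# Turn 'xy_' into '_xy', use '' where there is no prefix, and append to site.
df['site'] += df1[0].str.strip('_').radd('_').fillna('')
# Keep the extracted digits; fall back to the original code elsewhere.
df['code'] = df1[1].fillna(df['code'])
print(df)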
            site code
0      apple_123   45
1   apple_456_xy   33
2   facebook_123   24
3     google_123  NaN
4  google_123_pq   51
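
For reuse, the same steps can be wrapped in a function that leaves the input untouched. This is only a sketch: the name split_code_prefix is hypothetical, and it assumes the columns are called site and code as in the question. Note that the extracted digits are strings ('33', '51') while the untouched codes stay integers, so the last line shows one possible way to get a uniform numeric column with pd.to_numeric.

```python
import numpy as np
import pandas as pd


def split_code_prefix(df):
    """Move a non-numeric prefix such as 'xy_' from `code` onto `site`.

    Hypothetical helper; assumes columns named 'site' and 'code'.
    """
    out = df.copy()  # work on a copy so the caller's frame is unchanged
    # Column 0: non-digit prefix, column 1: digits; NaN where nothing matches.
    parts = out["code"].str.extract(r"(\D+)(\d+)")
    # 'xy_' -> 'xy' -> '_xy'; rows without a prefix contribute ''.
    suffix = parts[0].str.strip("_").radd("_").fillna("")
    out["site"] = out["site"] + suffix
    # Use the extracted digits where present, otherwise keep the original code.
    out["code"] = parts[1].fillna(out["code"])
    return out


df = pd.DataFrame({
    "site": ["apple_123", "apple_456", "facebook_123", "google_123", "google_123"],
    "code": [45, "xy_33", 24, np.nan, "pq_51"],
})

cleaned = split_code_prefix(df)
print(cleaned)

# Optional: convert the mixed int/str/NaN column to a single float column.
numeric_codes = pd.to_numeric(cleaned["code"])
```

Returning a copy rather than modifying df in place is a design choice made here so the function can be rerun on the raw data; the original answer updates df directly.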

Answer 1 | 0 | accepted | 2020-10-17 08:13:39
