
Apply a Parsing Function to Pandas DataFrame

I have the following DF:

import pandas as pd

df = pd.DataFrame({'Data': ['Nov, 2018', '20 Sep 2019\xa0android-3', '12 Nov 2019android-3', '11 Jun 2019roku-3\xa011 Sep 2019',
                            '11 Jun 2019roku-3\xa011 Sep 2019', '06 Jan 2020\xa0android-3', '19 Dec 2019\xa0android-3',
                            '12 Nov 2019\xa0apple-4', '22 Nov 2019\xa0apple-4', '11 Jul 2019\xa0x1-2']})

I am trying to create a second column that contains only the platform from each row, without the dates. To do this, I have a function called extract_date():

import re

def extract_date(date):
    val = re.findall(r'\d{2} \w{3} \d{4}', date)
    if len(val) == 1:
        return val[0]
    else:
        return val
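A quick way to see what this helper actually returns is to call it on a few of the sample strings. The sketch below re-declares extract_date() with the same logic (raw-string pattern) so it runs on its own. It shows that a single match comes back as a bare str rather than a list, and that dict.fromkeys() iterates a str character by character:

```python
import re

def extract_date(date):
    # Same logic as the helper above
    val = re.findall(r'\d{2} \w{3} \d{4}', date)
    if len(val) == 1:
        return val[0]   # a bare str when there is exactly one match
    else:
        return val      # a list otherwise (possibly empty)

one = extract_date('20 Sep 2019\xa0android-3')
two = extract_date('11 Jun 2019roku-3\xa011 Sep 2019')
zero = extract_date('Nov, 2018')

print(type(one).__name__, repr(one))   # str '20 Sep 2019'
print(type(two).__name__, two)         # list ['11 Jun 2019', '11 Sep 2019']
print(zero)                            # []

# dict.fromkeys() iterates its argument, so a str yields one key per character
print(list(dict.fromkeys(one)))        # ['2', '0', ' ', 'S', 'e', 'p', '1', '9']
```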

When I run this function on an individual string, I get the result I want:

s = '27 Feb 2020 roku-5.002 Mar 2020 roku-5.0.1'
mydict = dict.fromkeys(extract_date(s), '')
for k, v in mydict.items():
    s = s.replace(k, v).strip()

'roku-5.0 roku-5.0.1'

However, when I apply it to the Data column, I don't get the same results:

def strip_dates(x):
    if type(x) == float:
        return x
    else:
        mydict = dict.fromkeys(extract_date(x), '')
        for k, v in mydict.items():
            return x.replace(k, v).strip()

df['Data Text'] = df.apply(lambda row: strip_dates(row['Data']), axis=1)


                                 Data                Data Text
0                           Nov, 2018                     None
1               20 Sep 2019 android-3      0 Sep 019 android-3
2                12 Nov 2019android-3       2 Nov 209android-3
3       11 Jun 2019roku-3 11 Sep 2019       roku-3 11 Sep 2019
4       11 Jun 2019roku-3 11 Sep 2019       roku-3 11 Sep 2019

Can anybody tell me what is wrong with my approach to applying the function? Thanks.

In your function:

def strip_dates(x):
    if type(x) == float:
        return x
    else:
        mydict = dict.fromkeys(extract_date(x), '')
        for k, v in mydict.items():
            return x.replace(k, v).strip()

You return during the very first iteration of the loop over the mydict dictionary, so only the first key is ever replaced:

return x.replace(k, v).strip()
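To see the effect on row 3 of the question, here is a minimal sketch with two hypothetical helpers, strip_first_only() and strip_all(), that differ only in where the return sits. The two dates are written out by hand instead of calling extract_date():

```python
s = '11 Jun 2019roku-3\xa011 Sep 2019'
dates = ['11 Jun 2019', '11 Sep 2019']

def strip_first_only(x, keys):
    # Hypothetical helper: mirrors the original loop
    for k in keys:
        return x.replace(k, '').strip()  # exits during the first iteration

def strip_all(x, keys):
    # Hypothetical helper: keeps updating x and returns after the loop
    for k in keys:
        x = x.replace(k, '').strip()
    return x

print(repr(strip_first_only(s, dates)))  # 'roku-3\xa011 Sep 2019'
print(repr(strip_all(s, dates)))         # 'roku-3'
```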

Change it to:

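def strip_dates(x):
    # NaN is a float, so missing values are passed through unchanged
    if isinstance(x, float):
        return x
    else:
        dates = extract_date(x)
        # extract_date() returns a bare string for a single match; wrap it so
        # dict.fromkeys() gets whole dates rather than individual characters
        if isinstance(dates, str):
            dates = [dates]
        mydict = dict.fromkeys(dates, '')
        s = str(x)
        for k, v in mydict.items():
            s = s.replace(k, v).strip()
        return s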

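As you can see, I reused the line s = s.replace(k, v).strip() from your standalone snippet, which updates s on every pass through the loop. That is why s = str(x) comes first and return s sits after the loop. Note also that extract_date() returns a plain string when it finds exactly one date, and dict.fromkeys() on a string creates one key per character. That is why rows 1 and 2 lost individual digits ('20 Sep 2019' became '0 Sep 019') instead of whole dates.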
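If you don't need a per-row Python function at all, pandas can also remove the dates in one vectorized pass with Series.str.replace and a regular expression. This is an alternative to the apply approach, not the method used above, and it assumes every date has the same dd Mon yyyy shape that extract_date() looks for:

```python
import pandas as pd

df = pd.DataFrame({'Data': ['Nov, 2018', '20 Sep 2019\xa0android-3', '12 Nov 2019android-3',
                            '11 Jun 2019roku-3\xa011 Sep 2019', '11 Jun 2019roku-3\xa011 Sep 2019',
                            '06 Jan 2020\xa0android-3', '19 Dec 2019\xa0android-3',
                            '12 Nov 2019\xa0apple-4', '22 Nov 2019\xa0apple-4', '11 Jul 2019\xa0x1-2']})

# Remove every "dd Mon yyyy" match, then trim surrounding whitespace;
# str.strip() with no arguments also removes the non-breaking space \xa0
df['Data Text'] = (df['Data']
                   .str.replace(r'\d{2} \w{3} \d{4}', '', regex=True)
                   .str.strip())
print(df['Data Text'].tolist())
```

Because str.replace substitutes every match, rows containing two dates are handled without an explicit loop. Missing values stay NaN, so the float check is not needed.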
