Pandas DataFrame insert column based on values in lists

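This may be a simple question, but I have wasted a lot of time on it and still cannot figure out what is happening. I want to classify the HTTP requests in a web log file by resource extension. Here is what I tried: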

imgstr = ['.png','.gif','.jpeg','.jpg']     
docstr = [ '.pdf','.ppt','.doc' ]  
webstr = ['.html','.htm', '.asp', '.jsp', '.php', '.cgi', '.js','.css']
compressed = ['zip', 'rar', 'gzip', 'tar', 'gz', '7z']



def rtype(b):
    if any(x in b for x in imgstr):
        return 'A'
    elif any(x in b for x in docstr):
        return 'B'
    elif 'favicon.ico' in b:
        return 'C'
    elif 'robots.txt' in b:
        return 'D'
    elif 'GET / HTTP/1.1' in b:
        return 'E'
    elif any(x in b for x in webstr):
        return 'F'
    elif any(x in b for x in compressed):
        return 'G'
    else:
        return 'H'

df2['result'] = df2.Request.apply(rtype)

But df2['result'] ends up containing only 'A'. The dtype of df2.Request is object. I tried df2['Referer'] = df2['Referer'].astype(str), but the dtype is still object. Here are the first 10 rows of df2.Request:

0,GET /index.php?lang=ta HTTP/1.1
1,GET /index.php?limitstart=25&lang=en HTTP/1.1
2,GET /index.php/ta/component/content/article/43 HTTP/1.1
3,GET /index.php/ta/component/content/article/39-test HTTP/1.1
4,GET /robots.txt HTTP/1.1
5,GET /robots.txt HTTP/1.1
6,GET /index.php/en/computer-security-feeds/15-computer-security/2-us-cert-cyber-security-alerts HTTP/1.1
7,GET /index.php/component/content/article/10-tips/59-use-firefox-more-safe HTTP/1.1
8,GET /robots.txt HTTP/1.1
9,GET /onlinerenew/ HTTP/1.1

I would probably use regular expressions for this.

import pandas as pd
import re

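def categoriser(x):
    # Dots are escaped so that they match a literal '.', not any character.
    if re.search(r'\.(png|gif|jpeg|jpg)', x):
        return 'A'  # images
    elif re.search(r'\.(pdf|ppt|doc)', x):
        return 'B'  # documents
    elif 'favicon.ico' in x:
        return 'C'
    elif 'robots.txt' in x:
        return 'D'
    elif 'GET / HTTP/1.1' in x:
        return 'E'  # request for the site root
    elif re.search(r'\.(html|htm|asp|jsp|php|cgi|js|css)', x):
        return 'F'  # web pages, scripts and stylesheets
    elif re.search(r'(zip|rar|gzip|tar|gz|7z)', x):
        return 'G'  # compressed files (no leading dot, as in the question)
    else:
        return 'H'  # everything else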


string = """0,GET /index.php?lang=ta HTTP/1.1
1,GET /index.php?limitstart=25&lang=en HTTP/1.1
2,GET /index.php/ta/component/content/article/43 HTTP/1.1
3,GET /index.php/ta/component/content/article/39-test HTTP/1.1
4,GET /robots.txt HTTP/1.1
5,GET /robots.txt HTTP/1.1
6,GET /index.php/en/computer-security-feeds/15-computer-security/2-us-cert-cyber-security-alerts HTTP/1.1
7,GET /index.php/component/content/article/10-tips/59-use-firefox-more-safe HTTP/1.1
8,GET /robots.txt HTTP/1.1
9,GET /onlinerenew/ HTTP/1.1"""    

frame = pd.DataFrame([x.split(",") for x in string.split("\n")])

print(frame.loc[:, 1].apply(categoriser))

The result is:

0    F
1    F
2    F
3    F
4    D
5    D
6    F
7    F
8    D
9    H
Name: 1, dtype: object
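
Both the substring checks in the question and the regular expressions here match anywhere in the request line. As a rough alternative sketch (not part of the original answer), the category could instead be looked up from the file extension of each path segment. The names `CATEGORY_BY_EXT` and `categorise_by_extension` are made up for illustration, the compressed types are assumed to carry a leading dot, and the root, favicon and robots checks are assumed to take priority over the extension lookup:

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical lookup table built from the lists in the question.
# Assumption: compressed extensions are written with a leading dot here.
CATEGORY_BY_EXT = {}
for label, exts in [
    ('A', ['.png', '.gif', '.jpeg', '.jpg']),
    ('B', ['.pdf', '.ppt', '.doc']),
    ('F', ['.html', '.htm', '.asp', '.jsp', '.php', '.cgi', '.js', '.css']),
    ('G', ['.zip', '.rar', '.gzip', '.tar', '.gz', '.7z']),
]:
    for ext in exts:
        CATEGORY_BY_EXT[ext] = label


def categorise_by_extension(request):
    """Classify a request line such as 'GET /a/b.php?x=1 HTTP/1.1'."""
    parts = request.split()
    if len(parts) < 2:
        return 'H'  # not a well-formed request line
    path = urlsplit(parts[1]).path  # drops the query string
    if path == '/':
        return 'E'  # site root
    segments = [s for s in path.split('/') if s]
    if 'favicon.ico' in segments:
        return 'C'
    if 'robots.txt' in segments:
        return 'D'
    # Use the first segment with a known extension, so that
    # '/index.php/ta/component/...' still counts as a .php request.
    for segment in segments:
        ext = posixpath.splitext(segment)[1].lower()
        if ext in CATEGORY_BY_EXT:
            return CATEGORY_BY_EXT[ext]
    return 'H'


print(categorise_by_extension('GET /index.php?lang=ta HTTP/1.1'))
print(categorise_by_extension('GET /robots.txt HTTP/1.1'))
```

It can be applied to the column in the same way, e.g. frame.loc[:, 1].apply(categorise_by_extension). For the ten sample rows it gives the same categories as the regular-expression version.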

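Is that what you wanted? It would be great if you could include the desired output next time :) As for dtype: object, the numpy arrays underlying the DataFrame call strings, and a bunch of other things, "object"... in this case the values are still strings :)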
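
To illustrate the dtype point, here is a small check, offered as a sketch rather than a definitive test. The Series name `request_lines` is made up for the example, and the exact dtype label printed depends on the pandas version (older releases report object for strings, while newer ones may report a dedicated string dtype):

```python
import pandas as pd

# Two request lines taken from the sample data in the question.
request_lines = pd.Series(
    ['GET /robots.txt HTTP/1.1', 'GET /onlinerenew/ HTTP/1.1'],
    name='Request',
)

# The label shown here varies with the pandas version.
print(request_lines.dtype)

# Regardless of the label, every element is an ordinary Python str.
print(all(isinstance(value, str) for value in request_lines))

# So substring tests work without any conversion.
has_robots = request_lines.str.contains('robots.txt', regex=False)
print(has_robots.tolist())
```

In other words, an object dtype on a column of strings is not by itself a sign that anything needs converting before calling apply.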
