
Pandas DataFrame insert column based on values in lists


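This is probably a simple question, but I have wasted a lot of time and still can't figure out what is going on here. I want to classify the HTTP requests in a web log file by the extension of the requested resource. Here is what I tried: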

imgstr = ['.png', '.gif', '.jpeg', '.jpg']
docstr = ['.pdf', '.ppt', '.doc']
webstr = ['.html', '.htm', '.asp', '.jsp', '.php', '.cgi', '.js', '.css']
compressed = ['zip', 'rar', 'gzip', 'tar', 'gz', '7z']


def rtype(b):
    if any(x in b for x in imgstr):
        return 'A'
    elif any(x in b for x in docstr):
        return 'B'
    elif 'favicon.ico' in b:
        return 'C'
    elif 'robots.txt' in b:
        return 'D'
    elif 'GET / HTTP/1.1' in b:
        return 'E'
    elif any(x in b for x in webstr):
        return 'F'
    elif any(x in b for x in compressed):
        return 'G'
    else:
        return 'H'

df2['result'] = df2.Request.apply(rtype)

But df2['result'] contains only 'A'. The dtype of df2.Request is object. I tried df2['Referer'] = df2['Referer'].astype(str), but the dtype is still object. Here are the first 10 rows of df2.Request:

0,GET /index.php?lang=ta HTTP/1.1
1,GET /index.php?limitstart=25&lang=en HTTP/1.1
2,GET /index.php/ta/component/content/article/43 HTTP/1.1
3,GET /index.php/ta/component/content/article/39-test HTTP/1.1
4,GET /robots.txt HTTP/1.1
5,GET /robots.txt HTTP/1.1
6,GET /index.php/en/computer-security-feeds/15-computer-security/2-us-cert-cyber-security-alerts HTTP/1.1
7,GET /index.php/component/content/article/10-tips/59-use-firefox-more-safe HTTP/1.1
8,GET /robots.txt HTTP/1.1
9,GET /onlinerenew/ HTTP/1.1
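As a quick check (not part of the original post), the question's rtype can be run in plain Python over the ten sample strings above, with the leading "N," row index stripped. On these rows it yields the same labels as the regex answer's output, which suggests the 'A'-only result may come from elsewhere in the asker's df2 pipeline rather than from rtype itself; that is an inference, not something the thread confirms.

```python
# The question's lists and function, unchanged apart from spacing.
imgstr = ['.png', '.gif', '.jpeg', '.jpg']
docstr = ['.pdf', '.ppt', '.doc']
webstr = ['.html', '.htm', '.asp', '.jsp', '.php', '.cgi', '.js', '.css']
compressed = ['zip', 'rar', 'gzip', 'tar', 'gz', '7z']


def rtype(b):
    if any(x in b for x in imgstr):
        return 'A'
    elif any(x in b for x in docstr):
        return 'B'
    elif 'favicon.ico' in b:
        return 'C'
    elif 'robots.txt' in b:
        return 'D'
    elif 'GET / HTTP/1.1' in b:
        return 'E'
    elif any(x in b for x in webstr):
        return 'F'
    elif any(x in b for x in compressed):
        return 'G'
    else:
        return 'H'


# The ten sample requests from the question, without the "N," prefix.
samples = [
    "GET /index.php?lang=ta HTTP/1.1",
    "GET /index.php?limitstart=25&lang=en HTTP/1.1",
    "GET /index.php/ta/component/content/article/43 HTTP/1.1",
    "GET /index.php/ta/component/content/article/39-test HTTP/1.1",
    "GET /robots.txt HTTP/1.1",
    "GET /robots.txt HTTP/1.1",
    "GET /index.php/en/computer-security-feeds/15-computer-security/2-us-cert-cyber-security-alerts HTTP/1.1",
    "GET /index.php/component/content/article/10-tips/59-use-firefox-more-safe HTTP/1.1",
    "GET /robots.txt HTTP/1.1",
    "GET /onlinerenew/ HTTP/1.1",
]

labels = [rtype(s) for s in samples]
print(labels)  # ['F', 'F', 'F', 'F', 'D', 'D', 'F', 'F', 'D', 'H']
```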

I would probably use regular expressions for this.
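One detail worth keeping in mind with regular expressions: an unescaped `.` matches any character, not just a literal dot. A minimal sketch (not the answerer's code) that builds a pattern straight from the question's `imgstr` list with `re.escape`, so each `.` becomes a literal dot; `build_pattern` is a hypothetical helper name:

```python
import re

# Image extensions from the question.
imgstr = ['.png', '.gif', '.jpeg', '.jpg']


def build_pattern(extensions):
    """Hypothetical helper: join extensions into one alternation.

    re.escape turns each '.' into '\\.', so it matches only a literal dot.
    """
    return re.compile("|".join(re.escape(ext) for ext in extensions))


img_re = build_pattern(imgstr)

# Unescaped: '.png' also matches 'xpng' because '.' matches any character.
print(bool(re.search('.png', 'GET /xpng HTTP/1.1')))   # True
# Escaped: only a real '.png' extension matches.
print(bool(img_re.search('GET /xpng HTTP/1.1')))       # False
print(bool(img_re.search('GET /logo.png HTTP/1.1')))   # True
```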

import pandas as pd
import re

def categoriser(x):
    # Raw strings with an escaped dot, so '.' matches a literal dot
    # instead of any character.
    if re.search(r'\.(png|gif|jpeg|jpg)', x):
        return 'A'
    elif re.search(r'\.(pdf|ppt|doc)', x):
        return 'B'
    elif 'favicon.ico' in x:
        return 'C'
    elif 'robots.txt' in x:
        return 'D'
    elif 'GET / HTTP/1.1' in x:
        return 'E'
    elif re.search(r'\.(html|htm|asp|jsp|php|cgi|js|css)', x):
        return 'F'
    elif re.search(r'(zip|rar|gzip|tar|gz|7z)', x):
        return 'G'
    else:
        return 'H'


string = """0,GET /index.php?lang=ta HTTP/1.1
1,GET /index.php?limitstart=25&lang=en HTTP/1.1
2,GET /index.php/ta/component/content/article/43 HTTP/1.1
3,GET /index.php/ta/component/content/article/39-test HTTP/1.1
4,GET /robots.txt HTTP/1.1
5,GET /robots.txt HTTP/1.1
6,GET /index.php/en/computer-security-feeds/15-computer-security/2-us-cert-cyber-security-alerts HTTP/1.1
7,GET /index.php/component/content/article/10-tips/59-use-firefox-more-safe HTTP/1.1
8,GET /robots.txt HTTP/1.1
9,GET /onlinerenew/ HTTP/1.1"""    

frame = pd.DataFrame([x.split(",") for x in string.split("\n")])

print(frame.loc[:, 1].apply(categoriser))

The result is:

0    F
1    F
2    F
3    F
4    D
5    D
6    F
7    F
8    D
9    H
Name: 1, dtype: object
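A different technique from the answer's row-by-row `apply`: a vectorized sketch using pandas `Series.str.contains` together with `numpy.select`. `numpy.select` returns the label of the first condition that is true for each row, so listing the conditions in the same order as the if/elif chain keeps the same priority. The extension lists are the question's (escaped with `re.escape`); the `logo.png` row is a hypothetical addition, not from the log, and `alternation` is a hypothetical helper name.

```python
import re

import numpy as np
import pandas as pd

# Three rows from the question plus one hypothetical image request.
requests = pd.Series([
    "GET /index.php?lang=ta HTTP/1.1",
    "GET /robots.txt HTTP/1.1",
    "GET /onlinerenew/ HTTP/1.1",
    "GET /images/logo.png HTTP/1.1",  # hypothetical row, not from the log
])


def alternation(items):
    """Hypothetical helper: escape each item and join with '|'."""
    return "|".join(re.escape(i) for i in items)


# Conditions in the same priority order as the if/elif chain.
conditions = [
    requests.str.contains(alternation(['.png', '.gif', '.jpeg', '.jpg'])),
    requests.str.contains(alternation(['.pdf', '.ppt', '.doc'])),
    requests.str.contains('favicon.ico', regex=False),
    requests.str.contains('robots.txt', regex=False),
    requests.str.contains('GET / HTTP/1.1', regex=False),
    requests.str.contains(alternation(
        ['.html', '.htm', '.asp', '.jsp', '.php', '.cgi', '.js', '.css'])),
    requests.str.contains(alternation(['zip', 'rar', 'gzip', 'tar', 'gz', '7z'])),
]
labels = ['A', 'B', 'C', 'D', 'E', 'F', 'G']

# First True condition wins; rows matching nothing get the default 'H'.
result = pd.Series(np.select(conditions, labels, default='H'),
                   index=requests.index)
print(result.tolist())  # ['F', 'D', 'H', 'A']
```

This trades the per-row Python function for one boolean column per category, which tends to scale better on large log files, at the cost of scanning every row once per condition.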

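Is that what you wanted? It would be great if you could include the desired output next time :) As for dtype: object, the NumPy array underlying the DataFrame labels strings (and a bunch of other things) as objects... in this case, they are still strings :)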
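A small sketch of that point: a column with dtype `object` still holds ordinary Python `str` values, so `in` and `re.search` work on each element. The sketch pins `dtype=object` explicitly to mirror the question's column, assuming that newer pandas releases may infer a dedicated string dtype instead.

```python
import pandas as pd

# dtype=object pinned explicitly to match the question's df2.Request column.
requests = pd.Series(
    ["GET /robots.txt HTTP/1.1", "GET /onlinerenew/ HTTP/1.1"], dtype=object
)

print(requests.dtype)  # object
# Every element is a plain Python str despite the 'object' label.
print(all(isinstance(v, str) for v in requests))  # True
```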
