Pandas DataFrame insert column based on values in lists
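This may be a simple question, but I have spent a lot of time on it without figuring out what is going on. I want to classify the HTTP requests in a web log file by resource extension. Here is what I tried.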
imgstr = ['.png', '.gif', '.jpeg', '.jpg']
docstr = ['.pdf', '.ppt', '.doc']
webstr = ['.html', '.htm', '.asp', '.jsp', '.php', '.cgi', '.js', '.css']
compressed = ['zip', 'rar', 'gzip', 'tar', 'gz', '7z']

def rtype(b):
    if any(x in b for x in imgstr):
        return 'A'
    elif any(x in b for x in docstr):
        return 'B'
    elif 'favicon.ico' in b:
        return 'C'
    elif 'robots.txt' in b:
        return 'D'
    elif 'GET / HTTP/1.1' in b:
        return 'E'
    elif any(x in b for x in webstr):
        return 'F'
    elif any(x in b for x in compressed):
        return 'G'
    else:
        return 'H'

df2['result'] = df2.Request.apply(rtype)
However, df2['result'] contains only 'A'. The dtype of df2.Request is object. I tried df2['Referer'] = df2['Referer'].astype(str), but the dtype is still object. Below are the first 10 rows of df2.Request.
0,GET /index.php?lang=ta HTTP/1.1
1,GET /index.php?limitstart=25&lang=en HTTP/1.1
2,GET /index.php/ta/component/content/article/43 HTTP/1.1
3,GET /index.php/ta/component/content/article/39-test HTTP/1.1
4,GET /robots.txt HTTP/1.1
5,GET /robots.txt HTTP/1.1
6,GET /index.php/en/computer-security-feeds/15-computer-security/2-us-cert-cyber-security-alerts HTTP/1.1
7,GET /index.php/component/content/article/10-tips/59-use-firefox-more-safe HTTP/1.1
8,GET /robots.txt HTTP/1.1
9,GET /onlinerenew/ HTTP/1.1
I would probably use regular expressions for this.
import pandas as pd
import re

def categoriser(x):
    # Dots are escaped so that '.' matches a literal dot, not any character.
    if re.search(r'\.(png|gif|jpeg|jpg)', x):
        return 'A'
    elif re.search(r'\.(pdf|ppt|doc)', x):
        return 'B'
    elif 'favicon.ico' in x:
        return 'C'
    elif 'robots.txt' in x:
        return 'D'
    elif 'GET / HTTP/1.1' in x:
        return 'E'
    elif re.search(r'\.(html|htm|asp|jsp|php|cgi|js|css)', x):
        return 'F'
    # No leading dot here, matching the 'compressed' list in the question.
    elif re.search('(zip|rar|gzip|tar|gz|7z)', x):
        return 'G'
    else:
        return 'H'

string = """0,GET /index.php?lang=ta HTTP/1.1
1,GET /index.php?limitstart=25&lang=en HTTP/1.1
2,GET /index.php/ta/component/content/article/43 HTTP/1.1
3,GET /index.php/ta/component/content/article/39-test HTTP/1.1
4,GET /robots.txt HTTP/1.1
5,GET /robots.txt HTTP/1.1
6,GET /index.php/en/computer-security-feeds/15-computer-security/2-us-cert-cyber-security-alerts HTTP/1.1
7,GET /index.php/component/content/article/10-tips/59-use-firefox-more-safe HTTP/1.1
8,GET /robots.txt HTTP/1.1
9,GET /onlinerenew/ HTTP/1.1"""
frame = pd.DataFrame([x.split(",") for x in string.split("\n")])
print(frame.loc[:, 1].apply(categoriser))
The result is:
0 F
1 F
2 F
3 F
4 D
5 D
6 F
7 F
8 D
9 H
Name: 1, dtype: object
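Substring and regex checks like the ones above match anywhere in the request line, so `tar`, for example, also occurs inside `limitstart`, and the outcome depends on the order of the branches. A minimal sketch of an alternative, assuming Python 3 and that every request line has the form `METHOD target PROTOCOL`, is to isolate the URL path first and look up the extension of each path segment. The names `EXT_CATEGORIES` and `categorise_by_extension` are made up for this illustration, and the dotted archive extensions (`.zip`, `.tar`, ...) are an assumption, since the list in the question has no dots.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical lookup table built from the lists in the question.
# Assumption: archive extensions are written with a leading dot here.
EXT_CATEGORIES = {}
for letter, extensions in [
    ('A', ['.png', '.gif', '.jpeg', '.jpg']),
    ('B', ['.pdf', '.ppt', '.doc']),
    ('F', ['.html', '.htm', '.asp', '.jsp', '.php', '.cgi', '.js', '.css']),
    ('G', ['.zip', '.rar', '.gzip', '.tar', '.gz', '.7z']),
]:
    for ext in extensions:
        EXT_CATEGORIES[ext] = letter


def categorise_by_extension(request):
    """Classify a request line such as 'GET /index.php?lang=ta HTTP/1.1'."""
    parts = request.split()
    if len(parts) < 2:
        return 'H'  # not a well-formed request line
    # Drop the query string so parameters like 'limitstart' cannot match.
    path = urlsplit(parts[1]).path
    if path == '/':
        return 'E'
    segments = [s for s in path.split('/') if s]
    if 'favicon.ico' in segments:
        return 'C'
    if 'robots.txt' in segments:
        return 'D'
    # The first path segment with a known extension decides the category,
    # so '/index.php/ta/component/...' still counts as a web page.
    for segment in segments:
        ext = posixpath.splitext(segment)[1].lower()
        if ext in EXT_CATEGORIES:
            return EXT_CATEGORIES[ext]
    return 'H'
```

It can be applied the same way as the functions above, for example `df2['result'] = df2.Request.apply(categorise_by_extension)`. Note that this is not an exact equivalent of the regex version: the first matching path segment wins instead of the fixed A, B, ... order.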
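Is that what you wanted? It would be great if you could include the desired output next time :) As for dtype: object, the NumPy array underneath the DataFrame labels strings, and a bunch of other things, as object. In this case the values are still strings :)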
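To see this for yourself, a small check along the following lines may help. It is only a sketch: the variable names are illustrative, and the column is created with `dtype=object` explicitly to mirror the situation in the question, because the default dtype for strings may differ between pandas releases.

```python
import pandas as pd

# Illustrative stand-in for df2.Request, forced to the object dtype
# that the question describes.
request_column = pd.Series(
    ['GET /robots.txt HTTP/1.1', 'GET / HTTP/1.1'],
    dtype=object,
)

# The column-level dtype only says "arbitrary Python objects"...
print(request_column.dtype)

# ...while each element is still an ordinary Python string.
element_types = {type(value).__name__ for value in request_column}
print(element_types)
```

So an object dtype by itself does not explain why every row ends up in the same category; the individual values can be inspected in the same way on the real column.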