Script to parse, delete & mask IP addresses
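I have a CSV file with 3 columns:
Column 1 - Overall Value - a concatenation of ID and IP address [51515151 99.999.999.999]
Column 2 - Time - the timestamp [2019-02-25T19:04:59.999-0500]
I am trying to parse the ID out of the first column by splitting it into two columns, one with the ID and one with the IP address, and then dropping the newly created IP address column, because the IP addresses are already contained in column 3.
Here is my code so far: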
import pandas as pd
from pandas import read_csv
df1= pd.read_csv('C:\\Users\\[redacted]\\Documents\\Python\\Parsing.csv')
df1.dropna(inplace = True) # dropping rows with null values to avoid errors
df1 = df1["Overall Value"].str.split(" ", n = 1, expand = True) # updating data frame with split value columns
df1["ID"]= df1[0] # making separate ID column from new data frame
df1["IP2"]= df1[1] # making separate IP column from new data frame
df1["Time"]= df1[2]
df1["IP"]= df1[3]
df1.drop(columns =["IP2"], inplace = True) # deleting column 2
df2 = pd.read_csv('C:\\Users\\[redacted]\\Documents\\Python\\Parsingcopy.csv', index_col=0)
df1 = df1.map(df2)
df1.to_csv('C:\\Users\\[redacted]\\Documents\\Python\\Parsingcopy2.csv')
Why do I get the following error?
C:\Users\[Redacted]>C:\Python27\python.exe C:\Users\[Redacted]\Documents\Python\Parsing.py
Traceback (most recent call last):
File "C:\Users\[Redacted]\Documents\Python\Parsing.py", line 21, in <module>
df1["RestofData"]= df1[2]
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2139, in __getitem__
return self._getitem_column(key)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2146, in _getitem_column
return self._get_item_cache(key)
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 1842, in _get_item_cache
values = self._data.get(item)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3843, in get
loc = self.items.get_loc(item)
File "C:\Python27\lib\site-packages\pandas\core\indexes\base.py", line 2527, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 2
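By doing this:
df1 = df1["Overall Value"].str.split(...)
you are not updating the existing data frame; you are creating a new data frame and pointing the name df1 at it.
df1 now no longer refers to the original data frame. The new frame only holds the two pieces produced by the split (columns 0 and 1), so df1[2] (and df1[3]) do not exist, which is what KeyError: 2 is telling you.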
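The following is a minimal sketch of that behaviour, not the asker's actual data: the one-row frame and the column names "Time" and "IP" are assumptions made for illustration. It shows that the split result carries only integer column labels 0 and 1, while the original frame keeps its named columns.

```python
import pandas as pd

# Hypothetical sample mirroring the three columns described in the question.
df = pd.DataFrame({
    "Overall Value": ["51515151 99.999.999.999"],
    "Time": ["2019-02-25T19:04:59.999-0500"],
    "IP": ["99.999.999.999"],
})

# str.split(..., expand=True) returns a brand-new DataFrame.
parts = df["Overall Value"].str.split(" ", n=1, expand=True)

original_columns = list(df.columns)   # the named columns of the source frame
split_columns = list(parts.columns)   # integer labels created by the split
has_column_2 = 2 in parts.columns     # no such label, so parts[2] raises KeyError
```

Rebinding df to parts at this point would discard the "Time" and "IP" columns, which is the situation the traceback reports.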
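Instead, use a different name for the temporary data frame and then use it to update the original one.
Also, rather than creating two new columns and immediately dropping one of them, create only the column you actually need.
For the remaining columns that already exist, you would use indices 1 and 2 rather than 2 and 3, but since they are already in df1 there is no need to "re-insert" them.
Like this: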
ids_ips = df1["Overall Value"].str.split(" ", n = 1, expand = True)
df1["ID"] = ids_ips[0]
# df1["IP2"] = ids_ips[1] <-- don't do this
df1["Time"] = df1.iloc[:, 1] # positional access, since the columns have string labels; probably not necessary anyway
df1["IP"] = df1.iloc[:, 2] # neither is this
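For completeness, here is a self-contained sketch of the whole approach under stated assumptions: the helper name extract_id, the sample row, and the column names "Time" and "IP" are hypothetical, and dropping the combined column afterwards is an assumption about the desired output rather than something the question specifies.

```python
import pandas as pd

def extract_id(df, source_col="Overall Value"):
    """Return a copy of df with an "ID" column parsed from source_col."""
    out = df.copy()
    # Keep the split result under its own name so the frame is not replaced.
    parts = out[source_col].str.split(" ", n=1, expand=True)
    # Only the ID piece is needed; the IP already has its own column.
    out["ID"] = parts[0]
    # Assumption: the combined column is no longer wanted once the ID is extracted.
    return out.drop(columns=[source_col])

# Hypothetical one-row sample standing in for the CSV contents.
sample = pd.DataFrame({
    "Overall Value": ["51515151 99.999.999.999"],
    "Time": ["2019-02-25T19:04:59.999-0500"],
    "IP": ["99.999.999.999"],
})

result = extract_id(sample)
```

With real data, the frame returned by pd.read_csv would take the place of sample, and result.to_csv(...) would write the output.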