Character replacement and split for new columns in CSV dataframe using Python, sklearn, Pandas
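I'm currently trying to convert column 6 from a date format that uses slashes (e.g. 2/4/09) to one that uses dashes and drops the zeros (2-4-9). I'd also like to take each value and give it its own column (as shown in the desired output). I've tried researching and implementing a few solutions, but I can't seem to figure it out. I'm still trying to work out how to replace/remove characters (shown below). I'm very new to working with dataframes in Python. Any tips or help would be greatly appreciated. Thank you.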
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import ensemble
import pandas as pd
import numpy as np
df = pd.read_csv('file.csv')
df[6].replace(['\/'],['-'],regex=True, inplace=True)
df[6].replace('0','',regex=True,inplace=True)
Error:
classifier_v1.4.py:18: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df.dropna(inplace=True, subset=['Name', 'TRY', 'LOC', 'OUTPUT', 'TYPE_A', 'SIGNAL', 'A-B', 'SPOT'])
Traceback (most recent call last):
File "/Users/namel/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 5
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "file.py", line 20, in <module>
df[5].replace(['\/'],['-'],regex=True)
File "/Users/name/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "/Users/name/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 5
Current dataframe:
0 1 2 3 4 5 6 7
0 Name TRY LOC OUTPUT TYPE_A SIGNAL A-B SPOT
1 inc 1 2 20 TYPE-1 TORPEDO ULTRA 2/4/09 -21
2 inc 2 3 16 TYPE-2 TORPEDO ILH 2/4/09 -14
3 inc 3 2 20 BLACK47 TORPEDO LION 2/4/09 49
4 inc 4 3 12 TYPE-2 CENTRALPA LION 2/4/09 25
5 inc 5 3 10 TYPE-2 THREE LION 2/4/09 -21
6 inc 6 2 20 TYPE-2 ATF LION 2/4/09 -48
7 inc 7 4 2 NIVEA-1 ATF LION 7/3/03 -23
8 inc 8 3 16 NIVEA-1 ATF LION 7/3/03 18
9 inc 9 3 18 BLENDER CENTRALPA LION 7/3/03 48
10 inc 10 4 20 DELCO ATF LION 7/3/03 -26
11 inc 11 3 20 VE248 ATF LION 7/3/03 44
12 inc 12 1 20 SILVER CENTRALPA LION 5/9/02 -35
13 inc 13 2 20 CALVIN3 SEVENX LION 5/9/02 -20
14 inc 14 3 14 DECK-BT CENTRALPA LION 5/9/02 -38
15 inc 15 4 4 10-LEVI BERWYEN OWL 5/9/02 -29
16 inc 16 4 14 TYPE-2 ATF NOV 5/9/02 -31
17 inc 17 4 10 NYNY TORPEDO NOV 5/9/02 21
18 inc 18 2 20 NIVEA-1 CENTRALPA NOV 1/7/06 45
19 inc 19 3 27 FMRA97 TORPEDO NOV 1/7/06 -26
20 inc 20 4 18 SILVER ATF NOV 1/7/06 -46
Desired output:
0 1 2 3 4 5 6 7 8 9 10
0 Name TRY LOC OUTPUT TYPE_A SIGNAL A-B D1 D2 D3 SPOT
1 inc 1 2 20 TYPE-1 TORPEDO ULTRA 2-4-9 2 4 9 -21
2 inc 2 3 16 TYPE-2 TORPEDO ILH 2-4-9 2 4 9 -14
3 inc 3 2 20 BLACK47 TORPEDO LION 2-4-9 2 4 9 49
4 inc 4 3 12 TYPE-2 CENTRALPA LION 2-4-9 2 4 9 25
5 inc 5 3 10 TYPE-2 THREE LION 2-4-9 2 4 9 -21
6 inc 6 2 20 TYPE-2 ATF LION 2-4-9 2 4 9 -48
7 inc 7 4 2 NIVEA-1 ATF LION 7-3-3 7 3 3 -23
8 inc 8 3 16 NIVEA-1 ATF LION 7-3-3 7 3 3 18
9 inc 9 3 18 BLENDER CENTRALPA LION 7-3-3 7 3 3 48
10 inc 10 4 20 DELCO ATF LION 7-3-3 7 3 3 -26
11 inc 11 3 20 VE248 ATF LION 7-3-3 7 3 3 44
12 inc 12 1 20 SILVER CENTRALPA LION 5-9-2 5 9 2 -35
13 inc 13 2 20 CALVIN3 SEVENX LION 5-9-2 5 9 2 -20
14 inc 14 3 14 DECK-BT CENTRALPA LION 5-9-2 5 9 2 -38
15 inc 15 4 4 10-LEVI BERWYEN OWL 5-9-2 5 9 2 -29
16 inc 16 4 14 TYPE-2 ATF NOV 5-9-2 5 9 2 -31
17 inc 17 4 10 NYNY TORPEDO NOV 5-9-2 5 9 2 21
18 inc 18 2 20 NIVEA-1 CENTRALPA NOV 1-7-6 1 7 6 45
19 inc 19 3 27 FMRA97 TORPEDO NOV 1-7-6 1 7 6 -26
20 inc 20 4 18 SILVER ATF NOV 1-7-6 1 7 6 -46
There may be a more efficient way to do this, but the code below will achieve what you want.
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import ensemble
import pandas as pd
import numpy as np
df = pd.read_csv('file.csv')
# insert columns
df.insert(7, 'D1', '')
df.insert(8, 'D2', '')
df.insert(9, 'D3', '')
# replace the slashes with dashes, then drop the zeros (literal matches, no regex)
df['A-B'] = df['A-B'].str.replace('/', '-', regex=False)
df['A-B'] = df['A-B'].str.replace('0', '', regex=False)
# fill the new columns with the three pieces of each 'A-B' value
df['D1'] = df.apply(lambda x: str(x['A-B']).split('-')[0], axis=1)
df['D2'] = df.apply(lambda x: str(x['A-B']).split('-')[1], axis=1)
df['D3'] = df.apply(lambda x: str(x['A-B']).split('-')[2], axis=1)
print(df)
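Given that you are working with dates, you can load them as datetime when reading the CSV and process them further. Because of the unusual year format you want (no zero padding), it does need some extra processing: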
import pandas as pd
df = pd.read_csv('file.csv')
# Convert right after loading: pd.datetime was removed in pandas 2.0 and
# the date_parser argument of read_csv is deprecated
df['A-B'] = pd.to_datetime(df['A-B'], format='%d/%m/%y')
df['D1'] = df['A-B'].dt.day
df['D2'] = df['A-B'].dt.month
# keep only the last digit of the year (2009 -> 9), as in the desired output
df['D3'] = df['A-B'].dt.year % 10
df['A-B'] = df['A-B'].apply(lambda x: x.strftime('%d-%m-%y').replace("0", ""))
Output:
 | Name | TRY | LOC | OUTPUT | TYPE_A | SIGNAL | A-B | SPOT | D1 | D2 | D3 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | inc 1 | 2 | 20 | TYPE-1 | TORPEDO | ULTRA | 2-4-9 | -21 | 2 | 4 | 9 |
1 | inc 2 | 3 | 16 | TYPE-2 | TORPEDO | ILH | 2-4-9 | -14 | 2 | 4 | 9 |
2 | inc 3 | 2 | 20 | BLACK47 | TORPEDO | LION | 2-4-9 | 49 | 2 | 4 | 9 |
3 | inc 4 | 3 | 12 | TYPE-2 | CENTRALPA | LION | 2-4-9 | 25 | 2 | 4 | 9 |
4 | inc 5 | 3 | 10 | TYPE-2 | THREE | LION | 2-4-9 | -21 | 2 | 4 | 9 |
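One caveat with removing every "0": it works for the dates shown in the question, but it would also eat the zeros inside values such as 10 or 20. The sketch below uses made-up sample values (not taken from the question's file) to compare that approach with a regex that strips only leading zeros. Treat it as one possible variation rather than part of the answer above.

```python
import pandas as pd

# Hypothetical sample values, chosen to include 10 and 20
dates = pd.Series(["2/4/09", "10/10/20", "07/03/03"])

dashed = dates.str.replace("/", "-", regex=False)

# Approach from the answers: drop every "0" character
naive = dashed.str.replace("0", "", regex=False)

# Alternative: drop only a run of zeros that starts a number
# and is followed by at least one more digit
careful = dashed.str.replace(r"\b0+(?=\d)", "", regex=True)

print(naive.tolist())    # ['2-4-9', '1-1-2', '7-3-3']
print(careful.tolist())  # ['2-4-9', '10-10-20', '7-3-3']
```

For the question's data both variants give the same result; they differ only when a day, month, or year contains a zero that is not a leading one.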
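KeyError: 5 means that the key 5 does not exist. In this case the column label is a string, not an integer, so you need to use quotes (df['5'] rather than df[5]).
Another (probably more practical) approach is to drop the first row and use row 1 as the column headers.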
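Both points can be checked on a small in-memory sample. The sketch below assumes the file is comma-separated and starts with the numeric row followed by the named row, as the printed dataframe suggests. The sample text is a shortened stand-in, not the real file.

```python
import io
import pandas as pd

# Assumed layout: numeric label row first, real header second
csv_text = """0,1,2,3,4,5,6,7
Name,TRY,LOC,OUTPUT,TYPE_A,SIGNAL,A-B,SPOT
inc 1,2,20,TYPE-1,TORPEDO,ULTRA,2/4/09,-21
inc 7,4,2,NIVEA-1,ATF,LION,7/3/03,-23
"""

# Default read: the numeric row becomes the header, with string labels
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())

try:
    df_default[6]            # integer key: no such label
except KeyError as exc:
    print("KeyError:", exc)

print(df_default["6"].tolist())   # string key works

# header=1: skip the numeric row and use the second row as the header
df_named = pd.read_csv(io.StringIO(csv_text), header=1)
print(df_named["A-B"].tolist())
```

With header=1 the columns can be addressed by name, which is what the code in the answers relies on.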
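.replace works fine with lists of original and new values. There are several alternative forms, two of which are shown below.
Using split as shown below, you can add the three new columns at once.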
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import ensemble
import pandas as pd
import numpy as np
df = pd.read_csv('/Users/ciit2/downloads/test.csv', header=1)
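# assign the result back instead of calling inplace=True on a column selection
df['A-B'] = df['A-B'].replace({'/': '-'}, regex=True)
df['A-B'] = df['A-B'].replace('0', '', regex=True)
# expand=True returns one column per piece, so all three columns are added at once
df[['D1', 'D2', 'D3']] = df['A-B'].str.split('-', expand=True)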
df
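The code above appends D1, D2 and D3 at the end, while the desired output places them before SPOT. The sketch below is one way to get that order, run on a shortened in-memory sample (assumed to be comma-separated with the numeric row first). The variable names parts and spot_pos are only illustrative.

```python
import io
import pandas as pd

# Shortened stand-in for the question's file (assumed layout)
csv_text = """0,1,2,3,4,5,6,7
Name,TRY,LOC,OUTPUT,TYPE_A,SIGNAL,A-B,SPOT
inc 1,2,20,TYPE-1,TORPEDO,ULTRA,2/4/09,-21
inc 12,1,20,SILVER,CENTRALPA,LION,5/9/02,-35
"""
df = pd.read_csv(io.StringIO(csv_text), header=1)

# Slashes to dashes, then drop the zeros (literal replacements)
df["A-B"] = (
    df["A-B"].str.replace("/", "-", regex=False)
             .str.replace("0", "", regex=False)
)

# One column per piece of the date string
parts = df["A-B"].str.split("-", expand=True)
parts.columns = ["D1", "D2", "D3"]

# Insert the new columns just before SPOT, keeping their order
spot_pos = df.columns.get_loc("SPOT")
for offset, name in enumerate(parts.columns):
    df.insert(spot_pos + offset, name, parts[name])

print(df.columns.tolist())
print(df[["A-B", "D1", "D2", "D3", "SPOT"]])
```

Note that the pieces stay strings here. Converting them with astype(int) is an optional extra step if numeric columns are needed.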