Using Python, sklearn

Question

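I am currently trying to convert column 6 from a date format that uses slashes (e.g. 2/4/09) to one that uses dashes and drops the zeros (2-4-9). I would also like to take each value and give it its own column (as shown in the desired output). I have researched and tried a few solutions, but I can't seem to figure it out. I am still trying to work out how to replace or remove characters (see below). I am new to working with dataframes in Python. Any tips or help would be greatly appreciated. Thank you.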

from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import ensemble
import pandas as pd
import numpy as np

df = pd.read_csv('file.csv')

df[6].replace(['\/'],['-'],regex=True, inplace=True)
df[6].replace('0','',regex=True,inplace=True)

Error:

classifier_v1.4.py:18: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.dropna(inplace=True, subset=['Name', 'TRY', 'LOC', 'OUTPUT', 'TYPE_A', 'SIGNAL', 'A-B', 'SPOT'])
Traceback (most recent call last):
  File "/Users/namel/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 5

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "file.py", line 20, in <module>
    df[5].replace(['\/'],['-'],regex=True)
  File "/Users/name/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/name/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 5

Current dataframe:

         0    1    2        3          4       5        6     7  
0     Name  TRY  LOC   OUTPUT     TYPE_A   SIGNAL     A-B  SPOT 
1    inc 1    2   20   TYPE-1    TORPEDO   ULTRA   2/4/09   -21
2    inc 2    3   16   TYPE-2    TORPEDO     ILH   2/4/09   -14
3    inc 3    2   20  BLACK47    TORPEDO    LION   2/4/09    49
4    inc 4    3   12   TYPE-2  CENTRALPA    LION   2/4/09    25
5    inc 5    3   10   TYPE-2      THREE    LION   2/4/09   -21
6    inc 6    2   20   TYPE-2        ATF    LION   2/4/09   -48
7    inc 7    4    2  NIVEA-1        ATF    LION   7/3/03   -23
8    inc 8    3   16  NIVEA-1        ATF    LION   7/3/03    18
9    inc 9    3   18  BLENDER  CENTRALPA    LION   7/3/03    48
10   inc 10   4   20    DELCO        ATF    LION   7/3/03   -26
11   inc 11   3   20    VE248        ATF    LION   7/3/03    44
12   inc 12   1   20   SILVER  CENTRALPA    LION   5/9/02   -35
13   inc 13   2   20  CALVIN3     SEVENX    LION   5/9/02   -20
14   inc 14   3   14  DECK-BT  CENTRALPA    LION   5/9/02   -38
15   inc 15   4    4  10-LEVI    BERWYEN     OWL   5/9/02   -29
16   inc 16   4   14   TYPE-2        ATF     NOV   5/9/02   -31
17   inc 17   4   10     NYNY    TORPEDO     NOV   5/9/02    21
18   inc 18   2   20  NIVEA-1  CENTRALPA     NOV   1/7/06    45
19   inc 19   3   27   FMRA97    TORPEDO     NOV   1/7/06   -26
20   inc 20   4   18   SILVER        ATF     NOV   1/7/06   -46

Desired output:

         0    1    2        3          4       5       6   7   8   9    10   
0     Name  TRY  LOC   OUTPUT     TYPE_A   SIGNAL    A-B  D1  D2  D3  SPOT 
1    inc 1    2   20   TYPE-1    TORPEDO   ULTRA   2-4-9   2   4   9   -21
2    inc 2    3   16   TYPE-2    TORPEDO     ILH   2-4-9   2   4   9   -14
3    inc 3    2   20  BLACK47    TORPEDO    LION   2-4-9   2   4   9    49
4    inc 4    3   12   TYPE-2  CENTRALPA    LION   2-4-9   2   4   9    25
5    inc 5    3   10   TYPE-2      THREE    LION   2-4-9   2   4   9   -21
6    inc 6    2   20   TYPE-2        ATF    LION   2-4-9   2   4   9   -48
7    inc 7    4    2  NIVEA-1        ATF    LION   7-3-3   7   3   3   -23
8    inc 8    3   16  NIVEA-1        ATF    LION   7-3-3   7   3   3    18
9    inc 9    3   18  BLENDER  CENTRALPA    LION   7-3-3   7   3   3    48
10   inc 10   4   20    DELCO        ATF    LION   7-3-3   7   3   3   -26
11   inc 11   3   20    VE248        ATF    LION   7-3-3   7   3   3    44
12   inc 12   1   20   SILVER  CENTRALPA    LION   5-9-2   5   9   2   -35
13   inc 13   2   20  CALVIN3     SEVENX    LION   5-9-2   5   9   2   -20
14   inc 14   3   14  DECK-BT  CENTRALPA    LION   5-9-2   5   9   2   -38
15   inc 15   4    4  10-LEVI    BERWYEN     OWL   5-9-2   5   9   2   -29
16   inc 16   4   14   TYPE-2        ATF     NOV   5-9-2   5   9   2   -31
17   inc 17   4   10     NYNY    TORPEDO     NOV   5-9-2   5   9   2    21
18   inc 18   2   20  NIVEA-1  CENTRALPA     NOV   1-7-6   1   7   6    45
19   inc 19   3   27   FMRA97    TORPEDO     NOV   1-7-6   1   7   6   -26
20   inc 20   4   18   SILVER        ATF     NOV   1-7-6   1   7   6   -46

Answer 1

There may be a more efficient way to do this, but the code below will achieve what you want.

from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import ensemble
import pandas as pd
import numpy as np

df = pd.read_csv('file.csv')

# insert columns
df.insert(7, 'D1', '')
df.insert(8, 'D2', '')
df.insert(9, 'D3', '')

# replace
df['A-B'] = df['A-B'].str.replace('/', '-')
df['A-B'] = df['A-B'].str.replace('0', '')

# update new columns values
df['D1'] = df.apply(lambda x: str(x['A-B']).split('-')[0], axis=1)
df['D2'] = df.apply(lambda x: str(x['A-B']).split('-')[1], axis=1)
df['D3'] = df.apply(lambda x: str(x['A-B']).split('-')[2], axis=1)

print(df)
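One thing to watch in the replace step above: removing every '0' also changes values such as 10 or 20. The following is a minimal sketch, not part of the original answer, that strips only leading zeros. The sample values are made up for illustration, and the regular expression is one possible choice.

```python
import pandas as pd

# Hypothetical sample values standing in for the 'A-B' column
dates = pd.Series(['2/4/09', '7/3/03', '10/10/20'], name='A-B')

# Swap the separator; regex=False treats '/' as a literal string
dashed = dates.str.replace('/', '-', regex=False)

# Drop only zeros that lead a number, so '09' becomes '9' but '10' stays '10'
stripped = dashed.str.replace(r'\b0+(?=\d)', '', regex=True)

print(stripped.tolist())
```

If the data never contains 10, 20 or 30 in any date part, the simpler '0' replacement gives the same result.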

Answer 2

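Given that you are working with dates, you can parse them as datetime values when loading the csv and then process them further. Because of the unusual format you want for the year (no zero padding), this does require some extra processing: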

# pd.datetime is no longer available in recent pandas, so parse with pd.to_datetime
df = pd.read_csv('file.csv')
df['A-B'] = pd.to_datetime(df['A-B'], format='%d/%m/%y')
df['D1'] = df['A-B'].dt.day
df['D2'] = df['A-B'].dt.month
# keep only the last digit of the year, e.g. 2009 -> 9
df['D3'] = df['A-B'].dt.year % 10
# format with dashes, then strip the zero padding
df['A-B'] = df['A-B'].dt.strftime('%d-%m-%y').str.replace('0', '', regex=False)

Output:

	Name	TRY	LOC	OUTPUT	TYPE_A	SIGNAL	A-B	SPOT	D1	D2	D3
0	inc 1	2	20	TYPE-1	TORPEDO	ULTRA	2-4-9	-21	2	4	9
1	inc 2	3	16	TYPE-2	TORPEDO	ILH	2-4-9	-14	2	4	9
2	inc 3	2	20	BLACK47	TORPEDO	LION	2-4-9	49	2	4	9
3	inc 4	3	12	TYPE-2	CENTRALPA	LION	2-4-9	25	2	4	9
4	inc 5	3	10	TYPE-2	THREE	LION	2-4-9	-21	2	4	9
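As a self-contained illustration of the datetime route, here is a rough sketch on a few made-up rows. It assumes the day/month order used in the answer ('%d/%m/%y'), and it differs from the answer in one respect: it rebuilds the string from the integer parts rather than removing '0' characters, so the year keeps two digits where needed (2010 would give 10).

```python
import pandas as pd

# Hypothetical sample data; the '%d/%m/%y' order follows the answer above
df = pd.DataFrame({'A-B': ['2/4/09', '7/3/03', '5/9/02']})

parsed = pd.to_datetime(df['A-B'], format='%d/%m/%y')
df['D1'] = parsed.dt.day
df['D2'] = parsed.dt.month
df['D3'] = parsed.dt.year % 100   # two-digit year as an integer: 2009 -> 9

# Join the integer parts, which carry no zero padding
df['A-B'] = (
    df['D1'].astype(str) + '-' + df['D2'].astype(str) + '-' + df['D3'].astype(str)
)
print(df)
```

Because D1, D2 and D3 are integers here, they can be used directly as numeric features.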

Answer 3

KeyError: 5 means that the key 5 does not exist. In this case the column label is not an integer but a string, so you need to use quotes.

Another (possibly more practical) approach is to drop the first row and use row 1 as the column headers.
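The two points above can be sketched as follows. The file layout is an assumption inferred from the dataframe printed in the question (a first line of numeric labels, then the real header row); the in-memory csv_text is a stand-in for the actual file.

```python
import io
import pandas as pd

# Assumed file layout: a first line of numeric labels, then the real header row
csv_text = (
    "0,1,2,3,4,5,6,7\n"
    "Name,TRY,LOC,OUTPUT,TYPE_A,SIGNAL,A-B,SPOT\n"
    "inc 1,2,20,TYPE-1,TORPEDO,ULTRA,2/4/09,-21\n"
    "inc 7,4,2,NIVEA-1,ATF,LION,7/3/03,-23\n"
)

# Default read: the first line becomes the header, and its labels are strings
raw = pd.read_csv(io.StringIO(csv_text))
try:
    raw[5]
    int_lookup_failed = False
except KeyError:
    int_lookup_failed = True
signal_by_label = raw['5'].tolist()   # the string label works

# header=1 uses the second line as the header instead
named = pd.read_csv(io.StringIO(csv_text), header=1)
print(int_lookup_failed, signal_by_label, named.columns.tolist())
```

With header=1 the columns can be addressed by name, for example named['A-B'], which is usually easier to read than positional labels.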

Calling .replace with lists of original and new values is fine. There are several alternative ways to do it, two of which are shown below.
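For comparison, here is a small sketch of three equivalent ways to swap the separator, run on made-up sample values. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample values
s = pd.Series(['2/4/09', '7/3/03'])

by_list = s.replace(['/'], ['-'], regex=True)    # lists of old and new values
by_dict = s.replace({'/': '-'}, regex=True)      # mapping of old -> new
by_str = s.str.replace('/', '-', regex=False)    # string accessor, literal match

print(by_list.tolist(), by_dict.tolist(), by_str.tolist())
```

Series.replace without regex=True only matches whole cell values, which is why the regex flag is needed for substring replacement in the first two forms.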

Using split as shown below, you can add the three new columns at the same time.

from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import ensemble
import pandas as pd
import numpy as np

df = pd.read_csv('/Users/ciit2/downloads/test.csv', header=1)
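# assign the result back instead of using inplace=True on a column selection
df['A-B'] = df['A-B'].replace({'/': '-'}, regex=True)
df['A-B'] = df['A-B'].replace('0', '', regex=True)
# split once and add the three new columns together
df[['D1', 'D2', 'D3']] = df['A-B'].str.split('-', expand=True)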
df

Answer 1
0 2021-03-01 19:18:23

Answer 2
0 2021-03-01 20:02:39

Answer 3
0 2021-03-01 20:46:32
