
numpy converting data into numpy array

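I have data extracted from a database, delimited by "|", that I am trying to load into a numpy array to do some filtering. For example, save to a file only the rows that contain LOGOUT in column 3. I started by loading the example.txt file with: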

import numpy as np


data = np.genfromtxt('example.txt',
                 skip_header=1,
                 skip_footer=1,
                 names=True,
                 dtype=None,
                 delimiter='|',
                 encoding='utf-8',
                 filling_values=None)

But I get the error:

ValueError: Some errors were detected !
Line #3 (got 14 columns instead of 13)
Line #4 (got 14 columns instead of 13)
Line #5 (got 14 columns instead of 13)

The data in the txt file is:

|ID|TIMESTAMP|EVENT_DATE|GROUP|EVENT|CHANNEL|WERT|WERTY|WERTY|SESSION_ID|IP|WERT|DATA|
|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qwewqeq||weqeqewqewe
|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD||qweqe|weqeqewqewe
|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qweqe||weqeqewqewe
|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|||weqeqewqewe
|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|||weqeqewqewe
|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qweqwe|wqewqe|weqeqewqewe

No row contains more than 13 elements. What am I doing wrong?

If your data is in example.txt, you can do the following:

with open('example.txt') as fp:
    lines = fp.read().splitlines()
data = [x.split('|')[1:] for x in lines][1:]

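The indices discard the header row and the empty leading column. You get a list of lists (one list of 13 strings per row) holding the data from the file. If you need it as a NumPy array, call np.array(data).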
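Building on the split-based approach above, the LOGOUT filtering from the question could be sketched as follows. The sample lines are shortened to five columns for readability, and the output file name logout_only.txt is only an assumption for illustration.

```python
import os
import tempfile

import numpy as np

# Shortened sample lines in the same layout as example.txt (only 5 columns).
lines = [
    "|ID|TIMESTAMP|EVENT_DATE|GROUP|EVENT|",
    "|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE",
    "|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE",
    "|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION",
]

# Drop the empty leading field of every line, then drop the header row.
rows = [line.split("|")[1:] for line in lines][1:]
data = np.array(rows)  # 2-D array of strings, one row per data line

# GROUP is at index 3 once the empty leading field has been removed.
logout_rows = data[data[:, 3] == "LOGOUT"]

# Write the matching rows back out, pipe-delimited (hypothetical file name).
out_path = os.path.join(tempfile.gettempdir(), "logout_only.txt")
np.savetxt(out_path, logout_rows, fmt="%s", delimiter="|")
```

Because every value stays a string here, the comparison against "LOGOUT" is a plain string match; numeric columns would need an explicit conversion before any arithmetic.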

First, the reason the problem is reported only for lines 3, 4 and 5 is the combination of skip_header and skip_footer.

Without skip_footer:

import numpy as np


data = np.genfromtxt('example.txt',
                 skip_header=1,
                 names=True,
                 dtype=None,
                 delimiter='|',
                 encoding='utf-8',
                 filling_values=None)

Error:

    Line #3 (got 14 columns instead of 13)
    Line #4 (got 14 columns instead of 13)
    Line #5 (got 14 columns instead of 13)
    Line #6 (got 14 columns instead of 13)
    Line #7 (got 14 columns instead of 13)

So first, skip_header should be 0 (its default), so that the header line is used for the column names:

data = np.genfromtxt('example.txt',
                 names=True,
                 dtype=None,
                 delimiter='|',
                 encoding='utf-8',
                 filling_values=None)

Result:

array([(False, 5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGIN', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', 'qwewqeq', '', 'weqeqewqewe'),
       (False, 5818222, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGOUT', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', '', 'qweqe', 'weqeqewqewe'),
       (False, 5818222, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGOUT', 'SESSION', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', 'qweqe', '', 'weqeqewqewe'),
       (False, 5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGOUT', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', '', '', 'weqeqewqewe'),
       (False, 5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGIN', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', '', '', 'weqeqewqewe'),
       (False, 5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGIN', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', 'qweqwe', 'wqewqe', 'weqeqewqewe')],
      dtype=[('ID', '?'), ('TIMESTAMP', '<i4'), ('EVENT_DATE', '<U25'), ('GROUP', '<U10'), ('EVENT', '<U6'), ('CHANNEL', '<U14'), ('WERT', '?'), ('WERTY', '<U15'), ('WERTY_1', '<U15'), ('SESSION_ID', '<U8'), ('IP', '<U22'), ('WERT_1', '<U7'), ('DATA', '<U6'), ('f0', '<U11')])

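The first column is False and the dtypes are wrong because the first line of the txt file contains more delimiters than the other lines. Every line also starts with a delimiter, which produces an empty leading field, so the values end up shifted one column to the right of their names (ID is read as a bool, TIMESTAMP holds the integer ID, and so on):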

>>> line0 = "|ID|TIMESTAMP|EVENT_DATE|GROUP|EVENT|CHANNEL|WERT|WERTY|WERTY|SESSION_ID|IP|WERT|DATA|"
>>> line1 = "|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qwewqeq||weqeqewqewe"
>>> delimiter = '|'
>>> line0.count(delimiter)
14
>>> line1.count(delimiter)
13
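The same check can be run over a whole file at once by counting the fields each line splits into. This is a minimal sketch on shortened sample lines (five columns instead of thirteen), not on the original file.

```python
from collections import Counter

# Shortened sample lines in the same layout as example.txt.
lines = [
    "|ID|TIMESTAMP|EVENT_DATE|GROUP|EVENT|",
    "|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE",
    "|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE",
]

delimiter = "|"

# Number of fields per line, counting the empty strings that a leading
# or trailing delimiter produces.
field_counts = [len(line.split(delimiter)) for line in lines]

# Summary of how many lines have each field count.
summary = Counter(field_counts)
print(field_counts)
print(summary)
```

If the summary contains more than one distinct field count, the header and the data rows disagree and a fixed-column parser is likely to complain.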

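Solution: one delimiter separates two fields, so 13 fields need only 12 delimiters. Remove the leading "|" from every line and the trailing "|" from the header line. The resulting txt file: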

ID|TIMESTAMP|EVENT_DATE|GROUP|EVENT|CHANNEL|WERT|WERTY|WERTY|SESSION_ID|IP|WERT|DATA
5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qwewqeq||weqeqewqewe
5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD||qweqe|weqeqewqewe
5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qweqe||weqeqewqewe
5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|||weqeqewqewe
5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|||weqeqewqewe
5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qweqwe|wqewqe|weqeqewqewe

Code (the cleaned file is saved here as d2.txt):

data = np.genfromtxt('d2.txt',names=True,dtype=None,delimiter='|',encoding='utf-8',filling_values=None,skip_header=0)

Result:

array([(5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGIN', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', 'qwewqeq', '', 'weqeqewqewe'),
       (5818222, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGOUT', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', '', 'qweqe', 'weqeqewqewe'),
       (5818222, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGOUT', 'SESSION', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', 'qweqe', '', 'weqeqewqewe'),
       (5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGOUT', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', '', '', 'weqeqewqewe'),
       (5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGIN', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', '', '', 'weqeqewqewe'),
       (5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGIN', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', 'qweqwe', 'wqewqe', 'weqeqewqewe')],
      dtype=[('ID', '<i4'), ('TIMESTAMP', '<U25'), ('EVENT_DATE', '<U10'), ('GROUP', '<U6'), ('EVENT', '<U14'), ('CHANNEL', '?'), ('WERT', '<U15'), ('WERTY', '<U15'), ('WERTY_1', '<U8'), ('SESSION_ID', '<U22'), ('IP', '<U7'), ('WERT_1', '<U6'), ('DATA', '<U11')])
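If editing the file by hand is not practical, the same cleanup could be done in memory before calling np.genfromtxt. The sketch below assumes that every line starts with exactly one "|" and that only the header ends with one. The helper strip_outer_pipes is a hypothetical name, and the sample lines are shortened to five columns.

```python
import io

import numpy as np

# Shortened sample lines in the same layout as the original example.txt.
raw_lines = [
    "|ID|TIMESTAMP|EVENT_DATE|GROUP|EVENT|",
    "|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE",
    "|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE",
    "|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION",
]


def strip_outer_pipes(line, is_header):
    """Hypothetical helper: drop the leading '|' and, for the header only,
    the trailing '|'. Trailing pipes on data rows are kept because they
    would mark an empty last field."""
    if line.startswith("|"):
        line = line[1:]
    if is_header and line.endswith("|"):
        line = line[:-1]
    return line


clean_lines = [strip_outer_pipes(line, i == 0) for i, line in enumerate(raw_lines)]

# genfromtxt accepts a file-like object, so the cleaned text never touches disk.
data = np.genfromtxt(io.StringIO("\n".join(clean_lines)),
                     names=True, dtype=None, delimiter="|", encoding="utf-8")

# With named columns, the LOGOUT filter from the question is a boolean mask.
logout = data[data["GROUP"] == "LOGOUT"]
print(logout["ID"])
```

Only one leading delimiter is removed per line instead of calling strip("|"), so data rows whose last fields are empty would keep their column count.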
