Space separated csv with spaces in column names and values
I have to load a csv file into a dataframe, but the columns are separated by single spaces, and the column names and values also contain spaces. The file looks like this:
'Mod Ports Card Type Model Serial No.',
' 3 20 7600 ES+ 7600-ES+20G3C SAL1550Y9DL',
' 5 2 Route Switch Processor 720 (Active) RSP720-3C-GE SAL16095Q9W',
etc.
My best idea so far was to check the length of each word in the column names and then check whether the corresponding values below have more or fewer characters, but in some cases this breaks: 'Card Type' and '7600 ES+' could each be recognized as 2 separate columns.
What's important is that the solution has to be universal: it should work not only for this example but for other files too. My goal is to read this file into a dataframe or any other data structure.
I tried to use the pd.read_fwf() function, but it gives incorrect results. The output dataframe for my file looks like this:
So not only did it fail to catch Card Type correctly, it also merged it with Ports and created some Unnamed columns.
You can use read_fwf():
df = pd.read_fwf('my_file.csv')
It works best if you pass the widths parameter, giving the width of each column.
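As a minimal sketch of the widths approach: the question shows the rows with their whitespace collapsed, so the padded text below is a hypothetical reconstruction, and the widths (4, 6, 39, 19, 11) are assumptions chosen for this sample rather than values taken from the real file.

```python
import io

import pandas as pd

# Assumed field widths, one per column, in characters.
widths = [4, 6, 39, 19, 11]

rows = [
    ("Mod", "Ports", "Card Type", "Model", "Serial No."),
    ("3", "20", "7600 ES+", "7600-ES+20G3C", "SAL1550Y9DL"),
    ("5", "2", "Route Switch Processor 720 (Active)", "RSP720-3C-GE", "SAL16095Q9W"),
]

# Rebuild a fixed-width text by padding every value to its column width.
# The padding is an assumption; the original file may be aligned differently.
text = "\n".join(
    "".join(value.ljust(width) for value, width in zip(row, widths))
    for row in rows
)

# The first line is used as the header; each field is cut by width and stripped.
df = pd.read_fwf(io.StringIO(text), widths=widths)
print(df)
```

widths only fits files whose columns are contiguous. If some character ranges should be skipped, colspecs is the more flexible option.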
EDIT
Using the data you provided, you can get the right result with the colspecs parameter:
# `a` is the path to the file or an open file-like object
df = pd.read_fwf(a, colspecs=[(0, 4), (4, 10), (10, 49), (49, 68), (68, 1000)])
df
   Mod  Ports                            Card Type          Model   Serial No.
0    3     20                             7600 ES+  7600-ES+20G3C  SAL1550Y9DL
1    5      2  Route Switch Processor 720 (Active)   RSP720-3C-GE  SAL16095Q9W
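For other files, hard-coding the offsets may not be practical. One possible sketch, assuming the column names are known in advance and that every value lies between the start of its own header and the start of the next one, is to derive colspecs from the header line. The helper colspecs_from_header is hypothetical (it is not part of pandas), and the padded sample is a reconstruction, since the question shows the rows with whitespace collapsed.

```python
import io

import pandas as pd


def colspecs_from_header(header, names):
    """Return one (start, end) pair per name, based on where each name
    begins in the header line. Hypothetical helper, not a pandas function."""
    starts = []
    pos = 0
    for name in names:
        pos = header.index(name, pos)  # raises ValueError if a name is missing
        starts.append(pos)
        pos += len(name)
    ends = starts[1:] + [None]  # None: the last column runs to the end of the line
    return list(zip(starts, ends))


names = ["Mod", "Ports", "Card Type", "Model", "Serial No."]
rows = [
    names,
    ["3", "20", "7600 ES+", "7600-ES+20G3C", "SAL1550Y9DL"],
    ["5", "2", "Route Switch Processor 720 (Active)", "RSP720-3C-GE", "SAL16095Q9W"],
]

# Assumed padding per column, used only to rebuild a fixed-width sample.
pads = [4, 6, 39, 19, 0]
text = "\n".join(
    "".join(value.ljust(pad) for value, pad in zip(row, pads)) for row in rows
)

header = text.splitlines()[0]
colspecs = colspecs_from_header(header, names)
df = pd.read_fwf(io.StringIO(text), colspecs=colspecs)
print(colspecs)
print(df)
```

This breaks down when values are right-aligned and wider than their header, or when the same text appears in more than one column name, so the derived colspecs are worth checking against a few rows before relying on them.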