
Separating csv columns with delimiter / sep

My goal is to split the data stored in each cell into multiple columns in the same row.

For example, I would like to take data that looks like this:

Row 1: [<1><2>][<3><4>][][]

Row 2: [<1><2>][<3><4>][][]

and turn it into data that looks like this:

Row 1: [1][2][3][4]

Row 2: [1][2][3][4]

I tried using the code below to read the csv and split each line at the ">":

df = pd.read_csv('file.csv', engine='python', sep="\*>", header=None)

However, the code did not work as anticipated. Instead, the split occurred at seemingly random and unpredictable points (I'm sure there's a pattern, but I don't see it). Each break also created another row rather than another column. For example:

Row 1: [<1>][<2>]

Row 2: [<3>]

Row 3: [<4>]

I thought the issue might lie with reading the CSV file, so I tried re-scraping the site with the separator included, but it produced the same results, so I'm assuming it's an issue with the separator call. However, I only arrived at that call after trying many others that caused various errors. For example, when I tried sep = '>' I got the following error: ParserError: '>' expected after '"' and when I tried sep = '\>' I got the following error: ParserError: Expected 36 fields in line 1106, saw 120. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

These errors sent me looking through multiple resources, including this and this, among others.

However, I have found no resource that successfully demonstrates how to separate each column within a row using a '>' delimiter. If anyone knows how to do this, please let me know. Your help is much appreciated!

Update:

Here is an actual screenshot of the CSV file for a better understanding of what I was trying to demonstrate above. My end goal is for each column from I onward to hold data on one descriptive factor, as opposed to many as they do now.


Would this work?

string="[<1><2>][<3><4>][][]"
string=string.replace("[","")
string=string.replace("]","")
string=string.replace("<","[")
string=string.replace(">","]")
print(string)

Result:

[1][2][3][4]
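The replace calls above only relabel the brackets in one string. If the aim is to get the individual values out of each cell, a regular expression may be simpler. This is only a sketch: the helper name split_cells and the sample row are made up for illustration, and it assumes every value is wrapped in a single <...> pair as in the question.

```python
import re

def split_cells(row):
    """Flatten cells such as '<1><2>' into a list of individual values."""
    values = []
    for cell in row:
        # Capture whatever sits between each '<' and the next '>'.
        values.extend(re.findall(r"<([^<>]*)>", cell))
    return values

# Hypothetical row mirroring the example in the question.
row = ["<1><2>", "<3><4>", "", ""]
print(split_cells(row))  # ['1', '2', '3', '4']
```

Empty cells contribute nothing, so the trailing empty columns simply disappear.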

I ended up using Google Sheets. Once you upload the csv, there is a menu titled "Data" and, under it, an option titled "Split text to columns."

If you want a faster way to do this with code, you can also do the following with pandas:

# New data frame holding the two halves of "Name", split at the first space
new = data["Name"].str.split(" ", n=1, expand=True)

# First name column taken from the split result
data["First Name"] = new[0]

# Last name column taken from the split result
data["Last Name"] = new[1]

# Drop the original "Name" column
data.drop(columns=["Name"], inplace=True)

# Display the data frame
data
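The snippet above splits a "Name" column on spaces, which is not quite the layout in the question. A rough adaptation for cells like <1><2> might look like the following. It is a sketch under assumptions: the inline csv_text stands in for file.csv, the file is assumed to be comma-separated with no header, and each value is assumed to sit inside one <...> pair.

```python
import io
import pandas as pd

# Stand-in for file.csv (hypothetical content modelled on the question).
csv_text = "<1><2>,<3><4>,,\n<1><2>,<3><4>,,\n"

# Read with the ordinary comma separator and keep every cell as text.
df = pd.read_csv(io.StringIO(csv_text), header=None, dtype=str,
                 keep_default_na=False).fillna("")

# Join each row's cells into one string, then pull out the <...> values.
joined = df.apply(lambda r: "".join(r), axis=1)
values = joined.str.findall(r"<([^<>]*)>")

# One column per extracted value; shorter rows are padded with None.
result = pd.DataFrame(values.tolist(), index=df.index)
print(result)
```

Reading with the default comma separator first, and only then splitting on the angle brackets, sidesteps the sep='>' parser errors described in the question.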
