
The read_csv delimiter also appears inside a text field

I received data extracted from a server. The problem is that the extract uses ";" as the delimiter in the CSV file.

I read the folder with the following code:

import glob
import pandas as pd

files = glob.glob(r"path/*.csv")
dfs = [pd.read_csv(f, sep=";", engine='c') for f in files]
df2 = pd.concat(dfs, ignore_index=True)

and the output is:


columnA    columnB  ....  columnT  columnU
2000       A        ....  I wish   NaN
1000       B        ....  that     NaN
this ends  NaN      ....  NaN      NaN
3000       A        ....  I        DUU
...

The text in row 3 belongs to columnT in the second row. So far I have only been able to delete all such stray rows, but then I cannot keep that information.

df2.dropna(subset=['columnB'], how='all', inplace=True)
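A minimal sketch of keeping the spilled text instead of dropping it. It assumes that every row with a missing columnB is a continuation of the row above, that the spilled text sits in columnA, and that a space is an acceptable joiner; none of this is confirmed by the extract. The sample frame and the names `joiner` and `fixed` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the output above (only three columns shown).
df2 = pd.DataFrame({
    "columnA": ["2000", "1000", "this ends", "3000"],
    "columnB": ["A", "B", np.nan, "A"],
    "columnT": ["I wish", "that", np.nan, "I"],
})

joiner = " "  # assumption: the original separator inside the text is unknown

rows = []
for record in df2.to_dict("records"):
    if pd.isna(record["columnB"]) and rows:
        # Continuation row: append its text to columnT of the previous real row.
        rows[-1]["columnT"] = f'{rows[-1]["columnT"]}{joiner}{record["columnA"]}'
    else:
        # Regular row: keep it unchanged.
        rows.append(record)

fixed = pd.DataFrame(rows)
print(fixed)
```

This only repairs the frame after the fact. If a continuation row carries text in more than one column, or the preceding columnT is empty, the loop would need extra handling.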

How can I read the files correctly? The problem is that the text in the columnT field also uses ";" as a normal character.

I wasn't aware of a programmatic approach to solve this (see my comment), but out of interest, a quick search led me to "Escaping quotes and delimiters in CSV files with Excel". Perhaps you could try the same, i.e., either manually or programmatically replace all single quotes with double quotes, and try your code again.
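As a rough illustration of why quoting helps, here is a small sketch. It assumes the text field in the extract is wrapped in single quotes, which the question does not confirm; the sample string `raw` is made up. pandas treats a delimiter inside a quoted field as ordinary text, and `quotechar` selects which character counts as a quote.

```python
import io
import pandas as pd

# Hypothetical sample: the text field contains ";" and is wrapped in single quotes.
raw = (
    "columnA;columnT;columnU\n"
    "2000;'I wish; that this ends';X\n"
    "3000;'I';DUU\n"
)

# Option 1: tell pandas that single quotes are the quote character.
df_a = pd.read_csv(io.StringIO(raw), sep=";", quotechar="'")

# Option 2: replace single quotes with double quotes (the default quotechar).
df_b = pd.read_csv(io.StringIO(raw.replace("'", '"')), sep=";")

print(df_a)
```

Option 1 avoids rewriting the file. Option 2 can break if the text itself contains apostrophes. If the field is not quoted at all in the extract, neither option applies and the export itself would need to be changed.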
