How to import into pandas a file that uses a comma as delimiter when one of its columns contains commas?
I have a text file that is separated by commas, but several columns have commas inside them, so extra columns are created where they are not needed. I tried eliminating all the commas and then using a regex to find only the numbers and add a comma after them, following the solution in "Put comma after a pattern in python regex", but that did not work.
Excel has the same problem, and so do other text editors.
0111,Cultivo de cereales y otros cultivos n.c.p.,011,Cultivos en general; cultivo de productos de mercado; hortic,01,AGRICULTURA, GANADERIA, CAZA Y ACTIVIDADES DE SERVICIOS CONE,01,**AGRICULTURA, GANADERIA, CAZA Y SILVICULTURA**
As you can see in the text marked with **, Python does not create one column but three.
Another solution would be to wrap such values in quotation marks (" "), but I have not found a way to add them automatically.
Your data source is buggy. It should put quotes (" ") around such values; then pandas would be able to parse the file. Without them, there is no reliable way to tell the fields apart, because the meaning of a comma has become ambiguous.
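To illustrate the point about quoting, here is a minimal sketch of how pandas handles a row whose text fields are quoted. The row is a hypothetical, shortened version of the sample from the question, and `header=None` and `dtype=str` are assumptions about how the file should be read (no header row, codes kept as text).

```python
import io

import pandas as pd

# Hypothetical, shortened version of the sample row, with text fields quoted.
quoted = '0111,"Cultivo de cereales",01,"AGRICULTURA, GANADERIA, CAZA Y SILVICULTURA"\n'

# header=None: the sample has no header row.
# dtype=str: keep codes such as "0111" as text so leading zeros survive.
df = pd.read_csv(io.StringIO(quoted), header=None, dtype=str)

print(df.shape)       # (1, 4)
print(df.iloc[0, 3])  # AGRICULTURA, GANADERIA, CAZA Y SILVICULTURA
```

The commas inside the quoted field are treated as part of the value, so the row yields four columns rather than six.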
A heuristic solution is to assume that any comma followed by a space belongs to the text and should be removed, while all other commas are delimiters and should be retained. You could try that, but there can still be cases in which it fails.
data = data.replace(", ", " ")  # assumes data holds the raw file text as a str; replace() returns a new string
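A variant of the same heuristic keeps the commas inside the text instead of deleting them: treat only commas that are not followed by a space as delimiters, then write the fields back out with proper quoting. The sketch below uses the sample row from the question (without the ** markers). The names `delimiter`, `split_line`, and `fixed` are illustrative, and the approach shares the heuristic's weakness: it breaks if a real delimiter is followed by a space, or if a comma inside the text is not.

```python
import csv
import io
import re

# Sample row from the question, emphasis markers removed.
raw = (
    "0111,Cultivo de cereales y otros cultivos n.c.p.,011,"
    "Cultivos en general; cultivo de productos de mercado; hortic,01,"
    "AGRICULTURA, GANADERIA, CAZA Y ACTIVIDADES DE SERVICIOS CONE,01,"
    "AGRICULTURA, GANADERIA, CAZA Y SILVICULTURA\n"
)

# Heuristic: a comma NOT followed by a space is a field delimiter.
delimiter = re.compile(r",(?! )")


def split_line(line):
    """Split one raw line into fields using the heuristic above."""
    return delimiter.split(line.rstrip("\n"))


rows = [split_line(line) for line in raw.splitlines()]

# Write the rows back as standard CSV; csv.writer quotes any field
# that contains a comma, which removes the ambiguity.
out = io.StringIO()
csv.writer(out).writerows(rows)
fixed = out.getvalue()

print(len(rows[0]))  # 8
```

The repaired text in `fixed` can then be saved to a file or wrapped in `io.StringIO` and passed to `pd.read_csv` with `header=None`.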