Pandas: read_csv indicating 'space-delimited'
I have the following file.txt (excerpt):
SICcode Catcode Category SICname MultSIC
0111 A1500 Wheat, corn, soybeans and cash grain Wheat X
0112 A1600 Other commodities (incl rice, peanuts) Rice X
0115 A1500 Wheat, corn, soybeans and cash grain Corn X
0116 A1500 Wheat, corn, soybeans and cash grain Soybeans X
0119 A1500 Wheat, corn, soybeans and cash grain Cash grains, NEC X
0131 A1100 Cotton Cotton X
0132 A1300 Tobacco & Tobacco products Tobacco X
I am having trouble reading it into a pandas DataFrame. I tried pd.read_csv with the following arguments:
engine='python', sep='Tab'
but it returned the whole file in a single column:
SICcode Catcode Category SICname MultSIC
0 0111 A1500 Wheat, corn, soybeans...
1 0112 A1600 Other commodities (in...
2 0115 A1500 Wheat, corn, soybeans...
3 0116 A1500 Wheat, corn, soybeans...
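I then tried importing it into a spreadsheet file with "tab" as the delimiter, but the file was again read as a single column. Does anyone have an idea about this?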
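If
df = pd.read_csv('file.txt', sep='\t')
returns a DataFrame with a single column, then file.txt evidently does not use tabs as the delimiter. Your data may contain only spaces as separators. In that case, you could try
df = pd.read_csv('file.txt', sep=r'\s{2,}', engine='python')
which uses the regular expression \s{2,} as the separator. This pattern matches two or more whitespace characters, so single spaces inside a value such as "Cash grains, NEC" are not treated as column breaks. Regex separators are handled by the Python parsing engine, hence engine='python'.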
In [8]: df
Out[8]:
SICcode Catcode Category SICname \
0 111 A1500 Wheat, corn, soybeans and cash grain Wheat
1 112 A1600 Other commodities (incl rice, peanuts) Rice
2 115 A1500 Wheat, corn, soybeans and cash grain Corn
3 116 A1500 Wheat, corn, soybeans and cash grain Soybeans
4 119 A1500 Wheat, corn, soybeans and cash grain Cash grains, NEC
5 131 A1100 Cotton Cotton
6 132 A1300 Tobacco & Tobacco products Tobacco
MultSIC
0 X
1 X
2 X
3 X
4 X
5 X
6 X
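Note that the output above shows 111 rather than 0111: the SICcode column was parsed as integers, which drops the leading zeros. The following is a minimal sketch of the same regex-separator approach that also keeps the codes as strings. The in-memory sample and its exact spacing are assumptions made for illustration; the real file.txt may be laid out differently.

```python
import io
import pandas as pd

# Hypothetical sample: columns are separated by two or more spaces,
# while single spaces occur inside the Category and SICname values.
sample = (
    "SICcode  Catcode  Category                                SICname           MultSIC\n"
    "0111     A1500    Wheat, corn, soybeans and cash grain    Wheat             X\n"
    "0119     A1500    Wheat, corn, soybeans and cash grain    Cash grains, NEC  X\n"
    "0131     A1100    Cotton                                  Cotton            X\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s{2,}",            # split only on runs of 2+ whitespace characters
    engine="python",          # regex separators need the Python engine
    dtype={"SICcode": str},   # keep leading zeros such as "0111"
)
print(df)
```

If some rows separate two columns with only a single space, this pattern will merge them, so it is worth checking df.shape after loading.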
If this does not work, please post the output of
print(repr(open('file.txt', 'rb').read(100)))
This will show us an unambiguous representation of the first 100 bytes of file.txt.
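The byte-level check can also be wrapped in a small helper that counts candidate delimiters. This is only a sketch: describe_delimiters is a hypothetical name, and the demo file written below is made-up data standing in for file.txt.

```python
import os
import re
import tempfile

def describe_delimiters(path, nbytes=100):
    """Count tabs and runs of 2+ spaces in the first nbytes of a file."""
    with open(path, "rb") as f:
        head = f.read(nbytes)
    return {
        "head": head,
        "tabs": head.count(b"\t"),
        "space_runs": len(re.findall(rb" {2,}", head)),
    }

# Hypothetical demo file that uses runs of spaces, not tabs.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"SICcode  Catcode  Category\n0111     A1500    Cotton\n")
    demo_path = tmp.name

info = describe_delimiters(demo_path)
os.remove(demo_path)

print(repr(info["head"]))                  # unambiguous view of the raw bytes
print(info["tabs"], info["space_runs"])    # zero tabs suggests sep='\t' will not split
```

A tab count of zero together with several space runs suggests that a whitespace-based separator is the one to try.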
I think that if the data in the CSV file is separated by tabs, you can try adding sep="\t" to read_csv:
import pandas as pd
df = pd.read_csv('test/a.csv', sep="\t")
print(df)
SICcode Catcode Category SICname \
0 111 A1500 Wheat, corn, soybeans and cash grain Wheat
1 112 A1600 Other commodities (incl rice, peanuts) Rice
2 115 A1500 Wheat, corn, soybeans and cash grain Corn
3 116 A1500 Wheat, corn, soybeans and cash grain Soybeans
4 119 A1500 Wheat, corn, soybeans and cash grain Cash grains, NEC
5 131 A1100 Cotton Cotton
6 132 A1300 Tobacco & Tobacco products Tobacco
MultSIC
0 X
1 X
2 X
3 X
4 X
5 X
6 X
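To see why sep="\t" only helps when the file really contains tabs, here is a small comparison sketch. Both strings are invented samples, not the contents of test/a.csv, and dtype=str is an added assumption to keep leading zeros in SICcode.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample.
tab_text = (
    "SICcode\tCatcode\tCategory\tSICname\tMultSIC\n"
    "0111\tA1500\tWheat, corn, soybeans and cash grain\tWheat\tX\n"
)
# Same content, but with two spaces where the tabs were.
space_text = tab_text.replace("\t", "  ")

tab_df = pd.read_csv(io.StringIO(tab_text), sep="\t", dtype=str)
space_df = pd.read_csv(io.StringIO(space_text), sep="\t", dtype=str)

print(tab_df.shape)    # five columns when tabs are present
print(space_df.shape)  # a single column when they are not
```

A single-column result from sep="\t" is therefore a quick sign that the file is not tab-delimited.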