How to create a csv file from a txt file, with the column separator after "x" characters

Question

I have a txt file that looks like this:

MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area  
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area

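I need to create a csv file from it. I have almost no experience with this, but I have been learning by doing, though probably not very efficiently.

My problem now is that when I use pandas, it splits the columns at the ",". What I need is for the column separator to come right after the code on the left, for example "MT0113820000000". The codes do change, but they all have the same length.

Thanks in advance; I know this is a very newbie question.

This is my code so far: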

import pandas as pd

dataframe1 = pd.read_csv("C:/Users/andre/Desktop/bea_api_test/python-bureau-economic-analysis-api-client/testttt/output.txt")  
dataframe1.to_csv('output_.csv', index = None)

And the output:

COLUMN 1                                COLUMN 2
MT0111500000000 Anniston-Oxford-Jacksonville     | AL Metropolitan Statistical Area

Answer 1

Alternatively, use read_fwf, as mentioned in the comments above:

from io import StringIO
import pandas as pd

testdata = '''\
MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area
'''

buff = StringIO(testdata)

df = pd.read_fwf(buff, header=None, colspecs=[(0, 15), (16, 64 * 1024)])

print(df.to_csv(index=False, columns=[0, 1], header=['COLUMN1', 'COLUMN2']))

Answer 2

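This is not a CSV, and I don't see a convenient way to persuade read_csv to do the right thing. Fortunately, there seems to be a simple rule here: everything before the first space, then everything after it. str.split does exactly that.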

import pandas as pd
from pathlib import Path

#in_file = Path("C:/Users/andre/Desktop/bea_api_test/python-bureau-economic-analysis-api-client/testttt/output.txt")
in_file = Path("test.txt")
out_file = in_file.with_name(in_file.stem + "_").with_suffix(".csv")

# test data
open(in_file, "w").write("""\
MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area""")

# convert to csv: split each line once, at the first space
pd.DataFrame([line.strip().split(" ",1) for line in open(in_file)],
    columns=["COLUMN1", "COLUMN2"]).to_csv(out_file, index=None, header=False)

# visual verification
print(open(out_file).read())

Output

MT0111500000000,"Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area"
MT0112220000000,"Auburn-Opelika, AL Metropolitan Statistical Area"
MT0113820000000,"Birmingham-Hoover, AL Metropolitan Statistical Area"

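In this example I write the csv immediately, so the dataframe is automatically freed from memory. You can also do this with the csv module, writing one row at a time. That uses less memory, because it does not have to hold the whole file in memory. And since csv is part of the standard Python library, there is no external dependency on pandas. With some file name handling added: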

import csv
from pathlib import Path

#in_file = Path("C:/Users/andre/Desktop/bea_api_test/python-bureau-economic-analysis-api-client/testttt/output.txt")
in_file = Path("test.txt")
out_file = in_file.with_name(in_file.stem + "_").with_suffix(".csv")

# test data
open(in_file, "w").write("""\
MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area  
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area""")

# convert to csv
with open(in_file) as infp, open(out_file, "w", newline="") as outfp:
    writer = csv.writer(outfp)
    writer.writerows(line.strip().split(" ",1) for line in infp)

# visual verification
print(open(out_file).read())
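
The question asks for a split after a fixed number of characters rather than at the first space. A minimal sketch of that variant with the csv module follows. It assumes every code is exactly 15 characters long, as in the sample data; CODE_WIDTH and convert_fixed_width are hypothetical names used only for illustration, and in-memory buffers stand in for the real files.

```python
import csv
from io import StringIO

CODE_WIDTH = 15  # assumption: every code is 15 characters long, as in the sample


def convert_fixed_width(infp, outfp, width=CODE_WIDTH):
    """Write one CSV row per input line, splitting after `width` characters."""
    writer = csv.writer(outfp, lineterminator="\n")
    for line in infp:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Left part is the code; the rest (minus surrounding spaces) is the name.
        writer.writerow([line[:width], line[width:].strip()])


# In-memory stand-ins for the input and output files
sample = StringIO(
    "MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area\n"
    "MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area  \n"
)
result = StringIO()
convert_fixed_width(sample, result)
print(result.getvalue())
```

With real files, the two buffers would be replaced by open(in_file) and open(out_file, "w", newline=""). Slicing by position does not depend on a space following the code, which may matter if some names themselves start with spaces or the separator is missing.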

Answer 3

You can split the data at the first occurrence of a space:

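import pandas as pd

# The file has no tabs, so each line lands in column 0 (squeeze= was removed in pandas 2.0).
data = pd.read_table("data.txt", header=None)[0].str.split(" ", n=1)
df = pd.DataFrame(data.tolist(), columns = ["column1", "column2"])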

df.to_csv("df.csv")
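
As a possible shortcut for the same idea, str.split accepts expand=True and then returns a DataFrame directly, so the tolist() step is not needed. The sketch below uses a small in-memory Series in place of data.txt; the column names are taken from the answer above, and writing without the index is an assumption about the desired output.

```python
import pandas as pd

# Stand-in for the lines of data.txt
lines = pd.Series([
    "MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area",
    "MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area",
])

# Split once at the first space; expand=True yields one column per piece.
df = lines.str.split(" ", n=1, expand=True)
df.columns = ["column1", "column2"]

# index=False leaves the row numbers out of the CSV (an assumption about what is wanted).
csv_text = df.to_csv(index=False)
print(csv_text)
```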

3 solutions

Solution 1
1 accepted 2021-01-11 20:14:56

Solution 2
0 2021-01-11 20:03:58

Solution 3
0 2021-01-11 20:53:46
