How to create a csv file from a txt file, using a column separator after "x" characters

Question

I have a txt file that looks like this:

MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area  
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area

I need to create a csv file from this. I have little experience with this, but I have been learning as I go, although likely not efficiently.

My issue right now is that when I use pandas, it creates a new column after each ",". What I need is for the column separator to come right after the code on the left, e.g. "MT0113820000000". The codes do change, but they are all the same length.

Thanks in advance, I know this is a really noobie question.

Here's my code currently:

import pandas as pd

dataframe1 = pd.read_csv("C:/Users/andre/Desktop/bea_api_test/python-bureau-economic-analysis-api-client/testttt/output.txt")  
dataframe1.to_csv('output_.csv', index = None)

And the output:

COLUMN 1                                COLUMN 2
MT0111500000000 Anniston-Oxford-Jacksonville     | AL Metropolitan Statistical Area

Answer 1

Alternatively, using read_fwf as mentioned in a comment above:

from io import StringIO
import pandas as pd

testdata = '''\
MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area
'''

buff = StringIO(testdata)

df = pd.read_fwf(buff, header=None, colspecs=[(0, 15), (16, 64 * 1024)])

print(df.to_csv(index=False, columns=[0, 1], header=['COLUMN1', 'COLUMN2']))
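Because every code has the same length, the fixed-width idea can also be sketched without pandas by slicing each line at a fixed offset. This is a minimal sketch, assuming the code is always 15 characters long and is followed by whitespace; `CODE_WIDTH` and `split_fixed` are hypothetical names introduced only for illustration.

```python
import csv
import io

CODE_WIDTH = 15  # assumption: every code is exactly 15 characters long


def split_fixed(line, width=CODE_WIDTH):
    """Split a line into [code, description] at a fixed character offset."""
    line = line.rstrip("\n")
    return [line[:width], line[width:].strip()]


testdata = """\
MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area
"""

rows = [split_fixed(line) for line in testdata.splitlines()]

# Write to an in-memory buffer; csv.writer quotes fields that contain commas
buff = io.StringIO()
csv.writer(buff, lineterminator="\n").writerows(rows)
csv_text = buff.getvalue()
print(csv_text)
```

If the code width ever differs between files, only `CODE_WIDTH` would need to change.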

Answer 2

That's not a CSV, and I don't see a convenient way of convincing read_csv to do the right thing. Luckily, there seems to be an easy rule here: the stuff before the first space, then the stuff after. str.split does that.

import pandas as pd
from pathlib import Path

#in_file = Path("C:/Users/andre/Desktop/bea_api_test/python-bureau-economic-analysis-api-client/testttt/output.txt")
in_file = Path("test.txt")
out_file = in_file.with_name(in_file.stem + "_").with_suffix(".csv")

# test data
open(in_file, "w").write("""\
MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area  
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area""")

# convert to csv: split each line at the first space only
pd.DataFrame([line.strip().split(" ", 1) for line in open(in_file)],
    columns=["COLUMN1", "COLUMN2"]).to_csv(out_file, index=False, header=False)

# visual verification
print(open(out_file).read())

Output

MT0111500000000,"Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area"
MT0112220000000,"Auburn-Opelika, AL Metropolitan Statistical Area"
MT0113820000000,"Birmingham-Hoover, AL Metropolitan Statistical Area"

In this example I wrote the csv immediately so that the dataframe is automatically deleted from memory. You could also do this with the csv module, writing a line at a time. This will use less memory because it doesn't have to hold the entire file in memory. And since csv is part of the standard Python library, there is no external dependency on pandas. Here it is, with a bit of file name handling added:

import csv
from pathlib import Path

#in_file = Path("C:/Users/andre/Desktop/bea_api_test/python-bureau-economic-analysis-api-client/testttt/output.txt")
in_file = Path("test.txt")
out_file = in_file.with_name(in_file.stem + "_").with_suffix(".csv")

# test data
open(in_file, "w").write("""\
MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area  
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area""")

# convert to csv
# newline="" lets the csv module control line endings (avoids blank rows on Windows)
with open(in_file) as infp, open(out_file, "w", newline="") as outfp:
    writer = csv.writer(outfp)
    writer.writerows(line.strip().split(" ",1) for line in infp)

# visual verification
print(open(out_file).read())
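The quotes around the second field in the output are what keep the embedded comma from being read as a separator. As a quick check, assuming the converted text looks like the output shown earlier, reading it back with `csv.reader` should give exactly two fields per row. The sample text below is copied from that output rather than read from a file.

```python
import csv
import io

# Assumed sample: two lines of the converted CSV output
csv_text = (
    'MT0111500000000,"Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area"\n'
    'MT0112220000000,"Auburn-Opelika, AL Metropolitan Statistical Area"\n'
)

# Parse the text back; quoted commas stay inside the second field
parsed = list(csv.reader(io.StringIO(csv_text)))
for code, name in parsed:
    print(code, "|", name)
```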

Answer 3

You can split the data at the first occurrence of the whitespace:

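import pandas as pd

# The file contains no tabs, so read_table loads each line into a single column.
# (The squeeze argument was removed in pandas 2.0, so column 0 is selected explicitly.)
data = pd.read_table("data.txt", header=None)[0].str.split(" ", n=1)
df = pd.DataFrame(data.tolist(), columns=["column1", "column2"])

df.to_csv("df.csv")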
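To see what splitting at the first space does on a single line, here is a plain-Python sketch. It also illustrates one assumption worth checking in real data: a line with no space yields only one element from `split`, whereas `str.partition` always returns three parts. The code "MT0199990000000" is a made-up value used only for illustration.

```python
line = "MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area"

# maxsplit=1 stops after the first space, so later spaces stay in the name
parts = line.split(" ", 1)
print(parts)

# A line without any space gives a single element, which leaves
# no value for the second column
short = "MT0199990000000".split(" ", 1)
print(short)

# str.partition always returns (before, separator, after)
code, _, name = "MT0199990000000".partition(" ")
print(repr(code), repr(name))
```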


Answer 1
1, accepted, 2021-01-11 20:14:56

Answer 2
0, 2021-01-11 20:03:58

Answer 3
0, 2021-01-11 20:53:46
