How to split a text file and modify it in Python?

I currently have a text file that reads like this:

101, Liberia, Monrovia, 111000, 3200000, Africa, English, Liberia Dollar;
102, Uganda, Kampala, 236000, 34000000, Africa, English and Swahili, Ugandan Shilling;
103, Madagascar, Antananarivo, 587000, 21000000, Africa, Magalasy and Frances, Malagasy Ariary;

I'm currently printing the file using this code:

with open ("base.txt",'r') as f:
   for line in f:
      words = line.split(';')
      for word in words:
         print (word)

What I would like to know is: how can I modify a line by its id number (101, for example) while keeping the existing format, and how can I add or remove lines based on their id number?

My understanding is that you're asking how to modify a word in a line and then write the modified line back into the file.

Change a word in the file

def change_value(new_value, line_number, column):
    with open("base.txt", 'r+') as f:  # r+ lets us both read and write the file
        lines = f.read().split('\n')  # list of all the lines in the file
        words = lines[line_number].split(',')
        words[column] = new_value
        lines[line_number] = ','.join(words)  # rebuild the line, fields separated by ','
        f.seek(0)
        f.write('\n'.join(lines))  # write the lines back from the start of the file
        f.truncate()  # drop leftover bytes if the new content is shorter than the old

To use this function to set line 3, word 2 to Not_Madagascar, call it like this:

change_value("Not_Madagascar", 2, 1)

Indices are zero-based, so you always have to subtract 1 from the line/word number: the first line/word is 0.

Add a new line to the file

def add_line(words, line_number):
    with open("base.txt",'r+') as f:
        lines = f.readlines()
        lines.insert(line_number, ','.join(words) + '\n')
        f.seek(0)
        f.writelines(lines)

To use this function to add a line at the end containing the words "this line is at the end", call it like this:

add_line(['this','line','is','at','the','end'], 4) #4 is the line number

For more information on opening files, see here.

For more information on reading from and modifying files, see here.
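The functions above address lines by position, while the question asks about id numbers. A minimal sketch of an id-based variant follows, assuming every non-blank line has the form "id, field, ...;" and that no field contains a comma. The names parse_record, format_record and update_file are hypothetical helpers, not part of the answer above.

```python
def parse_record(line):
    """Split 'id, field, ...;' into (id, [fields]); return None for blank lines."""
    text = line.strip().rstrip(';')
    if not text:
        return None
    fields = [part.strip() for part in text.split(',')]
    return int(fields[0]), fields[1:]


def format_record(record_id, fields):
    """Rebuild a line in the original 'id, field, ...;' layout."""
    return ', '.join([str(record_id)] + list(fields)) + ';\n'


def update_file(path, changes=None, deletions=(), additions=None):
    """Rewrite the file at `path` with id-based edits.

    changes:   {id: {column_index: new_value}}, column 0 is the first field after the id
    deletions: ids whose lines are dropped
    additions: {id: [fields]}, appended after the existing lines
    """
    changes = changes or {}
    with open(path, 'r') as f:
        lines = f.readlines()

    output = []
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue  # skip blank lines
        record_id, fields = record
        if record_id in deletions:
            continue
        for column, value in changes.get(record_id, {}).items():
            fields[column] = value
        output.append(format_record(record_id, fields))

    for record_id, fields in (additions or {}).items():
        output.append(format_record(record_id, fields))

    with open(path, 'w') as f:  # 'w' truncates, so no stale bytes are left behind
        f.writelines(output)
```

For example, update_file("base.txt", changes={103: {0: "Not_Madagascar"}}, deletions={101}) would rename the country on line 103 and drop line 101. Reading everything and reopening in 'w' mode is a simple choice for small files; it avoids the seek/truncate bookkeeping that 'r+' needs.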

pandas is a strong tool for what you need. It provides tools for working easily with CSV files, and you can manage your data in DataFrames.

import pandas as pd

# read the CSV file into DataFrame
df = pd.read_csv('file.csv', sep=',', header=None, index_col = 0)
print (df)


# eliminating the `;` character
df[7] = df[7].map(lambda x: str(x).rstrip(';'))
print (df)


# eliminating the #101 row of data
df.drop(101, axis=0, inplace=True)
print (df)


Reading this file into an OrderedDict would probably be helpful if you want to preserve the original file ordering and also be able to reference lines in the file for modification, addition, or deletion. The following example makes quite a few assumptions about the full format of the file, but it will work for your test case:

from collections import OrderedDict

content = OrderedDict()

with open('base.txt', 'r') as f:
    for line in f:
        if line.strip():
            print(line)
            words = line.split(',')  # Assuming that you meant ',' vs ';' to split the line into words
            content[int(words[0])] = ','.join(words[1:])

print(content[101])  # Prints " Liberia, Monrovia, etc"...

content.pop(101, None)  # Remove line w/ 101 as the "id"
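The example above only reads and removes entries in memory. A rough sketch of how the same OrderedDict could be modified and written back to disk follows, assuming each non-blank line starts with an integer id followed by a comma. The helpers load_records and save_records are hypothetical names introduced for illustration.

```python
from collections import OrderedDict


def load_records(path):
    """Read 'id, rest-of-line' records into an OrderedDict keyed by integer id."""
    content = OrderedDict()
    with open(path, 'r') as f:
        for line in f:
            if line.strip():
                # Split only at the first comma so the rest of the line stays untouched
                record_id, _, rest = line.partition(',')
                content[int(record_id)] = rest.strip()
    return content


def save_records(path, content):
    """Write the records back, one per line, in insertion order."""
    with open(path, 'w') as f:
        for record_id, rest in content.items():
            f.write('{}, {}\n'.format(record_id, rest))


# Possible usage (the 104 entry is made-up sample data):
# content = load_records('base.txt')
# content[101] = content[101].replace('English', 'English and Kpelle')  # modify
# content[104] = 'Country, Capital, 0, 0, Continent, Language, Currency;'  # add
# content.pop(102, None)                                                 # remove
# save_records('base.txt', content)
```

Because the value stored for each id is the rest of the line, including the trailing ';', writing it back reproduces the original layout. New ids are appended at the end, since an OrderedDict keeps insertion order.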
