Efficient way to edit text tabular file so each cell starts at the same position

I have a text file with a table-like structure: each line contains 0 to 4 words separated by an arbitrary number of spaces.

hello     world  this  is
     an   example  file
is   there a   good
way to    clean this
  your help is   
highly      appreciated

My goal is to reformat this file so that elements start at the same position on every line, for example:

hello    world        this     is
         an           example  file
is       there        a        good
way      to           clean    this
         your         help     is       
highly   appreciated

The number of spaces between columns is arbitrary. I would prefer that lines starting with a space skip the first column, but this is not a strict requirement.

I believe there are many ways to do this. My order of preference is:

  1. In vim, with some neat trick
  2. With a bash command
  3. In a text editor that has such functionality
  4. With a scripting language (perhaps Python)

Since this is part of a data prep/validation process, I do not need a perfect method; I will check the result manually anyway. I am looking for a way that does, say, 80 to 90% of the work.

Can someone suggest an efficient approach?

If useful, an example file is here.

Here's a way to make the column command respect leading whitespace: change a leading space to some other character, run column -t, then change that character back to spaces.

sed 's/^ /_ /' file | column -t | sed 's/^_ /  /'
hello   world        this     is
        an           example  file
is      there        a        good
way     to           clean    this
        your         help     is
highly  appreciated
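
For systems without the column command, the same idea can be sketched in Python. This is only an illustration, not part of the answer above: align_columns and its gap parameter are hypothetical names, and an empty first cell plays the role of the '_' placeholder. Each column is as wide as its longest cell, which is how column -t behaves on this sample.

```python
def align_columns(lines, gap=2):
    """Left-align whitespace-separated words into columns.

    Hypothetical helper: a line that starts with whitespace gets an empty
    first cell, so its words begin in the second column.
    """
    rows = []
    for line in lines:
        cells = line.split()
        if cells and line[0].isspace():
            cells.insert(0, '')
        rows.append(cells)

    # Width of each column is the longest cell found in that column.
    n_cols = max((len(row) for row in rows), default=0)
    widths = [0] * n_cols
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    aligned = []
    for row in rows:
        padded = (cell.ljust(widths[i]) for i, cell in enumerate(row))
        # Strip the padding after the last cell of each line.
        aligned.append((' ' * gap).join(padded).rstrip())
    return aligned


sample = [
    'hello     world  this  is',
    '     an   example  file',
    'is   there a   good',
    'way to    clean this',
    '  your help is   ',
    'highly      appreciated',
]
for row in align_columns(sample):
    print(row)
```

On the sample from the question this prints the same table as the pipeline above.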

Python's re module and str.format() offer a good approach to option 4.

Every column gets the same width: the length of the longest non-whitespace string in your file plus the column_pad value.

You can adjust column_pad to vary the actual column width.

If you pass in rename_file=True, you'll get a new file named 'cleaned_<filename>'. Otherwise, the script will replace the original file with the cleaned file.

#!/usr/bin/env python
import os
import re


def clean_columns(filename, rename_file=False, column_pad=4):
    # Write to 'cleaned_<filename>' next to the original, or overwrite it.
    if rename_file:
        directory, basename = os.path.split(filename)
        cleaned_filename = os.path.join(directory, 'cleaned_' + basename)
    else:
        cleaned_filename = filename

    with open(filename, 'r') as dirty_file:
        text = dirty_file.readlines()

    # Column width = longest word in the file + padding.
    max_string_length = max(
        (len(string) for line in text for string in line.split()), default=0)
    column_width = max_string_length + column_pad
    formatting_string = '{: <' + str(column_width) + '}'

    cleaned_lines = []
    for line in text:
        # Collapse runs of whitespace. A leading space leaves an empty first
        # cell, so such lines skip the first column.
        cells = re.sub(r'\s+', ' ', line.rstrip()).split(' ')
        formatting = formatting_string * len(cells)
        # Drop the padding after the last cell.
        cleaned_lines.append(formatting.format(*cells).rstrip())

    with open(cleaned_filename, 'w') as cleaned:
        cleaned.write(''.join(line + '\n' for line in cleaned_lines))


clean_columns('sample.txt', rename_file=True, column_pad=8)

Output:

hello              world              this               is
                   an                 example            file
is                 there              a                  good
way                to                 clean              this
                   your               help               is
highly             appreciated
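
To see where the column width in this output comes from, the width rule can be isolated in a few lines. This is a small sketch, not part of the script above; uniform_column_width is a hypothetical name. The longest word in the sample is "appreciated" (11 characters), so with column_pad=8 every column is 19 characters wide.

```python
def uniform_column_width(lines, column_pad=4):
    """Return the longest word length in `lines` plus `column_pad`.

    Hypothetical helper that mirrors the width rule of this answer.
    """
    longest = max(
        (len(word) for line in lines for word in line.split()), default=0)
    return longest + column_pad


sample = [
    'hello     world  this  is',
    '     an   example  file',
    'highly      appreciated',
]
width = uniform_column_width(sample, column_pad=8)
print(width)  # 19

# Each cell is left-aligned and padded with spaces to that width.
cell = '{: <{w}}'.format('hello', w=width)
print(repr(cell))
```

Because the width is shared by all columns, a single long word widens the whole table; computing one width per column avoids that.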

You can use the https://github.com/junegunn/vim-easy-align plugin to align on various delimiters.

Just select the lines, then press:

  • <CR>: start interactive mode (assuming <CR> is mapped to <Plug>(EasyAlign))
  • <C-P>: live preview, optional
  • *: align around all delimiters
  • <C-D>: press repeatedly until the delimiters are left-aligned
  • <C-X>\s\@<=\S\+: use each run of non-space characters that follows whitespace as the delimiter

Or use the command: '<,'>EasyAlign */\s\@<=\S\+/dl
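
To check what the delimiter pattern selects, here is a rough Python analogue. This is an assumption for illustration only: Vim's \s\@<=\S\+ is written (?<=\s)\S+ in Python's re syntax, and words_after_space is a hypothetical helper. It returns every word that follows whitespace, along with its starting position.

```python
import re

# Python analogue (not Vim syntax) of the Vim pattern \s\@<=\S\+
WORD_AFTER_SPACE = re.compile(r'(?<=\s)\S+')


def words_after_space(line):
    """Return (start, word) pairs for words preceded by whitespace."""
    return [(m.start(), m.group()) for m in WORD_AFTER_SPACE.finditer(line)]


print(words_after_space('hello     world  this  is'))
# [(10, 'world'), (17, 'this'), (23, 'is')]
print(words_after_space('     an   example  file'))
# [(5, 'an'), (10, 'example'), (19, 'file')]
```

A word at the very start of a line is not matched, so on lines with leading spaces the first word is treated as a delimiter and lines up with the second column of the other lines.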
