

How to delete the lines between the ones you want to keep in Python?

I have a weird file format:

###########################################################
# Name of file#
# stuff[hh:mm:ss:ms] stuff[num] stuff[num] stuff[] stuff[]#
###########################################################
00:00:00.000 -1000 -1000 0.000001 20
00:00:00.001 -1000 -1000 0.000001 20
00:00:00.002 -1000 -1000 0.000001 20
00:00:00.003 -1000 -1000 0.000001 20
00:00:00.004 -1000 -1000 0.000001 20
00:00:00.005 -1000 -1000 0.000001 20
00:00:00.006 -1000 -1000 0.000001 20
00:00:00.007 -1000 -1000 0.000001 20

The problem is that I only need the data every 2 seconds, which means I need to drop the 1999 lines in between. (The separators are actually tabs, \t.) What is the best way of doing that? I would also like the numbers to be stored as numbers, not strings.

import pandas as pd

df = pd.read_csv('file.txt', sep="\t",
                 names=("time", "num1", "num2", "num3", "num4"), skiprows=4)
df["abs_time"] = df.index * 1e-3  # one row per millisecond -> seconds

I had to define the time differently; I already have the code for that, I just need to save it properly.

def get_sec(time_str):
    # Parse "hh:mm:ss.mmm" (e.g. "00:00:02.000") into seconds as a float
    h, m, s = time_str.split(':')
    return int(h) * 3600 + int(m) * 60 + float(s)
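As a vectorized alternative to applying a parsing function such as get_sec row by row, pandas can parse "hh:mm:ss.mmm" strings directly with pd.to_timedelta. This sketch assumes the time column holds strings in exactly that layout; the sample values are made up.

```python
import pandas as pd

times = pd.Series(["00:00:00.000", "00:00:01.999", "00:00:02.000", "01:02:03.004"])

# Parse "hh:mm:ss.mmm" strings into Timedelta values, then into float seconds.
seconds = pd.to_timedelta(times).dt.total_seconds()
print(seconds.tolist())
```

On a DataFrame this would be df["abs_time"] = pd.to_timedelta(df["time"]).dt.total_seconds(), which stores the result as float64.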

Any help is much appreciated.

Since you need data every 2 seconds, you want the rows where the second is even and the milliseconds are "000" (you could choose odd seconds instead), assuming you have no missing data:

def is_select(time_str):
    # Keep an even second with zero milliseconds, e.g. "00:00:02.000"
    return time_str.endswith(".000") and int(time_str[6:8]) % 2 == 0

df['even_seconds'] = df["time"].apply(is_select)
select_data = df[df.even_seconds]

x["time"][6:8] will give you seconds information (you could adjust the index yourself). x["time"][6:8]将为您提供秒信息(您可以自己调整索引)。

Of course, you can modify is_select for other kinds of selection.
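The same rule can be applied to the whole column at once with the .str accessor instead of calling a Python function per row. This sketch assumes every time string has the "hh:mm:ss.mmm" layout, so characters 6:8 are the seconds; the small DataFrame is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["00:00:00.000", "00:00:00.001", "00:00:01.000",
             "00:00:02.000", "00:00:02.001", "00:00:03.000", "00:00:04.000"],
    "num1": [-1000] * 7,
})

t = df["time"]
# Even second (characters 6:8) and exactly ".000" milliseconds.
mask = t.str.endswith(".000") & (t.str[6:8].astype(int) % 2 == 0)
select_data = df[mask]
print(select_data)
```

Since 60 is even, "even second" also works across minute boundaries.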

You can use the skiprows parameter to read only the odd (or even) rows. From the documentation:

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

Here is an example CSV:

#
#
#
#
A,B
1,1
2,2
3,3
4,4

Then you can run:

pd.read_csv('test.csv', skiprows=lambda x: x < 4 or x % 2 == 1)  # skip '#' lines and odd line indices

Output:

   A  B
0  2  2
1  4  4

As you can see, you can read only the odd or even lines, which gives one row every 2 seconds. Note, though, that this assumes:

  1. You are using a recent pandas version (0.20.2, the latest at the time of writing).
  2. Your data is consecutive, i.e. one row per second.
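For the file in the question, the same idea needs a different rule, because there is one row per millisecond rather than per second. Assuming exactly 4 '#' header lines and no gaps, the 00:00:00.000 row is file line 4, so keeping lines 4, 2004, 4004, ... gives one row every 2 seconds. This is an adaptation of the answer, not part of it; HEADER_LINES, STEP and skip are hypothetical names, and the in-memory sample stands in for file.txt.

```python
import io

import pandas as pd

HEADER_LINES = 4   # '#' lines at the top of the file (assumed)
STEP = 2000        # rows per 2 seconds at one row per millisecond (assumed)

# Made-up sample: 4 header lines, then 5 seconds of 1 ms rows.
header = "#\n# Name of file#\n# stuff#\n#\n"
rows = "".join(
    f"00:00:{ms // 1000:02d}.{ms % 1000:03d}\t-1000\t-1000\t0.000001\t20\n"
    for ms in range(5000)
)

def skip(i):
    # Skip the header lines and every data line not on a 2-second boundary.
    return i < HEADER_LINES or (i - HEADER_LINES) % STEP != 0

df = pd.read_csv(io.StringIO(header + rows), sep="\t",
                 names=["time", "num1", "num2", "num3", "num4"],
                 skiprows=skip)
print(df)
```

The skipped lines never become DataFrame rows, which keeps memory use low for large files.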

Convert the first column to elapsed milliseconds and check whether each value is a multiple of 2000, assuming the first column holds time strings such as "00:00:02.000":

elapsed_ms = (pd.to_timedelta(df[df.columns[0]]).dt.total_seconds() * 1000).round().astype(int)
vector_bool = elapsed_ms % 2000 == 0

Then keep only the rows where vector_bool is True:

df_clean = df[vector_bool]
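The selections above rely on rows being present at exact timestamps or positions. If rows can be missing, one variant (a different technique from the answers) is to put each row into a 2-second bucket by elapsed milliseconds and keep the first row of each bucket. This minimal sketch assumes rows are sorted by time; the data is made up and deliberately lacks the 00:00:02.000 row.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["00:00:00.000", "00:00:00.001", "00:00:02.001",
             "00:00:03.500", "00:00:04.000", "00:00:05.999"],
    "num1": [1, 2, 3, 4, 5, 6],
})

# Elapsed whole milliseconds since 00:00:00.000.
elapsed_ms = (pd.to_timedelta(df["time"]).dt.total_seconds() * 1000).round().astype(int)

# Bucket 0 = [0 s, 2 s), bucket 1 = [2 s, 4 s), ...; keep each bucket's first row.
bucket = elapsed_ms // 2000
df_clean = df[~bucket.duplicated()]
print(df_clean)
```

With complete data this picks the same rows as the modulo-2000 check; with a gap it falls back to the next available row in that 2-second window.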

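Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.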

 