Read a text file which has key-value pairs and convert each line into one dictionary using Python pandas
I have a text file (one.txt) that contains an arbitrary number of key-value pairs, where the key and value are separated by a = (e.g. 1=8). Here are some examples:
1=88|11=1438|15=KKK|45=00|45=00|21=66|86=a
4=13|11=1438|49=DDD|8=157.73|67=00|45=00|84=b|86=a
6=84|41=18|56=TTT|67=00|4=13|45=00|07=d
I need to create a DataFrame with a list of dictionaries, with each row as one dictionary in the list, like so:
[{1:88,11:1438,15:kkk,45:7.7....},{4:13,11:1438....},{6:84,41:18,56:TTT...}]
import pandas as pd

df = pd.read_csv("input.txt", names=['text'], header=None)
data = df['text'].str.split("|")                 # each line -> list of "key=value" strings
names = [y.split('=') for x in data for y in x]  # flat list of [key, value] pairs across all lines
ds = pd.DataFrame(names)
print(ds)
How can I create a dictionary for each line by splitting on the = symbol?
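To make the target structure concrete, here is a rough plain-Python sketch of that per-line split (illustrative only; it assumes the input file is the one.txt shown above, keeps keys and values as strings, and a duplicated key such as 45 keeps its last value):

# Sketch: build one dictionary per line by splitting on '|' and then on '='.
# Assumes the input file is named "one.txt" as in the question.
records = []
with open("one.txt") as fh:
    for line in fh:
        fields = [f for f in line.strip().split("|") if f]
        records.append(dict(f.split("=", 1) for f in fields))

print(records)
# [{'1': '88', '11': '1438', '15': 'KKK', '45': '00', '21': '66', '86': 'a'}, ...]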
It should be one row per input line, with multiple columns: the DataFrame should have all the keys as columns and the values filling each row.
Example:
1 11 15 45 21 86 4 49 8 67 84 6 41 56 45 07
88 1438 kkk 00 66 a
na 1438 na .....
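In other words, starting from the per-line dictionaries, something equivalent to this sketch (illustrative data only, not the full file):

import pandas as pd

# Illustrative subset of the per-line dictionaries described above.
records = [
    {'1': '88', '11': '1438', '15': 'KKK', '45': '00', '21': '66', '86': 'a'},
    {'4': '13', '11': '1438', '49': 'DDD', '8': '157.73', '67': '00', '45': '00', '84': 'b', '86': 'a'},
]
wide = pd.DataFrame(records).fillna('na')   # keys become columns, missing keys become 'na'
print(wide)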
I think performing a .pivot would work. Try this:
import pandas as pd

df = pd.read_csv("input.txt", names=['text'], header=None)
data = df['text'].str.split("|")                 # each line -> list of "key=value" strings
names = [y.split('=') for x in data for y in x]  # flat list of [key, value] pairs
ds = pd.DataFrame(names)                         # column 0 = key, column 1 = value
ds = ds.pivot(columns=0).fillna('')              # keys become the columns
The .fillna('') replaces the NaN values with empty strings. If you'd rather see na, use .fillna('na') instead.
Output:
ds.head()
    1
0  07   1    11   15  21  4  41  45  49  56  6  67  8  84  86
0      88
1          1438
2                KKK
3                                00
4                                00
For space reasons I didn't print the entire DataFrame, but it indexes the columns by key and then fills in the values from each line (preserving the dict-by-line concept).
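If you want strictly one row per original input line rather than one row per key=value pair, a variant of the same idea is to pivot against the line number. A rough sketch, assuming the same input.txt and a reasonably recent pandas (explode / pivot_table):

import pandas as pd

df = pd.read_csv("input.txt", names=['text'], header=None)
# One row per key=value pair, remembering which input line it came from.
pairs = df['text'].str.split('|').explode().str.split('=', n=1, expand=True)
pairs.columns = ['key', 'value']
pairs = pairs.reset_index().rename(columns={'index': 'line'})
# Pivot on the line number so each input line collapses back to a single row.
wide = pairs.pivot_table(index='line', columns='key', values='value', aggfunc='first').fillna('na')
print(wide)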