[英]Python Multiple Keys and Transform to Dictionary
I am importing a txt file in Python 2.6.6, and need to do some data wrangling.我在 Python 2.6.6 中导入了一个 txt 文件,需要做一些数据整理。 I am new to Python and am struggling to google every step to complete the task.我是 Python 的新手,正在努力用谷歌搜索完成任务的每一步。 Could you help or suggest?你能帮忙或建议吗?
Here is my input myData.txt shown below.这是我的输入myData.txt,如下所示。 The header is not in the data, but I put it here for easier reading.标题不在数据中,但我把它放在这里以便于阅读。
key1|key2|group|v1|v2|v3|v4
1|A|-1|10|100|1|2
1|A|2|20|35|2|3
1|B|1|15|5|3|5
2|B|5|23|25|4|2
2|B|2|33|20|22|98
2|D|4|23|21|20|32
...
Here is my desired output in a panda dataframe shown below.这是我在如下所示的熊猫数据框中所需的输出。 Basically, I want to merge key1 and key2 and form a combo key, and put group, v1, and v2 into a dictionary with group as the key, and v1 v2 as the values in a list (v1 being the first element, and v2 being the second element).基本上,我想合并 key1 和 key2 并形成一个组合键,然后将 group、v1 和 v2 放入一个字典中,以 group 作为键,v1 v2 作为列表中的值(v1 是第一个元素,v2是第二个元素)。 I don't need v3 or v4 in the output.我不需要输出中的 v3 或 v4。
comboKey1 new_v1
1_A {"-1":[10,100], "2":[20,35]}
1_B {"1":[15,5]}
2_B {"2":[33,20], "5":[23,25]}
2_D {"4":[23,21]}
Here is what I have now.这是我现在所拥有的。 Could someone kindly advise?有人可以建议吗?
import pandas as pd
df1 = pd.read_csv('myData.txt', header=None, sep='|')
df1.columns = ('key1','key2','group','v1','v2')
df1['comboKey1'] = df1['key1'].map(str)+"_"+df1['key2']
import pandas as pd
# Reading file, 'r' -> read
file = open('data.txt', 'r')
lines = file.readlines()
# Fict where info will be stored
main_dict = {}
for line in lines:
# Getting the list of values in the line
# values -> [key1, key2, group, v1, v2, v3, v4]
# indexs -> 0 1 2 3 4 5 6
values = line.split('|')
#creating combo_key
combo_key = str(values[0])+"_"+str(values[1])
#tests if key already exists
#if not, creats a new dict into it
if combo_key not in main_dict.keys():
main_dict[combo_key] = {} #adding new dict to dict key
main_dict[combo_key][str(values[2])] = [values[3], values[4]]
data = []
for key in main_dict.keys():
data.append([key, str(main_dict[key])])
df = pd.DataFrame(data, columns = ['ComboKey1', "new_v1"])
print(df)
Just sort the dict, then (:只需对 dict 进行排序,然后 (:
If just achieve the desired expected output, then the following code can apply too.如果只是达到所需的预期输出,那么下面的代码也可以应用。
import pandas as pd
from io import StringIO
YOUR_TXT_DATA = """\
1|A|-1|10|100|1|2
1|A|2|20|35|2|3
1|B|1|15|5|3|5
2|B|5|23|25|4|2
2|B|2|33|20|22|98
2|D|4|23|21|20|32
"""
df = pd.read_csv(StringIO(YOUR_TXT_DATA), header=None,
usecols=[_ for _ in range(0, 5)],
names=['key1', 'key2', 'group', 'v1', 'v2'],
sep='|')
result_dict = dict(comboKey1=[], new_v1=[])
for key1, key2, group, v1, v2 in df.values:
key = str(key1) + '_' + str(key2)
if key not in result_dict['comboKey1']:
result_dict['comboKey1'].append(key)
result_dict['new_v1'].append({str(group): [v1, v2]})
else:
index = result_dict['comboKey1'].index(key)
result_dict['new_v1'][index].update({str(group): [v1, v2]})
result_df = pd.DataFrame.from_dict(result_dict)
print(result_df)
output输出
comboKey1 new_v1
0 1_A {'-1': [10, 100], '2': [20, 35]}
1 1_B {'1': [15, 5]}
2 2_B {'5': [23, 25], '2': [33, 20]}
3 2_D {'4': [23, 21]}
I think there is some special cases that you may need to consider, assuming the data is as follows.我认为有一些特殊情况您可能需要考虑,假设数据如下。
key1|key2|group|v1|v2|v3|v4
1|A|-1|10|100|1|2
1|A|-1|10|100|1|2
1|A|-1|20|35|2|3
what is your expected output?你的预期输出是什么? (case 1 ~ 3) (案例 1 ~ 3)
1_A {'-1': [20, 35]}
(solution: dict) 1_A {'-1': [20, 35]}
(解: dict){('-1', (10, 100)), ('-1', (20, 35))}
(solution: set)情况 2:保留所有但不重复: {('-1', (10, 100)), ('-1', (20, 35))}
(解决方案:设置)1_A [('-1', (10, 100)), ('-1', (10, 100)), ('-1', (20, 35))]
(solution: list)情况 3:保留所有1_A [('-1', (10, 100)), ('-1', (10, 100)), ('-1', (20, 35))]
(解决方案:list )code:代码:
from unittest import TestCase
import pandas as pd
from io import StringIO
OTHER_TXT_DATA = """\
1|A|-1|10|100|1|2
1|A|-1|10|100|1|2
1|A|-1|20|35|2|3
"""
class MyTests(TestCase):
def __init__(self, *args, **options):
super().__init__(*args, **options)
self.df = pd.read_csv(StringIO(OTHER_TXT_DATA), header=None,
usecols=[_ for _ in range(0, 5)],
names=['key1', 'key2', 'group', 'v1', 'v2'],
sep='|')
def setUp(self) -> None:
# init on every test case.
self.result_dict = dict(comboKey1=[], new_v1=[])
def solution_base(self, new_v1_fun, update_v1_fun) -> pd.DataFrame:
result_dict = self.result_dict
for key1, key2, group, v1, v2 in self.df.values:
key = str(key1) + '_' + str(key2)
if key not in result_dict['comboKey1']:
result_dict['comboKey1'].append(key)
new_v1_fun(group, v1, v2) # result_dict['new_v1'].append({str(group): [v1, v2]})
else:
index = result_dict['comboKey1'].index(key)
update_v1_fun(index, group, v1, v2) # result_dict['new_v1'][index].update({str(group): [v1, v2]})
df = pd.DataFrame.from_dict(result_dict)
print(df)
return df
def test_case_1_dict(self):
df = self.solution_base(new_v1_fun=lambda group, v1, v2: self.result_dict['new_v1'].append({str(group): [v1, v2]}),
update_v1_fun=lambda index, group, v1, v2: self.result_dict['new_v1'][index].update({str(group): [v1, v2]}))
self.assertTrue(df.equals(pd.DataFrame(
columns=['comboKey1', 'new_v1'],
data=[
['1_A', {'-1': [20, 35]}],
]
)))
def test_case_2_set(self):
df = self.solution_base(new_v1_fun=lambda group, v1, v2: self.result_dict['new_v1'].append({(str(group), (v1, v2))}),
update_v1_fun=lambda index, group, v1, v2: self.result_dict['new_v1'][index].add((str(group), (v1, v2))))
self.assertTrue(df.equals(pd.DataFrame(
columns=['comboKey1', 'new_v1'],
data=[
['1_A', {('-1', (20, 35)), ('-1', (10, 100))}],
]
)))
def test_case_3_list(self):
df = self.solution_base(new_v1_fun=lambda group, v1, v2: self.result_dict['new_v1'].append([(str(group), (v1, v2))]),
update_v1_fun=lambda index, group, v1, v2: self.result_dict['new_v1'][index].append((str(group), (v1, v2))))
self.assertTrue(df.equals(pd.DataFrame(
columns=['comboKey1', 'new_v1'],
data=[
['1_A', [('-1', (10, 100)), ('-1', (10, 100)), ('-1', (20, 35))]],
]
)))
note: annotation (see PEP484 ) is not supported Python 2.注意: Python 2 不支持注释(参见PEP484 )。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.